Yang Hao

6 comments by Yang Hao

> Quantization only targets the LLM, so speed tests are best run against the LLM as well. https://github.com/InternLM/lmdeploy/tree/main/benchmark

Even if the vision module is not optimized, shouldn't optimizing the LLM still improve the final end-to-end performance?

> We didn't benchmark the VLM models but the LLM models. AWQ outperforms Half when batch_size < 256. The smaller the batch size, the faster AWQ is. The test script...

My GPU is an A800-80G. I noticed that after running the fp16 model, a subsequent awq w4a16 run is extremely slow, whereas running awq w4a16 directly is roughly twice as fast as fp16. AWQ emits slightly fewer tokens. The TPS (tokens per second) I measured:

```
fp16: 73.60
awq-w4a16: 127.39
```

What I don't understand is why running fp16 first and then awq slows things down. I wrote them as two separate methods, and the awq model is only invoked after the fp16 call returns, so in theory there should be no interference.
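For reference, throughput numbers like those above can be reproduced with a simple wall-clock measurement. Below is a minimal sketch of such a harness; the `generate` callable and `fake_generate` stub are hypothetical stand-ins for a real lmdeploy/transformers pipeline call, which is not shown here:

```python
import time

def measure_tps(generate, prompt):
    """Time one generation call and return (token_count, tokens_per_second).

    `generate` is any callable that takes a prompt and returns the list of
    generated tokens -- a stand-in for a real inference pipeline.
    """
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return len(tokens), len(tokens) / elapsed

# Dummy generator used only to demonstrate the harness.
def fake_generate(prompt):
    time.sleep(0.01)          # pretend inference takes 10 ms
    return ["tok"] * 50       # pretend 50 tokens were produced

count, tps = measure_tps(fake_generate, "hello")
print(count, tps)
```

Using `time.perf_counter()` (a monotonic clock) rather than `time.time()` avoids skew from system clock adjustments during the run.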

> @irexyc Downgrading transformers to 4.40.0 made it work.

This works for me too: transformers 4.42 raised the same error, and downgrading to 4.40.0 fixed it 👍🏻
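For anyone hitting the same error, it can help to verify the installed version programmatically before filing a new report. A small stdlib-only sketch (the helper name `is_pinned` is my own; the 4.40.0 pin comes from the comment above):

```python
# Check whether an installed package matches a known-good version pin.
# Uses only the standard library, so it runs in any Python 3.8+ environment.
from importlib.metadata import version, PackageNotFoundError

def is_pinned(pkg, wanted):
    """Return True iff `pkg` is installed at exactly version `wanted`."""
    try:
        return version(pkg) == wanted
    except PackageNotFoundError:
        return False

# Example: is_pinned("transformers", "4.40.0")
```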

> Same issue, how did you solve it? Logging showed that the system's librt.so, libm.so, etc. could not be found. However, through a find search, it was...
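The find-style search described above can also be scripted. A minimal sketch, assuming typical Linux library directories (the default `roots` are common locations, not paths taken from the comment):

```python
# Search candidate directories for shared objects such as librt.so --
# roughly the Python equivalent of `find / -name "librt.so*"`.
import glob

def find_shared_lib(name, roots=("/usr/lib", "/usr/lib64", "/lib")):
    """Return paths under `roots` whose filename starts with `name`."""
    hits = []
    for root in roots:
        hits += glob.glob(f"{root}/**/{name}*", recursive=True)
    return hits

# Example: find_shared_lib("librt.so")
```

If the libraries exist but the loader still cannot see them, one common fix is to add their directory to `LD_LIBRARY_PATH`; whether that applies here depends on the truncated details above.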