Yang Hao

6 comments by Yang Hao

> Quantization only targets the LLM, so speed tests are best run against the LLM as well. https://github.com/InternLM/lmdeploy/tree/main/benchmark

Even if the vision module is not optimized, shouldn't optimizing the LLM still improve the final end-to-end performance?

> We didn't benchmark the VLM models but the LLM models. AWQ outperforms Half when batch_size < 256. The smaller the batch size, the faster AWQ is. The test script...

My GPU is an A800-80G. I noticed that after running the fp16 model, a subsequent awq w4a16 run is extremely slow, whereas running awq w4a16 directly is roughly twice as fast as fp16. AWQ emits slightly fewer tokens. The TPS (tokens per second) I measured:

```
fp16: 73.60
awq-w4a16: 127.39
```

What I don't understand is why running fp16 first and then awq slows things down. I wrote them as two separate methods, and the awq model is only invoked after the fp16 call returns, so in theory there should be no interference.
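For reference, throughput numbers like those above can be reproduced with a simple wall-clock measurement. Below is a minimal sketch of such a harness; the `generate` callable and `fake_generate` stub are hypothetical stand-ins for a real lmdeploy/transformers pipeline call, which is not shown here:

```python
import time

def measure_tps(generate, prompt):
    """Time one generation call and return (token_count, tokens_per_second).

    `generate` is any callable that takes a prompt and returns the list of
    generated tokens -- a stand-in for a real inference pipeline.
    """
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return len(tokens), len(tokens) / elapsed

# Dummy generator used only to demonstrate the harness.
def fake_generate(prompt):
    time.sleep(0.01)          # pretend inference takes 10 ms
    return ["tok"] * 50       # pretend 50 tokens were produced

count, tps = measure_tps(fake_generate, "hello")
print(count, tps)
```

Using `time.perf_counter()` (a monotonic clock) rather than `time.time()` avoids skew from system clock adjustments during the run.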

> @irexyc Downgrading transformers to 4.40.0 made it work.

This works for me too: transformers 4.42 raised the same error, and downgrading to 4.40.0 fixed it 👍🏻
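For anyone hitting the same error, it can help to verify the installed version programmatically before filing a new report. A small stdlib-only sketch (the helper name `is_pinned` is my own; the 4.40.0 pin comes from the comment above):

```python
# Check whether an installed package matches a known-good version pin.
# Uses only the standard library, so it runs in any Python 3.8+ environment.
from importlib.metadata import version, PackageNotFoundError

def is_pinned(pkg, wanted):
    """Return True iff `pkg` is installed at exactly version `wanted`."""
    try:
        return version(pkg) == wanted
    except PackageNotFoundError:
        return False

# Example: is_pinned("transformers", "4.40.0")
```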

> Same issue, how did you solve it? Logging showed that the system's librt.so, libm.so, etc. could not be found. However, through a find search, it was...
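The find-style search described above can also be scripted. A minimal sketch, assuming typical Linux library directories (the default `roots` are common locations, not paths taken from the comment):

```python
# Search candidate directories for shared objects such as librt.so --
# roughly the Python equivalent of `find / -name "librt.so*"`.
import glob

def find_shared_lib(name, roots=("/usr/lib", "/usr/lib64", "/lib")):
    """Return paths under `roots` whose filename starts with `name`."""
    hits = []
    for root in roots:
        hits += glob.glob(f"{root}/**/{name}*", recursive=True)
    return hits

# Example: find_shared_lib("librt.so")
```

If the libraries exist but the loader still cannot see them, one common fix is to add their directory to `LD_LIBRARY_PATH`; whether that applies here depends on the truncated details above.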