After upgrading the image to 0.12.1, running Qwen1.5-14B-Chat-GPTQ-Int4 is much slower than on 0.11.0
Describe the bug
After upgrading the image to 0.12.1, running Qwen1.5-14B-Chat-GPTQ-Int4 is much slower than on 0.11.0.
To Reproduce
Upgrade the Docker image to 0.12.1 and run Qwen1.5-14B-Chat-GPTQ-Int4. It is much slower than on 0.11.0.
Expected behavior
The number of tokens per second after the upgrade should be the same as before the upgrade.
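To compare the two versions with a number rather than an impression, tokens per second can be measured with a small timing helper. This is only a rough sketch: `tokens_per_second` and the `generate` callable are hypothetical names used for illustration, and `generate` is assumed to wrap whatever client call you use and to return the sequence of generated tokens.

```python
import time
from typing import Callable, Sequence


def tokens_per_second(generate: Callable[[str], Sequence], prompt: str, runs: int = 3) -> float:
    """Average generated tokens per second over several runs.

    `generate` is a hypothetical callable (an assumption, not a real API):
    it takes a prompt and returns the sequence of generated tokens.
    """
    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(prompt)
        total_seconds += time.perf_counter() - start
        total_tokens += len(tokens)
    # Guard against a zero elapsed time on extremely fast calls.
    return total_tokens / total_seconds if total_seconds > 0 else 0.0
```

Running the same prompt through this helper on both image versions, with the same startup parameters, gives directly comparable figures.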
Additional context
Our startup parameter configuration:
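It is fast with vLLM, but Transformers is very slow, and I don't know why. I don't know which one the old version used, since it didn't have this option.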
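GPU memory is limited, so we can't afford to use vLLM. Is this a deliberate change in behavior, or was nothing in particular modified? If it got slower with no change at all, that would be strange. If something did change, please tell me where, and I could even try building a separate version that reverts it.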
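I don't even know which version I had before, because the image I pulled was tagged `latest`. The old version was fast. After the upgrade, vLLM and Transformers became separate options: vLLM uses more GPU memory but is fast, while Transformers is very slow, even though it uses the same amount of GPU memory as the old version.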
This issue is stale because it has been open for 7 days with no activity.
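@worm128 Has the slow Transformers problem been solved? I've run into it too. I used vLLM before and it was always fast, but our GPU has since been downgraded, and with compute capability below 7.5 I can't use vLLM anymore, so I switched to Transformers. It is painfully slow: even a simple "hello" takes half a minute to get a response.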