Qwen2.5

Inference is much slower than with Qwen

Open 1920853199 opened this issue 1 year ago • 4 comments

Inference with Qwen1.5-72b-chat is much slower than with Qwen-72b-chat. Is anyone else seeing this?

1920853199 avatar Feb 21 '24 00:02 1920853199

Are you running it in float32?

hljjjmssyh avatar Feb 21 '24 02:02 hljjjmssyh

Set torch_dtype='auto'. Check the latest README.

JustinLin610 avatar Feb 25 '24 09:02 JustinLin610
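For readers following the suggestion above, here is a minimal sketch of passing `torch_dtype="auto"` when loading. The helper names `build_load_kwargs` and `load_model` are hypothetical, and `model_path` is a placeholder for your own checkpoint. As far as I know, in the transformers releases current at the time of this thread, leaving `torch_dtype` unset loads the weights in float32, whereas `"auto"` picks the dtype recorded with the checkpoint.

```python
def build_load_kwargs(torch_dtype="auto"):
    """Keyword arguments for from_pretrained (hypothetical helper).

    torch_dtype="auto" asks transformers to use the dtype recorded with the
    checkpoint instead of falling back to the default dtype.
    """
    return {"device_map": "auto", "torch_dtype": torch_dtype}


def load_model(model_path, torch_dtype="auto"):
    # Imported lazily so build_load_kwargs can be used without transformers installed.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_path, **build_load_kwargs(torch_dtype)
    )
```

After loading, `model.dtype` shows which precision was actually used, which is a quick way to confirm the model is not running in float32.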

Same for me, still no output.

bingwork avatar Feb 27 '24 04:02 bingwork

Did anyone end up solving this? We have run into the same problem.

suchstar avatar Apr 10 '24 01:04 suchstar

model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", torch_dtype=torch.bfloat16, use_cache=True)
Explicitly setting torch_dtype=torch.bfloat16 makes inference about 40% faster and also cuts GPU memory usage considerably. Inference is still slow, though.

JeckerWen avatar Apr 28 '24 13:04 JeckerWen
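The memory drop reported above follows from bytes per parameter: float32 stores each weight in 4 bytes, bfloat16 in 2. A rough sketch of the arithmetic, assuming about 72 billion parameters (an approximation taken from the model name) and counting weights only, not activations or the KV cache:

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(num_params, dtype):
    """Approximate memory for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


NUM_PARAMS = 72e9  # assumption: roughly 72B parameters, not an exact count

for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: {weight_memory_gib(NUM_PARAMS, dtype):.0f} GiB")
```

Under these assumptions the weights take roughly 268 GiB in float32 and 134 GiB in bfloat16. Halving the bytes read per token is also one plausible reason for part of the speedup, though it does not account for the remaining gap to Qwen-72b-chat that the thread describes.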