Qwen3
Qwen1.5-7B-base inference is much slower than Qwen1.5-7B-chat
As the title says. I'm running inference with the HuggingFace-format weights. What could be the cause?
Same here. My impression is that 7B-base didn't get slower; rather, Qwen1.5-7B-chat got faster.
Could anyone point me to an inference script for qwen1.5-base?
I'm using FastChat.
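If you just want plain HuggingFace inference for the base model, a minimal sketch looks like the following (the prompt and `max_new_tokens` value are placeholders; this assumes the standard `transformers` generation API):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate(model_name: str = "Qwen/Qwen1.5-7B",
             prompt: str = "The capital of France is",
             max_new_tokens: int = 128) -> str:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Base models have no chat template and no <|im_end|> stop token,
    # so generation typically runs until max_new_tokens is reached.
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate())
```

Note that a base model will happily continue the prompt rather than answer it, which is expected behavior without chat fine-tuning.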
They share the same code and the same model architecture. Check their generation lengths: a chat model stops automatically once it generates the im_end token, so its generation length is probably much shorter, while the base model tends to run until the max_new_tokens limit.
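One way to check this explanation is to normalize for generation length instead of comparing wall-clock time. A toy sketch with made-up numbers:

```python
def tokens_per_second(num_tokens: int, total_seconds: float) -> float:
    """Throughput normalized by generation length."""
    return num_tokens / total_seconds

# Hypothetical measurements: the chat model stops at <|im_end|>
# after 50 tokens; the base model runs to a 500-token limit.
chat_speed = tokens_per_second(50, 2.5)    # 2.5 s of wall-clock time
base_speed = tokens_per_second(500, 25.0)  # 25 s of wall-clock time
print(chat_speed, base_speed)  # identical per-token speed despite a 10x wall-clock gap
```

If the per-token throughput matches between the two models, the slowdown is just the longer generation, not the model itself.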