
Qwen1.5 inference is very slow, and hallucination is severe

Open effectzhang opened this issue 10 months ago • 15 comments

It feels even less stable than the original generation.

effectzhang avatar Apr 10 '24 07:04 effectzhang

To facilitate productive feedback, please provide a comprehensive description of the problems you encountered, along with step-by-step instructions for reproducing them. Otherwise this issue may be closed, since others would neither be able to understand your concerns nor offer effective assistance.

We have provided profiling results for Qwen1.5 at https://github.com/QwenLM/Qwen1.5/issues/202: it is slightly slower than Qwen1.0 when transformers is used, and there should be no substantial difference in speed with vLLM.

For hallucination, you also need to provide your examples, the model you were using, and the generation configuration.

jklj077 avatar Apr 12 '24 08:04 jklj077


I noticed this too. I have been using Qwen1.5 models of various sizes for a single-choice question task, and no matter how I design the prompt, the model cannot avoid outputting answers outside the given options.

The model follows the output format very well, but the content it outputs often hallucinates and does not stick to the objective options given in the prompt. The hallucination is especially severe in single-choice tasks when there are more than 10 options.

xuexidi avatar Apr 16 '24 02:04 xuexidi

Hi all, we'd greatly appreciate your sharing of any bad cases.

jklj077 avatar Apr 16 '24 02:04 jklj077


I'm sorry, but for information-security reasons I cannot share our prompt details directly. I am using the Qwen1.5 series for an app-navigation task; the rough prompt and situation are as follows:

Qwen1.5 is given the role of an app agent, together with the operation goal for the app and the names of all buttons on the current page, and is asked to choose the one button on the current page that advances the goal. When making this decision, Qwen1.5 often outputs button names that are not in the given list. Qwen1 also frequently picks the wrong button, but it does not fabricate names out of thin air as severely as the Qwen1.5 series does.

xuexidi avatar Apr 19 '24 01:04 xuexidi

Hi, have you solved this? I ran into a similar problem. I am using the qwen1.5-7b chat model, and even with bf16, inference is still very slow: on average, even a very short question takes several seconds.

jmycsu avatar May 12 '24 08:05 jmycsu


Same here, with the same model and configuration as yours. Any casual question takes about two minutes to answer. Searching Google and Baidu I haven't seen anyone else report this yet, only two or three mentions on GitHub. Also, the model.generate approach is really cumbersome; every call needs a big pile of boilerplate. The old model.chat was so much simpler!

imempty avatar May 22 '24 08:05 imempty
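For reference, in Qwen1.5 the model.chat convenience method was replaced by the tokenizer's chat template, and the boilerplate can be wrapped once in a small helper. A hedged sketch, assuming transformers is installed and a Qwen1.5 chat checkpoint is available; the model is not loaded here:

```python
def build_messages(user_prompt: str, system_prompt: str = "You are a helpful assistant.") -> list[dict]:
    """Build the chat-format message list expected by apply_chat_template."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

def chat(model, tokenizer, user_prompt: str, max_new_tokens: int = 512) -> str:
    """One-shot chat helper wrapping the apply_chat_template + generate boilerplate."""
    text = tokenizer.apply_chat_template(
        build_messages(user_prompt),
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens so only the newly generated reply is decoded.
    reply_ids = output_ids[0][inputs.input_ids.shape[1]:]
    return tokenizer.decode(reply_ids, skip_special_tokens=True)

# Usage (assumes a local Qwen1.5 chat checkpoint; not run here):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B-Chat")
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-7B-Chat", torch_dtype="auto", device_map="auto")
# print(chat(model, tok, "Hello!"))
```

Once wrapped this way, each call is a single line again, much like the old model.chat.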

I'd recommend deploying and calling it with vLLM; it's much faster.


xuexidi avatar May 22 '24 09:05 xuexidi


Why make inference so complicated? Can't it just run locally? I tried TextStreamer and it works very well; the only issue is that the output repeats. If a repetition penalty could be added, that would solve it perfectly!

gg22mm avatar May 22 '24 10:05 gg22mm

TextStreamer

The TextStreamer from HuggingFace Transformers? Does that restore normal inference speed? On the same hardware with the same question, chatglm3 takes only 1-2 seconds.

imempty avatar May 22 '24 12:05 imempty


Just tried it. It is still slow; the output just becomes streaming. And every reply is repeated twice, so how do I deal with this new problem? It didn't solve anything.

imempty avatar May 22 '24 12:05 imempty
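On the reply being printed twice: TextStreamer echoes the prompt by default, and with a chat template the prompt contains the conversation text, so it appears again before the actual reply. Passing skip_prompt=True suppresses that, and a repetition_penalty addresses repetitive generations. A hedged sketch, assuming transformers and an already-loaded Qwen1.5 chat model (not run here):

```python
# Sketch only: assumes `model` and `tokenizer` are an already-loaded
# Qwen1.5 chat checkpoint (e.g. via AutoModelForCausalLM / AutoTokenizer).
def stream_chat(model, tokenizer, prompt: str) -> None:
    from transformers import TextStreamer

    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    # skip_prompt=True stops the streamer from re-printing the prompt,
    # which is what makes the reply look duplicated.
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    model.generate(
        **inputs,
        streamer=streamer,
        max_new_tokens=512,
        repetition_penalty=1.1,  # mild penalty against repeated text
    )
```

This only fixes the duplicated display and the repetition; it does not by itself make decoding faster.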

vLLM can also be imported into your code and used for inference directly; there is no need to deploy it as an online service.


xuexidi avatar May 22 '24 13:05 xuexidi
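For reference, vLLM's offline API works by constructing an LLM object directly instead of running an API server. A sketch assuming vLLM is installed and the checkpoint fits on the local GPU; the call is kept behind a main guard since it needs CUDA:

```python
def run_offline(prompts: list[str]) -> list[str]:
    # Imported lazily: vLLM needs a CUDA environment to load the model.
    from vllm import LLM, SamplingParams

    llm = LLM(model="Qwen/Qwen1.5-7B-Chat")
    params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=512)
    outputs = llm.generate(prompts, params)
    # Each output carries the generated completions; take the first one.
    return [out.outputs[0].text for out in outputs]

if __name__ == "__main__":
    for reply in run_offline(["Briefly introduce yourself."]):
        print(reply)
```

For chat-style prompts you would still apply the tokenizer's chat template to the raw text before passing it in.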

@jmycsu @imempty @gg22mm Please open new issues to describe your problems. They don't seem to be related to the ones being tracked here. I'm marking your comments as off topic.

jklj077 avatar May 23 '24 02:05 jklj077

"do_sample":True, # 启用随机抽样(实验只有这个有效)

还有慢的原因可能是: max_new_tokens 设置过大好象

gg22mm avatar May 23 '24 05:05 gg22mm
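Putting the two tips above together: a large max_new_tokens lets a generation that never hits the stop token run for its full budget, which looks like extreme slowness. A hypothetical kwargs fragment for model.generate; the values are illustrative, not recommendations from the thread:

```python
# Illustrative generation kwargs for model.generate; all values are
# example assumptions.
gen_kwargs = {
    "do_sample": True,          # enable sampling instead of greedy decoding
    "temperature": 0.7,
    "top_p": 0.8,
    "max_new_tokens": 256,      # bound reply length: generation time grows
                                # roughly linearly with the new-token count
    "repetition_penalty": 1.05, # mild penalty against repeated text
}
```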

This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.

github-actions[bot] avatar Jun 23 '24 08:06 github-actions[bot]

I quantized Qwen1.5-4B to int4 with AutoAWQ, and found that inference with vLLM is even slower than fp16. What could be the cause?

lindongs avatar Aug 02 '24 12:08 lindongs