Qwen2.5-Math [Bug]: 您好，使用vllm部署的Qwen2.5-Math-72B-Instruct和Qwen2.5-Math-7B-Instruct数学模型，为什么评测时模型怎么回答了很多其他乱七八糟的内容，直到达到限制token数才停止？

Model Series

Qwen2.5

What are the models used?

Qwen2.5-Math-72B-Instruct、Qwen2.5-Math-7B-Instruct

What is the scenario where the problem happened?

vllm

Is this a known issue?

[X] I have followed the GitHub README.
[X] I have checked the Qwen documentation and cannot find an answer there.
[X] I have checked the documentation of the related framework and cannot find useful information.
[X] I have searched the issues and there is not a similar one.

Information about environment

您好，使用vllm部署的Qwen2.5-Math-72B-Instruct和Qwen2.5-Math-7B-Instruct数学模型，为什么评测时模型怎么回答了很多其他乱七八糟的内容，直到达到限制token数才停止？（其中评测数据集有Math）

Log output

您好，使用vllm部署的Qwen2.5-Math-72B-Instruct和Qwen2.5-Math-7B-Instruct数学模型，为什么评测时模型怎么回答了很多其他乱七八糟的内容，直到达到限制token数才停止？（其中评测数据集有Math）

Description

您好，使用vllm部署的Qwen2.5-Math-72B-Instruct和Qwen2.5-Math-7B-Instruct数学模型，为什么评测时模型怎么回答了很多其他乱七八糟的内容，直到达到限制token数才停止？（其中评测数据集有Math）

Nov 07 '24 08:11 13416157913

请问这个有发现这个问题是因为什么吗

Nov 26 '24 09:11 jnanliu

是的，我也有这个问题，7B的模型有时候会在输出回答后再输出一些无关紧要的内容，有时候也只是无限重复原问题，直到token数量上限。 Yes, I have this question either. The 7B model sometimes appends some unrelated content after giving an answer, or it will endlessly repeat the original question until it hits the token limit.

Dec 05 '24 03:12 William-WSJ

是的，我也有这个问题，7B的模型有时候会在输出回答后再输出一些无关紧要的内容，有时候也只是无限重复原问题，直到token数量上限。 Yes, I have this question either. The 7B model sometimes appends some unrelated content after giving an answer, or it will endlessly repeat the original question until it hits the token limit.

将温度系数从1.0更改为0.7会好很多。 In my settings, changing the temperature coefficient from 1.0 to 0.7 is helpful.

Dec 05 '24 08:12 jnanliu

是的，我也有这个问题，7B的模型有时候会在输出回答后再输出一些无关紧要的内容，有时候也只是无限重复原问题，直到token数量上限。 Yes, I have this question either. The 7B model sometimes appends some unrelated content after giving an answer, or it will endlessly repeat the original question until it hits the token limit.

将温度系数从1.0更改为0.7会好很多。 In my settings, changing the temperature coefficient from 1.0 to 0.7 is helpful.

我这边温度设置很低的0.2，

Dec 05 '24 11:12 13416157913

是的，我也有这个问题，7B的模型有时候会在输出回答后再输出一些无关紧要的内容，有时候也只是无限重复原问题，直到token数量上限。 Yes, I have this question either. The 7B model sometimes appends some unrelated content after giving an answer, or it will endlessly repeat the original question until it hits the token limit.

将温度系数从1.0更改为0.7会好很多。 In my settings, changing the temperature coefficient from 1.0 to 0.7 is helpful.

我这边温度设置很低的0.2，

是这样的，我使用英文输入时，7B模型可以正确解答问题，但是会在解答之后又会出现以Human:打头的无关题目的输出，直到达到max token。如果是中文题目，大概率会一直重复我的问题，输出结果和配置如下图：输出结果英文：

输出结果中文：配置文件：

这里已经将qwen2.5-math-7B模型部署了，使用端口访问的

Dec 06 '24 04:12 William-WSJ

math_eval.py里有个stop_words的选项，可以加

Dec 26 '24 07:12 pengwenzhi

math_eval.py里有个stop_words的选项，可以加

通过stop_words来解决，感觉没真正从根本上解决问题，模型的回答本质上还是很长，只是通过stop_words截断而已；这种解决方法，从长远来看，不够合理，因为不知道模型回答中，会不会出现不在stop_words中停止符号。

Dec 26 '24 11:12 13416157913

是的，我也有这个问题，7B的模型有时候会在输出回答后再输出一些无关紧要的内容，有时候也只是无限重复原问题，直到token数量上限。 Yes, I have this question either. The 7B model sometimes appends some unrelated content after giving an answer, or it will endlessly repeat the original question until it hits the token limit.

将温度系数从1.0更改为0.7会好很多。 In my settings, changing the temperature coefficient from 1.0 to 0.7 is helpful.

我这边温度设置很低的0.2，

是这样的，我使用英文输入时，7B模型可以正确解答问题，但是会在解答之后又会出现以Human:打头的无关题目的输出，直到达到max token。如果是中文题目，大概率会一直重复我的问题，输出结果和配置如下图：输出结果英文：

输出结果中文：配置文件：

这里已经将qwen2.5-math-7B模型部署了，使用端口访问的

请问你解决没

Dec 31 '24 09:12 qwerty3564

any suggestion?

Mar 08 '25 09:03 0205090923