Qwen2.5-Math

Testing Qwen2.5-Math-1.5B with the evaluation code gives results that differ considerably from the report

Open pipixiaqishi1 opened this issue 1 year ago • 6 comments

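Hello, is there a dedicated prompt for evaluating the base model? When I test Qwen2.5-Math-1.5B directly with the evaluation code intended for the instruct models, the results differ quite a bit from those in the report.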

pipixiaqishi1 avatar Dec 10 '24 15:12 pipixiaqishi1

Hi, have you solved this problem? I am running into a similar problem.

ypwang61 avatar Apr 15 '25 20:04 ypwang61

I fixed this by using https://github.com/ZubinGou/math-evaluation-harness, which is one of the foundations of this repo.

pipixiaqishi1 avatar Apr 16 '25 03:04 pipixiaqishi1

Hi, sorry to bother you. Would you mind sharing an evaluation configuration for the Qwen2.5-Math base models, such as top_k and temperature?

1998v7 avatar Jun 18 '25 08:06 1998v7

Hi, sorry, I have not looked into it deeply. You could check the Qwen2.5-Math report to see whether it provides the configuration. I evaluate models with the default config of https://github.com/ZubinGou/math-evaluation-harness.

pipixiaqishi1 avatar Jun 18 '25 08:06 pipixiaqishi1

For the Qwen2.5-Math base model, do the results generated by this repo match the scores provided in the paper?

1998v7 avatar Jun 18 '25 08:06 1998v7

Yes, with reasonable differences.

pipixiaqishi1 avatar Jun 18 '25 09:06 pipixiaqishi1