Results for Qwen2.5-Math-1.5B from the evaluation code differ significantly from the reported results
Hi, is there a dedicated prompt for evaluating the base models? When I directly run the instruct-model evaluation code on Qwen2.5-Math-1.5B, the results differ quite a lot from those in the report.
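To be concrete, I suspect the mismatch comes from the prompt: the instruct-style pipeline wraps each question with the chat template, whereas I would expect a base model to need a plain CoT completion prompt. A simplified illustration of the two (the question and the base-prompt wording are just placeholders, not the actual evaluation prompt):

```python
from transformers import AutoTokenizer

question = "Find x if 2x + 3 = 11."

# Prompt as built by the instruct-style evaluation path (chat template).
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Math-1.5B-Instruct")
chat_prompt = tok.apply_chat_template(
    [{"role": "user", "content": question}],
    tokenize=False,
    add_generation_prompt=True,
)

# What I would expect a base model to be fed instead: a plain CoT
# completion prompt with no chat markup (illustrative wording).
base_prompt = f"Question: {question}\nAnswer: Let's think step by step."

print(chat_prompt)
print(base_prompt)
```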
Hi, have you solved this problem? I'm running into a similar issue.
I fixed this by using https://github.com/ZubinGou/math-evaluation-harness, which is one of the foundations of this repo.
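In case it helps, the rough shape of what that harness does for a base model is something like the following. This is my own simplified sketch, not the harness's actual code; the few-shot prompt, regex, and generation settings here are illustrative:

```python
import re
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-Math-1.5B")

# Plain few-shot CoT completion prompt (no chat template), as I
# understand is selected for base models via the harness's prompt type.
prompt = (
    "Question: What is 7 * 8?\n"
    "Answer: Let's think step by step. 7 * 8 = 56. The answer is \\boxed{56}.\n\n"
    "Question: Find x if 2x + 3 = 11.\n"
    "Answer: Let's think step by step."
)

params = SamplingParams(temperature=0.0, max_tokens=1024, stop=["\n\nQuestion:"])
output = llm.generate([prompt], params)[0].outputs[0].text

# Pull the final \boxed{...} answer and compare it with the reference.
matches = re.findall(r"\\boxed\{([^}]*)\}", output)
pred = matches[-1] if matches else None
print(pred == "4")
```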
Hi, sorry to bother you. Would you mind sharing an evaluation configuration for the Qwen2.5-Math base models, such as top_k and temperature?
Hi, sorry, I haven't looked into it that deeply. You could check the Qwen2.5-Math report for the configuration, if they provide it. I evaluate models with the default config of https://github.com/ZubinGou/math-evaluation-harness.
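For reference, "default config" on my side effectively means greedy decoding, which makes top_k irrelevant. In vLLM terms that would look roughly like this (illustrative values, not an official Qwen2.5-Math setting):

```python
from vllm import SamplingParams

# Greedy decoding for base-model evaluation: temperature 0 means
# top_k / top_p have no effect. max_tokens is an illustrative value.
params = SamplingParams(temperature=0.0, top_k=-1, top_p=1.0, max_tokens=2048)
```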
For the Qwen2.5-Math base model, do the results generated by this repo match the scores provided in the paper?
Yes, with reasonable differences.