Qi comments

Results: 7 comments by Qi

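Author, could you provide a list of recommended backbone models and vision models, ranked by performance and price for this project? The three default foreign backbone models are too restrictive.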

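> I am currently using different dated versions of qwen-vl-max as the backbone model and the perception model. The perception model solves the problem almost perfectly; the backbone model produces coordinate hallucinations in 3-4 out of 10 runs, plus 1 logical reasoning error.

Hello, are you running this in an emulator, or connected to a physical phone?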

Results from testing Qwen2.5-Math-1.5B with the evaluation code differ considerably from the results in the report

> > Hi, have you solved this problem? I also meet with similar problem > > I fixed this by using the https://github.com/ZubinGou/math-evaluation-harness, which is one of the foundations of...

Results from testing Qwen2.5-Math-1.5B with the evaluation code differ considerably from the results in the report

> > > > Hi, have you solved this problem? I also meet with similar problem > > > > > > > > > I fixed this by using...

The score calculation logic seems to have a problem that prevents n_sampling from taking effect?

Sorry to disturb you. Did you reproduce the results of Qwen2.5 math **base models** provided by the paper?

Abnormal TIR experiment results

> The same problem here. For 7B-instruct, I got 77% on GSM8K with TIR and 95.6% with CoT.

Sorry to disturb you. Did you reproduce the results of Qwen2.5 math...

exceed the model's predefined maximum length (4096)

Sorry to disturb you. Did you reproduce the results of Qwen2.5 math base models provided by the paper? I only achieved ~70% accuracy on the GSM8K dataset, which is largely inconsistent...

When will the evaluation tool for the Qwen2.5-Math base models be released?

May I ask whether you have found the evaluation config for the base model?