YizhaoGao
Thanks for the great work on this benchmark. However, I found that some of the settings for running native CoT/reasoning models are not quite correct. 1. Models without reasoning ability run twice...