YizhaoGao

1 issue by YizhaoGao

Thanks for the great work on this benchmark. However, I found that some of the settings for running native CoT/reasoning models are not quite correct. 1. Models without reasoning ability run twice...