Kevin Huang issues

Results: 1 issue by Kevin Huang

[Help] Why is ChatGLM2-6B's accuracy on CEval so low when evaluated with lm-evaluation-harness?

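### Is there an existing issue for this?

- [X] I have searched the existing issues

### Current Behavior

As the title says: when I evaluate ChatGLM2-6B with lm-evaluation-harness, its accuracy on CEval is very low, only a little over 20%, which is far from the officially announced figure. I don't know what the cause is. I ran the evaluation with https://github.com/EleutherAI/lm-evaluation-harness. Because the answers for the CEval test data have not been released, I used the 1346 val examples: zero-shot acc is 0.2422 and five-shot acc is 0.2835. To rule out the small size of the CEval val data as the cause of the low acc, I also ran CMMLU in the same way. CMMLU test...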
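
As a rough sanity check on the CEval numbers reported above, the sketch below compares each accuracy with random guessing. It assumes every CEval question has four answer choices (so chance accuracy is 0.25) and uses a simple normal approximation to the binomial. The helper name `z_vs_chance` is hypothetical and is not part of lm-evaluation-harness.

```python
import math


def z_vs_chance(acc: float, n: int, num_choices: int = 4) -> float:
    """Return the z-score of an observed accuracy against random guessing.

    Assumption: every question has `num_choices` equally likely options,
    so the chance accuracy is 1 / num_choices.
    """
    p0 = 1.0 / num_choices
    # Standard error of the accuracy under the chance hypothesis.
    se = math.sqrt(p0 * (1.0 - p0) / n)
    return (acc - p0) / se


# Accuracies and sample size taken from the report above (1346 val examples).
for label, acc in [("zero-shot", 0.2422), ("five-shot", 0.2835)]:
    z = z_vs_chance(acc, n=1346)
    print(f"{label}: acc={acc:.4f}, z vs. chance={z:+.2f}")
```

Under these assumptions, a |z| below roughly 1.96 means the score cannot be distinguished from guessing at the usual 5% level. That would point toward a prompt-format or answer-extraction problem rather than a weak model, though this check alone cannot confirm the cause.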