Opher Lieber

Results: 1 issue by Opher Lieber

Hi, I'm trying to reproduce the results from the Open LLM Leaderboard, and all benchmarks seem OK (within ~0.2%) except for winogrande, which scores consistently lower when run through lighteval. Examples...

bug