Opher Lieber
Hi, I'm trying to reproduce the results from the Open LLM Leaderboard. All benchmarks match closely (within ~0.2%) except Winogrande, which is consistently lower when run through lighteval. Examples...
bug