pythia Questions regarding the WSC evaluation results

Open mutiann opened this issue 5 months ago • 0 comments

Hi,

I've recently been running lm-eval on the Pythia models using the benchmarks listed in the paper. All of the benchmarks give results similar to those reported in the paper, except WSC. The paper reports WSC scores of 0.3~0.5 for the Pythia models, whereas the same models easily reach 0.6~0.8 accuracy on the WSC273 task in lm-eval. Could you confirm which WSC task is reported in the paper and how it is evaluated?

Thanks!

Sep 20 '24 14:09 mutiann
