metaseq
OPT Results not matching for HellaSWAG dataset
Hi,
I tried reproducing the OPT results for various datasets using the LM-eval-harness framework.
I observe that the OPT accuracy scores do not match the ones reported in Figure 6 of the OPT paper. However, the accuracy-norm (acc_norm) scores do seem to match for this task.
For the rest of the tasks, the regular accuracy scores match the ones presented in the plots in Figure 6.
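For context on why the two metrics can diverge, here is a minimal sketch of the difference between plain accuracy and acc_norm in multiple-choice scoring. The numbers are illustrative, not real OPT outputs, and the normalization here is by token count for simplicity (lm-eval-harness normalizes by the byte length of the continuation):

```python
# Toy multiple-choice example: per-token log-probabilities a model assigns
# to each candidate answer continuation. (Illustrative numbers only.)
choices = {
    "a short answer": [-2.0, -1.5],                           # total -3.5
    "a much longer answer phrase": [-1.0, -1.0, -1.0, -1.0],  # total -4.0
}

def accuracy_pick(choices):
    """Plain accuracy: pick the choice with the highest total log-likelihood."""
    return max(choices, key=lambda c: sum(choices[c]))

def accuracy_norm_pick(choices):
    """acc_norm: normalize the total log-likelihood by continuation length,
    so longer answers are not penalized merely for having more tokens."""
    return max(choices, key=lambda c: sum(choices[c]) / len(choices[c]))

print(accuracy_pick(choices))       # "a short answer"  (-3.5 > -4.0)
print(accuracy_norm_pick(choices))  # "a much longer answer phrase" (-1.0 > -1.75)
```

Because the two metrics rank candidates differently, one can match a reported number while the other does not.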
Here is the table for HellaSWAG:
@stephenroller
So we used an internal evaluation suite, not lm-eval-harness. @tbmihailov is currently looking into resolving any differences (he was the author of the internal eval suite).
We found an off-by-one error and have a PR in #224 to help fix the API. We'll be evaluating results on lm-eval-harness soon.
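For readers unfamiliar with this class of bug: in causal LM scoring, the logits at position i predict the token at position i+1, and scoring token i against logits[i] silently shifts every log-prob by one step. A minimal sketch of the correct alignment (hypothetical function and variable names, not the actual #224 patch):

```python
def continuation_logprob(token_ids, step_logprobs, prompt_len):
    """Sum the log-probs of the continuation tokens only.

    token_ids     : full sequence (prompt + continuation)
    step_logprobs : step_logprobs[i] maps a token id to
                    log P(token_ids[i+1] == id | token_ids[:i+1])
    prompt_len    : number of prompt tokens
    """
    total = 0.0
    # Token j (for j >= prompt_len) is predicted by the logits at j - 1:
    # using step_logprobs[j] here would be the classic off-by-one bug.
    for j in range(prompt_len, len(token_ids)):
        total += step_logprobs[j - 1][token_ids[j]]
    return total

# Toy check: 3 prompt tokens followed by 2 continuation tokens.
ids = [10, 11, 12, 5, 6]
steps = [{11: -0.1}, {12: -0.2}, {5: -0.3}, {6: -0.4}]
print(round(continuation_logprob(ids, steps, prompt_len=3), 6))  # -0.7
```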
That said, lm-eval-harness's prompts are not as well crafted as those in @tbmihailov's internal eval suite.
Thanks @stephenroller for the clarification!
By the way, can you point me to a readme/code segment in the metaseq repository that I can use to perform inference with OPT 66B?
Hi @stephenroller, do you have any tips for getting around this problem to do few-shot classification? I'm using the Hugging Face API and running into the first positive logit problem.
I don't have any recommendations for the Hugging Face API.