metaseq
OPT Results not matching for HellaSWAG dataset
Hi,
I tried reproducing the OPT results for various datasets using the LM-eval-harness framework.
I observe that the OPT accuracy scores do not match the ones reported in Figure 6 of the OPT paper. However, the accuracy-norm (acc_norm) scores do seem to match for this task.
For the rest of the tasks, the regular accuracy scores match the ones presented in the plots in Figure 6.
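For context on why the two metrics can diverge, here is a minimal sketch of the difference between plain accuracy and acc_norm in multiple-choice scoring. The numbers are illustrative, not real OPT outputs, and the normalization here is by token count for simplicity (lm-eval-harness normalizes by the byte length of the continuation):

```python
# Toy multiple-choice example: per-token log-probabilities a model assigns
# to each candidate answer continuation. (Illustrative numbers only.)
choices = {
    "a short answer": [-2.0, -1.5],                           # total -3.5
    "a much longer answer phrase": [-1.0, -1.0, -1.0, -1.0],  # total -4.0
}

def accuracy_pick(choices):
    """Plain accuracy: pick the choice with the highest total log-likelihood."""
    return max(choices, key=lambda c: sum(choices[c]))

def accuracy_norm_pick(choices):
    """acc_norm: normalize the total log-likelihood by continuation length,
    so longer answers are not penalized merely for having more tokens."""
    return max(choices, key=lambda c: sum(choices[c]) / len(choices[c]))

print(accuracy_pick(choices))       # "a short answer"  (-3.5 > -4.0)
print(accuracy_norm_pick(choices))  # "a much longer answer phrase" (-1.0 > -1.75)
```

Because the two metrics rank candidates differently, one can match a reported number while the other does not.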
Here is the table for HellaSWAG:
@stephenroller
So we used an internal evaluation suite, not lm-eval-harness. @tbmihailov is currently looking into resolving any differences (he was the author of the internal eval suite).
We found an off-by-one error and have a PR in #224 to help fix the API. We'll be evaluating results on lm-eval-harness soon.
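For readers unfamiliar with this class of bug: in causal LM scoring, the logits at position i predict the token at position i+1, and scoring token i against logits[i] silently shifts every log-prob by one step. A minimal sketch of the correct alignment (hypothetical function and variable names, not the actual #224 patch):

```python
def continuation_logprob(token_ids, step_logprobs, prompt_len):
    """Sum the log-probs of the continuation tokens only.

    token_ids     : full sequence (prompt + continuation)
    step_logprobs : step_logprobs[i] maps a token id to
                    log P(token_ids[i+1] == id | token_ids[:i+1])
    prompt_len    : number of prompt tokens
    """
    total = 0.0
    # Token j (for j >= prompt_len) is predicted by the logits at j - 1:
    # using step_logprobs[j] here would be the classic off-by-one bug.
    for j in range(prompt_len, len(token_ids)):
        total += step_logprobs[j - 1][token_ids[j]]
    return total

# Toy check: 3 prompt tokens followed by 2 continuation tokens.
ids = [10, 11, 12, 5, 6]
steps = [{11: -0.1}, {12: -0.2}, {5: -0.3}, {6: -0.4}]
print(round(continuation_logprob(ids, steps, prompt_len=3), 6))  # -0.7
```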
That said, lm-eval-harness's prompts are not as well crafted as those in @tbmihailov's internal eval suite.
Thanks @stephenroller for the clarification!
By the way, can you point me to a readme/code segment in the metaseq repository that I can use to perform inference with OPT 66B?
Hi @stephenroller, do you have any tips for getting around this problem to do few-shot classification? I'm using the Hugging Face API and running into the first positive logit problem.
I don't have any recommendations for the Hugging Face API.