
[Feature] add acc_norm evaluation

Open kirliavc opened this issue 2 years ago • 0 comments

Describe the feature

lm-evaluation-harness supports acc_norm evaluation, which is used on the Hugging Face Open LLM Leaderboard:

ARC: 25-shot, arc-challenge (acc_norm)
HellaSwag: 10-shot, hellaswag (acc_norm)

acc_norm divides each candidate answer's summed log-likelihood by that answer's length before taking the argmax:

acc_norm = 1.0 if np.argmax(results / completion_len) == gold else 0.0

In the ARC and HellaSwag datasets, the candidate answers have different lengths, so their summed log-likelihoods are on different scales (longer answers accumulate more terms). Normalizing by answer length removes this length bias and gives a fairer comparison between choices.
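A minimal sketch of the metric, following the one-liner above (the function name and example numbers are illustrative, not from the harness; in lm-evaluation-harness, `completion_len` is typically the byte length of each choice string):

```python
import numpy as np

def acc_norm(loglikelihoods, completion_lens, gold):
    """Length-normalized accuracy for one multiple-choice example.

    loglikelihoods  : summed log-probs of each candidate answer
    completion_lens : length of each candidate (e.g. byte length)
    gold            : index of the correct answer
    """
    results = np.asarray(loglikelihoods, dtype=float)
    completion_len = np.asarray(completion_lens, dtype=float)
    # Divide each answer's log-likelihood sum by its length,
    # then check whether the best-scoring choice is the gold one.
    return 1.0 if np.argmax(results / completion_len) == gold else 0.0

# Without normalization the raw argmax picks choice 0 (-4.0 > -6.0),
# but per-character scores are [-1.0, -0.5], so the longer gold
# answer (index 1) wins after normalization.
print(acc_norm([-4.0, -6.0], [4.0, 12.0], gold=1))  # -> 1.0
```

This illustrates exactly the case the issue describes: the unnormalized score and the normalized score disagree, and only the normalized one credits the correct answer.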

Will you implement it?

  • [ ] I would like to implement this feature and create a PR!

kirliavc avatar Oct 12 '23 02:10 kirliavc