evaluation


bigscience-workshop


Add HANS dataset

Open aakanksha19 opened this issue 3 years ago • 0 comments

Evaluated on GPT2
Time taken: 3:40:59 on GTX 1080 Ti

Other comments:

The prompt template is the same as the one used for XQUAD/PIAF, with one minor addition: the question "Is this true or false?", which asks the model for an entailment/non-entailment judgment.
In addition to accuracy, the other fine-grained evaluation metrics from the HANS evaluation script (https://github.com/tommccoy1/hans/blob/master/evaluate_heur_output.py) are also reported; they can be removed if deemed unnecessary.
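
To illustrate what "fine-grained" could mean here, the following is a minimal sketch, not the code from this issue or from the HANS script. It assumes each example carries a `heuristic` field (`lexical_overlap`, `subsequence`, or `constituent`) and a gold `label` (`entailment` or `non-entailment`), and that the model's "true"/"false" answer maps onto those two labels. The function names `answer_to_label` and `fine_grained_accuracy` and the toy data are hypothetical.

```python
from collections import defaultdict

# Assumed mapping from the model's answer to a HANS label (hypothetical helper).
def answer_to_label(answer):
    return "entailment" if answer.strip().lower() == "true" else "non-entailment"

def fine_grained_accuracy(examples, predictions):
    """Return overall accuracy and accuracy per (heuristic, gold label) group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for example, predicted in zip(examples, predictions):
        group = (example["heuristic"], example["label"])
        total[group] += 1
        correct[group] += int(predicted == example["label"])
    per_group = {group: correct[group] / total[group] for group in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_group

# Toy data for illustration only; these are not results from the evaluation.
examples = [
    {"heuristic": "lexical_overlap", "label": "entailment"},
    {"heuristic": "lexical_overlap", "label": "non-entailment"},
    {"heuristic": "subsequence", "label": "non-entailment"},
    {"heuristic": "constituent", "label": "entailment"},
]
answers = ["True", "true", "false", "True"]
predictions = [answer_to_label(a) for a in answers]

overall, per_group = fine_grained_accuracy(examples, predictions)
for group, accuracy in sorted(per_group.items()):
    print(group, accuracy)
print("overall", overall)
```

Splitting accuracy by heuristic and gold label matters for HANS because a model that always answers "true" can score well on the entailment groups while failing the non-entailment ones, which a single overall accuracy would hide.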

Oct 02 '21 14:10 aakanksha19