
Answer Equivalence Dataset

This dataset contains human judgements about answer equivalence. The data is based on SQuAD (the Stanford Question Answering Dataset) and contains 17,655 examples with 23,260 human ratings of answer candidates generated by question answering models (see the tables below).

This dataset is introduced and described in "Tomayto, Tomahto. Beyond Token-level Answer Equivalence for Question Answering Evaluation" (Bulian et al., 2022).

Download the data

AE Split    # AE Examples    # Ratings
Train               9,090        9,090
Dev                 2,734        4,446
Test                5,831        9,724
Total              17,655       23,260

Split by system          # AE Examples    # Ratings
BiDAF dev predictions            5,622        7,522
XLNet dev predictions            2,448        7,932
LUKE dev predictions             2,240        4,590
Total                            8,565       14,170
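
The split files are distributed as JSON Lines (one JSON object per line). Below is a minimal loading sketch; the file name and the idea of inspecting the first record are illustrative, since the exact field names should be checked against the released files.

import json

def load_ae_split(path):
  """Reads an AE split file: one JSON object per line."""
  with open(path, encoding='utf-8') as f:
    return [json.loads(line) for line in f if line.strip()]

# Hypothetical file name; substitute the actual file from this repository.
examples = load_ae_split('ae_dev.jsonl')
print(len(examples), 'examples')
print(examples[0])  # inspect the real field names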

BERT Matching (BEM) model

The BEM model from the paper, fine-tuned on this dataset, is available on TF Hub.

This Colab demonstrates how to use it.
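
For reference, here is a condensed sketch of that usage. It assumes the model's TF Hub handle (https://tfhub.dev/google/answer_equivalence/bem/1), a standard uncased BERT-base vocabulary, a fixed input length of 512, and that the model takes padded input_ids/segment_ids built from the candidate, reference, and question and returns two-class logits; verify all of these details against the Colab.

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text as text

# Uncased BERT-base vocabulary used to tokenize the inputs (an assumption;
# the Colab shows the exact vocabulary the model was trained with).
VOCAB_PATH = 'gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-12_H-768_A-12/vocab.txt'

vocab_table = tf.lookup.StaticVocabularyTable(
    tf.lookup.TextFileInitializer(
        filename=VOCAB_PATH,
        key_dtype=tf.string,
        key_index=tf.lookup.TextFileIndex.WHOLE_LINE,
        value_dtype=tf.int64,
        value_index=tf.lookup.TextFileIndex.LINE_NUMBER),
    num_oov_buckets=1)
cls_id, sep_id = vocab_table.lookup(tf.convert_to_tensor(['[CLS]', '[SEP]']))
tokenizer = text.BertTokenizer(
    vocab_lookup_table=vocab_table, token_out_type=tf.int64, lower_case=True)

def bertify_example(question, reference, candidate):
  """Tokenizes one example and packs it into a single BERT input sequence."""
  # Tokenize as a batch of one; merge the word and wordpiece dimensions.
  question = tokenizer.tokenize(tf.constant([question])).merge_dims(1, 2)
  reference = tokenizer.tokenize(tf.constant([reference])).merge_dims(1, 2)
  candidate = tokenizer.tokenize(tf.constant([candidate])).merge_dims(1, 2)
  # The (candidate, reference, question) segment order is an assumption to
  # check against the Colab.
  input_ids, segment_ids = text.combine_segments(
      (candidate, reference, question), cls_id, sep_id)
  return input_ids[0].numpy(), segment_ids[0].numpy()

def pad(a, length=512):
  """Right-pads a 1-D array with zeros to the model's fixed input length."""
  return np.append(a, np.zeros(length - a.shape[-1], np.int64))

def bertify_examples(examples):
  input_ids, segment_ids = [], []
  for ex in examples:
    ids, segments = bertify_example(
        ex['question'], ex['reference'], ex['candidate'])
    input_ids.append(pad(ids))
    segment_ids.append(pad(segments))
  return {'input_ids': np.stack(input_ids),
          'segment_ids': np.stack(segment_ids)}

bem = hub.load('https://tfhub.dev/google/answer_equivalence/bem/1')

examples = [{
    'question': 'why is the sky blue',
    'reference': 'light scattering',
    'candidate': 'scattering of light',
}]
raw_outputs = bem(bertify_examples(examples))  # logits, shape [batch, 2]
# Softmax over the two classes; index 1 is taken here as the probability
# that the candidate is equivalent to the reference.
bem_score = float(tf.nn.softmax(raw_outputs, axis=-1)[0, 1])
print(f'BEM score: {bem_score:.3f}')

Scores close to 1 indicate that the candidate answer is rated equivalent to the reference, which is how the paper uses BEM in place of token-level metrics such as exact match and F1.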

How to cite AE?

@article{bulian-etal-2022-tomayto,
  author    = {Jannis Bulian and
               Christian Buck and
               Wojciech Gajewski and
               Benjamin B{\"o}rschinger and
               Tal Schuster},
  title     = {Tomayto, Tomahto. Beyond Token-level Answer Equivalence
               for Question Answering Evaluation},
  journal   = {CoRR},
  volume    = {abs/2202.07654},
  year      = {2022},
  url       = {http://arxiv.org/abs/2202.07654},
}

Disclaimer

This is not an official Google product.

Contact information

For help or issues, please submit a GitHub issue or contact the authors by email.