# Answer Equivalence Dataset

This dataset contains human judgements about answer equivalence. The data is based on SQuAD (the Stanford Question Answering Dataset) and contains 9k human judgements of answer candidates generated by question answering models. The dataset is introduced and described in [Tomayto, Tomahto. Beyond Token-level Answer Equivalence for Question Answering Evaluation](https://arxiv.org/abs/2202.07654).
## Download the data

| AE Split | # AE Examples | # Ratings |
|---|---|---|
| Train | 9,090 | 9,090 |
| Dev | 2,734 | 4,446 |
| Test | 5,831 | 9,724 |
| Total | 17,655 | 23,260 |

| Split by system | # AE Examples | # Ratings |
|---|---|---|
| BiDAF dev predictions | 5,622 | 7,522 |
| XLNet dev predictions | 2,448 | 7,932 |
| LUKE dev predictions | 2,240 | 4,590 |
| Total | 8,565 | 14,170 |
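
Once a split has been downloaded, a quick way to inspect it is to read it record by record. The sketch below assumes the split is stored as JSON Lines (one JSON object per line); the file name `ae_dev.jsonl` is hypothetical, so adjust both to whatever files you actually downloaded.

```python
# Hedged sketch for inspecting a downloaded AE split.
# Assumption: JSON Lines format; 'ae_dev.jsonl' is a hypothetical file name.
import json

def load_ae_split(path):
    """Reads one answer-equivalence split, one JSON object per line."""
    with open(path, encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]

examples = load_ae_split('ae_dev.jsonl')
print(f'{len(examples)} examples; fields: {sorted(examples[0])}')
```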

## BERT Matching (BEM) model

The BEM model from the paper, fine-tuned on this dataset, is available on TensorFlow Hub. An accompanying Colab notebook demonstrates how to use it.
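
As a rough illustration of what a BEM call looks like, the sketch below loads the model from TF Hub and scores one (question, reference, candidate) triple. The hub handle, the input dict of `input_ids`/`segment_ids` padded to 512 tokens, the packing order of the three text fields, and the two-logit output are all assumptions modelled on the public Colab, which remains the authoritative reference.

```python
# A minimal sketch of scoring answer equivalence with BEM.
# Assumptions (verify against the Colab): the tfhub.dev handle below,
# BERT-style inputs padded to 512 tokens, the candidate/reference/question
# packing order, and a [batch, 2] logit output where index 1 means
# "equivalent".
import tensorflow as tf
import tensorflow_hub as hub
from transformers import BertTokenizer

SEQ_LEN = 512
bem = hub.load('https://tfhub.dev/google/answer_equivalence/bem/1')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

def bem_score(question: str, reference: str, candidate: str) -> float:
    cand = tokenizer.tokenize(candidate)
    ref = tokenizer.tokenize(reference)
    ques = tokenizer.tokenize(question)
    tokens = ['[CLS]'] + cand + ['[SEP]'] + ref + ['[SEP]'] + ques + ['[SEP]']
    input_ids = tokenizer.convert_tokens_to_ids(tokens)
    # Segment 0 covers the candidate, segment 1 the rest (an assumption).
    segment_ids = [0] * (len(cand) + 2) + [1] * (len(input_ids) - len(cand) - 2)
    pad = SEQ_LEN - len(input_ids)
    inputs = {
        'input_ids': tf.constant([input_ids + [0] * pad], dtype=tf.int32),
        'segment_ids': tf.constant([segment_ids + [0] * pad], dtype=tf.int32),
    }
    logits = bem(inputs)
    # Probability that the candidate is equivalent to the reference.
    return float(tf.nn.softmax(logits, axis=-1)[0, 1])

print(bem_score('why is the sky blue',
                'light scattering',
                'scattering of sunlight'))
```

If the loaded model exposes only named signatures rather than a direct call, `bem.signatures` lists the available entry points.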

## How to cite AE?

```bibtex
@article{bulian-etal-2022-tomayto,
  author  = {Jannis Bulian and
             Christian Buck and
             Wojciech Gajewski and
             Benjamin B{\"o}rschinger and
             Tal Schuster},
  title   = {Tomayto, Tomahto. Beyond Token-level Answer Equivalence
             for Question Answering Evaluation},
  journal = {CoRR},
  volume  = {abs/2202.07654},
  year    = {2022},
  url     = {http://arxiv.org/abs/2202.07654},
}
```

## Disclaimer

This is not an official Google product.

## Contact information

For help or issues, please submit a GitHub issue or contact the authors by email.