
Mismatch in train and development data for XOR-Full

Open mihirkale815 opened this issue 3 years ago • 2 comments

The xor_train_full.json file has 'answers' in English, while the xor_dev_full_v1_1.jsonl file has answers in the target language. How do we get answers in the target language for the train split?
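A rough way to check the mismatch described above is to count how many answers in each file contain non-ASCII characters. This is only a sketch: it assumes each file holds one JSON object per line and that every record has an `answers` list of strings. The helper name `non_ascii_answer_ratio` is made up for illustration. Non-ASCII is a crude proxy for "not English", so numeric answers and Latin-script target languages will be undercounted.

```python
import json
from typing import Iterable


def non_ascii_answer_ratio(lines: Iterable[str], answers_key: str = "answers") -> float:
    """Return the fraction of answers that contain a non-ASCII character.

    Assumes (not confirmed by the dataset docs) one JSON object per line,
    each with a list of answer strings under `answers_key`.
    """
    total = 0
    non_ascii = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for answer in record.get(answers_key, []):
            total += 1
            if not str(answer).isascii():
                non_ascii += 1
    # Avoid division by zero when the input has no answers at all.
    return non_ascii / total if total else 0.0
```

Passing an open file handle for each split (for example `non_ascii_answer_ratio(open("xor_dev_full_v1_1.jsonl", encoding="utf-8"))`) and comparing the two ratios should show the difference: a ratio near zero suggests English answers, a clearly higher one suggests target-language answers.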

mihirkale815 avatar Apr 18 '22 19:04 mihirkale815

The issue persists for the training data. For the Gold Paragraph data as well, the answers are not available in the target language.

pagrawal-ml avatar May 03 '22 13:05 pagrawal-ml

Sorry, I overlooked this issue! As mentioned in Section 2.1.4 of the paper, answers were translated into the target languages for the evaluation sets only, so xor_train_full.json contains English answers.

Because of the cost of answer translation, we did not run this process on the training data.

AkariAsai avatar Oct 11 '22 04:10 AkariAsai

I'm closing this issue now but please feel free to re-open if you have any followup questions!

AkariAsai avatar Nov 08 '22 08:11 AkariAsai