Mismatch in train and development data for XOR-Full

Open mihirkale815 opened this issue 3 years ago • 2 comments

The xor_train_full.json file has 'answers' in English, while the xor_dev_full_v1_1.jsonl file has answers in the target language. How can we get answers in the target language for the train split?
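A rough way to see the mismatch is to measure how many answer strings contain non-ASCII characters in each split. This is only a sketch: the helper name `non_ascii_answer_ratio` is hypothetical, the record layout (a dict with an 'answers' list of strings) is an assumption based on the field name above, and the toy records are made up for illustration rather than taken from the dataset. The heuristic will also miss target languages written in Latin script.

```python
def non_ascii_answer_ratio(records):
    """Return the fraction of answer strings containing a non-ASCII character.

    Assumes each record is a dict with an 'answers' list of strings
    (an assumed layout, not confirmed against the released files).
    """
    answers = [a for rec in records for a in rec.get("answers", [])]
    if not answers:
        return 0.0
    return sum(not a.isascii() for a in answers) / len(answers)


# Toy records for illustration only (not taken from the dataset).
train_like = [{"answers": ["Tokyo"]}, {"answers": ["1998"]}]
dev_like = [{"answers": ["東京"]}, {"answers": ["1998"]}]

print(non_ascii_answer_ratio(train_like))
print(non_ascii_answer_ratio(dev_like))
```

A ratio near zero on the train split and a clearly higher one on the dev split would be consistent with English-only training answers.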

Apr 18 '22 19:04 mihirkale815

The issue persists for the training data. For the Gold Paragraph data as well, the answers are not available in the target language.

May 03 '22 13:05 pagrawal-ml

Sorry, I overlooked this issue! As mentioned in Section 2.1.4 of the paper, answer translation was done for the evaluation sets only, so xor_train_full.json contains only English answers.

Note that because of the cost of answer translations, we conduct this answer translation process for evaluation sets only.

Oct 11 '22 04:10 AkariAsai

I'm closing this issue now, but please feel free to re-open it if you have any follow-up questions!

Nov 08 '22 08:11 AkariAsai