Mismatch in train and development data for XOR-Full
The xor_train_full.json file has 'answers' in English, while the xor_dev_full_v1_1.jsonl file has answers in the target language. How can we get answers in the target language for the train split?
The issue persists for the training data. For the Gold Paragraph data as well, the answers are not available in the target language.
Sorry, I overlooked this issue! As mentioned in Section 2.1.4 of the paper, answer translation was done for the evaluation sets only, so xor_train_full.json contains only English answers.
Note that because of the cost of answer translations, we conduct this answer translation process for evaluation sets only.
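To confirm the difference on a local copy of the data, here is a minimal sketch that counts, per language, how many examples have at least one answer containing a non-ASCII letter. It assumes each record is a JSON object on its own line with a `lang` string and an `answers` list of strings; those field names and the helper names are assumptions for illustration, not a documented schema. The check is only a rough heuristic: numeric answers and Latin-script languages such as Finnish will mostly show up as ASCII-only even when they are in the target language.

```python
import json
from collections import Counter


def load_jsonl(path):
    """Read one JSON object per non-empty line (assumed file layout)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def has_non_ascii_letter(text):
    """Rough heuristic: True if any alphabetic character is outside ASCII."""
    return any(ch.isalpha() and ord(ch) > 127 for ch in text)


def answer_script_counts(examples):
    """Count examples per (lang, kind), where kind is 'non_ascii' if any
    answer contains a non-ASCII letter and 'ascii_only' otherwise."""
    counts = Counter()
    for ex in examples:
        answers = ex.get("answers", [])  # assumed: list of answer strings
        if any(has_non_ascii_letter(a) for a in answers):
            kind = "non_ascii"
        else:
            kind = "ascii_only"
        counts[(ex.get("lang", "?"), kind)] += 1
    return counts
```

For example, `answer_script_counts(load_jsonl("xor_dev_full_v1_1.jsonl"))` can be compared against the same call on the train file. If the train file is a single JSON document rather than JSON lines, it would need `json.load` instead of `load_jsonl`.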
I'm closing this issue now, but please feel free to re-open it if you have any follow-up questions!