Large-scale-Cloze-Test-Dataset-Created-by-Teachers
Is the performance in the paper based on all the data or the 3000 sampled questions?
Is the performance in the paper based on all the data or the 3000 sampled questions? If it is the latter, how can I get the same 3000 sampled questions that you used, for a fair comparison? Thank you!
Hi, only the human performance is based on the 3000 sampled questions. All the models' performance is measured on the whole test set.