Model performance drops significantly on split-1 dataset.

Open xchencode opened this issue 5 years ago • 1 comments

Hi, I'm trying to reproduce the results of Zhao's paper with your code. On the split-2 dataset, which is the one your code uses, my results are almost the same as those in the paper. However, on the split-1 dataset, which divides the original SQuAD train set into train and test sets in a 90%/10% ratio, I only get a BLEU_4 score of 14.0, which is 2.8 lower than the score reported in the paper. Have you ever encountered this problem? What do you think might have caused such a large performance drop?

Nov 29 '20 09:11 xchencode

Sorry, I haven't experimented with the split-1 dataset.

One possibility is that the way I choose the best checkpoint differs from the authors'.

Jan 12 '21 10:01 seanie12