neural-question-generation
Model performance drops significantly on split-1 dataset.
Hi, I'm trying to reproduce the results of Zhao's paper with your code. On the split-2 dataset, which is the one your code uses, my results are almost the same as those in the paper. However, on the split-1 dataset, which divides the original SQuAD train set into a train set and a test set in a 90%-10% proportion, I only get a BLEU_4 score of 14.0, which is 2.8 lower than the paper's. Have you ever encountered this problem? What do you think might have caused such a large performance drop?
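For reference, the 90%-10% split described above could look roughly like the sketch below. It assumes the SQuAD training examples are already loaded into a list. The function name `split_train_test`, the random shuffle, and the fixed seed are all assumptions for illustration; they are not taken from the repository or the paper, which may split differently (for example by article).

```python
import random


def split_train_test(examples, test_ratio=0.1, seed=0):
    """Shuffle a copy of `examples` and return (train, test).

    Hypothetical helper: the shuffle and the seed are assumptions,
    not the procedure used in the paper or the repository.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # deterministic for a given seed
    n_test = int(len(shuffled) * test_ratio)
    # The first n_test shuffled items become the test set, the rest train.
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in data: integers in place of real SQuAD examples.
train, test = split_train_test(range(100))
```

A different seed or a different splitting rule produces a different test set, so scores measured on it are not guaranteed to match the paper's.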
Sorry, I haven't experimented with the split-2 dataset.
Maybe my way of choosing the best checkpoint differs from the one used in the paper.
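To illustrate how the checkpoint selection rule alone can change which model gets evaluated, here is a minimal sketch comparing two common rules: lowest dev loss versus highest dev BLEU_4. The function names, the record layout, and all numbers are hypothetical; neither rule is confirmed to be the one used in the paper or in this repository.

```python
def pick_by_dev_loss(records):
    """Return the name of the checkpoint with the lowest dev loss."""
    return min(records, key=lambda r: r["dev_loss"])["name"]


def pick_by_dev_bleu(records):
    """Return the name of the checkpoint with the highest dev BLEU_4."""
    return max(records, key=lambda r: r["dev_bleu4"])["name"]


# Made-up numbers for illustration only, not real results.
records = [
    {"name": "ckpt_a", "dev_loss": 3.10, "dev_bleu4": 13.0},
    {"name": "ckpt_b", "dev_loss": 3.05, "dev_bleu4": 12.5},
    {"name": "ckpt_c", "dev_loss": 3.20, "dev_bleu4": 13.4},
]

best_by_loss = pick_by_dev_loss(records)
best_by_bleu = pick_by_dev_bleu(records)
```

With these made-up records the two rules pick different checkpoints, so it is worth checking which rule each reported number was obtained with before comparing scores.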