ROUGE scores mismatch
Thanks for the impressive work you have put into this repo.
We used the model provided under the section "Train with pointer generation + coverage loss enabled" to decode the test set. The ROUGE scores we obtained differ slightly from those posted in the README.
Our ROUGE scores:

| Metric | F-score (95% CI) | Recall (95% CI) | Precision (95% CI) |
|---|---|---|---|
| ROUGE-1 | 0.3680 (0.3658, 0.3701) | 0.4234 (0.4208, 0.4261) | 0.3471 (0.3446, 0.3496) |
| ROUGE-2 | 0.1485 (0.1464, 0.1507) | 0.1706 (0.1682, 0.1731) | 0.1407 (0.1385, 0.1429) |
| ROUGE-L | 0.3327 (0.3306, 0.3349) | 0.3827 (0.3802, 0.3853) | 0.3139 (0.3116, 0.3164) |
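For reference, this is roughly how we computed the scores above. It is a minimal sketch assuming the evaluation goes through pyrouge wrapping ROUGE-1.5.5, as in most pointer-generator setups; the directory paths are placeholders for our own decode output:

```python
# Sketch of our evaluation step (assumes pyrouge and ROUGE-1.5.5 are installed).
import pyrouge

r = pyrouge.Rouge155()
r.model_filename_pattern = '#ID#_reference.txt'    # gold summaries, one file per example
r.system_filename_pattern = r'(\d+)_decoded.txt'   # decoded summaries from the model
r.model_dir = '/path/to/reference'                 # placeholder path
r.system_dir = '/path/to/decoded'                  # placeholder path

rouge_results = r.convert_and_evaluate()           # runs ROUGE-1.5.5 with default args
results_dict = r.output_to_dict(rouge_results)
print(results_dict['rouge_1_f_score'],
      results_dict['rouge_2_f_score'],
      results_dict['rouge_l_f_score'])
```

If your reported numbers were produced with non-default ROUGE-1.5.5 arguments, that alone might explain part of the gap.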
Which config parameters should we use to reproduce the scores reported in the README?
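For completeness, these are the decode-time settings we believe we used. The values below are our assumptions, mirroring the usual defaults from See et al. (2017); please flag any that differ from what produced the README numbers:

```python
# Our assumed decode-time configuration (hypothetical; not confirmed
# to match the settings behind the README scores).
decode_config = {
    'mode': 'decode',
    'coverage': True,       # checkpoint trained with coverage loss
    'single_pass': True,    # decode the full test set exactly once
    'beam_size': 4,
    'min_dec_steps': 35,    # minimum summary length in tokens
    'max_dec_steps': 100,   # maximum summary length at decode time
    'max_enc_steps': 400,   # article truncation length
    'vocab_size': 50000,
}
```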