MultiTurnDialogZoo
Performance on DailyDialog dataset
Hi, I tried running the Seq2Seq and HRED models on the DailyDialog dataset. Here are the results I got:
Model: Seq2Seq
BLEU-1: 0.215; BLEU-2: 0.0986; BLEU-3: 0.057; BLEU-4: 0.0366
ROUGE: 0.0492
Distinct-1: 0.0268; Distinct-2: 0.131
Ref distinct-1: 0.0599; Ref distinct-2: 0.3644
BERTScore: 0.1414

Model: HRED
BLEU-1: 0.2121; BLEU-2: 0.0961; BLEU-3: 0.0542; BLEU-4: 0.0331
ROUGE: 0.0502
Distinct-1: 0.0208; Distinct-2: 0.0992
Ref distinct-1: 0.0588; Ref distinct-2: 0.3619
BERTScore: 0.1436
These results seem to be much lower than those reported in the DailyDialog paper: https://www.aclweb.org/anthology/I17-1099.pdf Do you have any idea why that is the case? Thanks!
Hi, thanks for your interest in this repo. Compared with the results in the original DailyDialog paper, the BLEU-1/2 scores are lower, but the BLEU-3/4 scores are much better. In my opinion, BLEU-3/4 are more informative than BLEU-1/2 because they reward longer matching n-grams, which suggests that the model generates more fluent responses. So I think the results are acceptable. If you still have questions, feel free to contact me.
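To illustrate why higher-order n-gram scores relate to fluency, here is a minimal sketch of the clipped (modified) n-gram precision that BLEU is built on. This is not the evaluation code used in this repo; the function names and the example sentences are made up for illustration, and the brevity penalty and smoothing are left out.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference, hypothesis, n):
    """Clipped n-gram precision of a hypothesis against one reference.

    Hypothetical helper for illustration only: no brevity penalty,
    no smoothing, single reference.
    """
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs
    # in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return clipped / total


reference = "i would like a cup of coffee please".split()
fluent = "i would like a cup of tea".split()
shuffled = "coffee a like cup i of would".split()

for n in range(1, 5):
    print(
        f"n={n}",
        f"fluent={modified_precision(reference, fluent, n):.3f}",
        f"shuffled={modified_precision(reference, shuffled, n):.3f}",
    )
```

In this toy example the shuffled response gets a perfect 1-gram precision because every word appears in the reference, but its precision drops to zero from n=2 onward because no word order is preserved. The fluent response keeps a high precision at every order. Note also that reported BLEU-n values depend on whether they are cumulative or per-order and on the smoothing used, so scores from different codebases may not be directly comparable.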