DialogWAE: Unable to reproduce published results on DailyDialog

Hi, I am trying to retrain your model as a baseline. So far, SWDA gives results in line with the paper (actually slightly better). But for the DailyDialog dataset, even after multiple runs, the best we got is the following (row 1 is on the validation set, row 2 on the test set):

A, E, G are for sim_bow.

| BLEU-R | BLEU-P | F1 | A | E | G |
| --- | --- | --- | --- | --- | --- |
| 0.305 | 0.170 | 0.218 | 0.940 | 0.609 | 0.857 |
| 0.298 | 0.163 | 0.211 | 0.940 | 0.605 | 0.857 |
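As a quick sanity check on these numbers, the sketch below recomputes F1 from the BLEU-R and BLEU-P columns. It assumes F1 is the harmonic mean of BLEU precision and BLEU recall; the helper name `f1` is only illustrative and is not taken from the repository.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of the F1 column)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (BLEU-R, BLEU-P, reported F1) for the two rows quoted in this post.
rows = {
    "row 1": (0.305, 0.170, 0.218),
    "row 2": (0.298, 0.163, 0.211),
}

computed = {}
for name, (bleu_r, bleu_p, reported) in rows.items():
    computed[name] = round(f1(bleu_p, bleu_r), 3)
    print(f"{name}: computed F1 = {computed[name]:.3f}, reported F1 = {reported:.3f}")
```

Under that assumption, both rows are internally consistent, so the gap to the paper would come from the BLEU scores themselves rather than from how F1 is derived.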

Whereas the paper mentions the best results to be

Were any changes made to the code relative to the configuration described in the paper? I couldn't find any discrepancy. Can you point me to what might be the issue?

May 14 '19 11:05 bsantraigi

Thanks for pointing this out. There seems to have been a big deviation from the original results recently. Somebody reported better results than those in the paper for the DailyDialog dataset. We are not sure whether this is due to a change in the environment beyond what is listed in "requirements.txt". We are looking into it and will let you know.

May 16 '19 11:05 guxd

OK, thanks. We were using an environment that matches requirements.txt exactly, though. Also, as you said, we noticed quite a bit of variance between runs, even when the seed is passed as an argument.
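For reference, this is roughly what I mean by fixing the seed. It is only a sketch, not the repository's code: the function name `seed_everything` is hypothetical, and the NumPy and PyTorch parts run only if those packages are installed. Even with all of this set, some CUDA operations remain non-deterministic, which could explain part of the variance.

```python
import random

def seed_everything(seed: int) -> None:
    """Seed every random number generator that is available in this environment."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # ignored when CUDA is unavailable
        # Ask cuDNN for deterministic kernels and disable autotuning.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass  # PyTorch not installed; nothing to seed

# Re-seeding with the same value reproduces the same random draws.
seed_everything(1234)
first_draw = random.random()
seed_everything(1234)
second_draw = random.random()
print(first_draw == second_draw)
```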

May 17 '19 13:05 bsantraigi

Also, #2341 showed that the NLTK library had some issues related to SmoothingFunction(), which were fixed in a later update. Hence, it is no longer possible to achieve the same results.
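To illustrate why a change in smoothing moves the scores, here is a small pure-Python sketch of sentence-level BLEU with pluggable smoothing. It is a simplified illustration, not NLTK's implementation: the two smoothing functions are only loosely modeled on common strategies (a small epsilon for zero counts, and add-one for n > 1), and all names and example sentences are made up.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hyp, ref, smooth, max_n=4):
    """Single-reference sentence BLEU with a pluggable smoothing function."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        # Clipped matches: each hypothesis n-gram counts at most as often as in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        precisions.append(smooth(matches, total, n))
    if min(precisions) <= 0:
        return 0.0
    # Brevity penalty for hypotheses that are not longer than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

def no_smoothing(matches, total, n):
    return matches / total

def epsilon_smoothing(matches, total, n, eps=0.1):
    # Replace a zero match count with a small constant.
    return (matches if matches > 0 else eps) / total

def add_one_smoothing(matches, total, n):
    # Add one to numerator and denominator for n > 1.
    return matches / total if n == 1 else (matches + 1) / (total + 1)

hyp = "i would like a cup of tea".split()
ref = "i would like some tea please".split()

scores = {
    "none": bleu(hyp, ref, no_smoothing),
    "epsilon": bleu(hyp, ref, epsilon_smoothing),
    "add_one": bleu(hyp, ref, add_one_smoothing),
}
for name, score in scores.items():
    print(f"{name}: {score:.4f}")
```

The hypothesis has no matching 4-gram, so the unsmoothed score collapses to zero, while the two smoothed variants give clearly different non-zero values. Short dialogue responses hit this case often, so any change in the smoothing code shifts BLEU-R and BLEU-P.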

Aug 04 '20 15:08 Bortrex