
Significant overfit on default hyperparameters on cornell-movie-dialogs config

Open lk251 opened this issue 7 years ago • 2 comments

Running: python main.py --config cornell-movie-dialogs --mode train

to the end (100,000 steps) yields a training loss of about 2.6 and a test loss of about 8.4.

Which hyperparameters did you use? The resulting chatbot doesn't work very well (the one in your readme is a lot better).

Thank you!

lk251 avatar Mar 11 '18 23:03 lk251

Seq2seq models have a known gap between training (where the decoder is fed ground-truth tokens) and inference (where it is fed its own predictions). You can find practical tips for training sequence-to-sequence models with attention in this blog.
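One common way to narrow that train/inference gap is scheduled sampling: during training, sometimes feed the decoder its own previous prediction instead of the ground-truth token. A minimal framework-agnostic sketch (not this repo's code; `step_fn` and all names here are hypothetical stand-ins for one decoder step):

```python
import random

def decode_with_scheduled_sampling(targets, step_fn, sampling_prob, start_token=0):
    """Toy decoder loop illustrating scheduled sampling.

    With probability `sampling_prob`, the next decoder input is the model's
    own previous prediction; otherwise it is the ground-truth token
    (teacher forcing). `step_fn(prev_token)` stands in for one decoder step
    and returns the predicted next token.
    """
    prev = start_token
    outputs = []
    for gold in targets:
        pred = step_fn(prev)
        outputs.append(pred)
        # Choose the next input: model prediction vs. teacher forcing.
        prev = pred if random.random() < sampling_prob else gold
    return outputs
```

With `sampling_prob=0.0` this reduces to pure teacher forcing, and with `sampling_prob=1.0` it matches inference-time decoding; annealing the probability upward over training is the usual schedule.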

I don't remember the model's details because I worked on it a while ago. My guess is that the result in the readme is even more overfit than yours (200,000 steps).

Also, cornell-movie-dialogs is too small to train a conversation model; with that little data, overfitting is inevitable.
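Given the small dataset, one mitigation (beyond gathering more data) is to stop training before the test loss diverges from the training loss, rather than running to a fixed step count. A minimal early-stopping check, as an illustration rather than anything in this repo:

```python
def should_stop(val_losses, patience=3):
    """Return True when the best (lowest) validation loss occurred more
    than `patience` evaluations ago, i.e. the model has stopped improving
    on held-out data and is likely starting to overfit."""
    if not val_losses:
        return False
    best_idx = min(range(len(val_losses)), key=val_losses.__getitem__)
    return len(val_losses) - 1 - best_idx >= patience
```

Calling this after each periodic evaluation and keeping the checkpoint from the best step would likely have halted the run well before 100,000 steps, once the test loss began climbing.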

DongjunLee avatar Mar 12 '18 04:03 DongjunLee

Thanks for the advice!

lk251 avatar Mar 12 '18 12:03 lk251