transformer-xl
Training with WordPiece/BPE vocab
I am trying to train with a fixed vocab (10k BPE symbols). I also tried with an auto-generated BPE vocab, but the model doesn't converge. Are there any other considerations to take care of? Initially there was an issue with the cutoffs; I set cutoffs=[], but I am still facing the convergence issue.
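For reference, a minimal sketch of the single-cluster setup implied by cutoffs=[] for a small vocab. The names cutoffs, div_val, and tie_projs follow the repo's PyTorch train.py; the values shown are assumptions for a 10k vocab, not verified repo defaults:

```python
# Sketch: single-cluster (non-adaptive) softmax setup for a 10k BPE vocab.
vocab_size = 10000   # fixed 10k BPE vocab

cutoffs = []         # no cluster boundaries -> plain (non-adaptive) softmax
div_val = 1          # every cluster keeps the full embedding dimension
tie_projs = [False]  # single remaining cluster -> one projection flag (assumed)
```

With an empty cutoffs list there is no head/tail split at all, so the adaptive softmax degenerates to an ordinary full softmax over the 10k symbols.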
This seems to be an issue of hyper-parameter tuning. Try using more warm-up steps, reducing the learning rate, or setting div_val to 1.
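As a concrete illustration of that suggestion, here is a sketch of a linear warm-up followed by cosine decay, the schedule style this codebase uses. The specific numbers (warmup_step, peak_lr, max_step) are placeholders to tune, not recommended values:

```python
import math

def lr_at(step, peak_lr=1.25e-4, warmup_step=8000, max_step=200000, min_lr=0.0):
    """Linear warm-up to peak_lr, then cosine decay to min_lr.
    All numbers are illustrative placeholders, not tuned values."""
    if step < warmup_step:
        # warm-up: scale the learning rate linearly from 0 to peak_lr
        return peak_lr * step / warmup_step
    # cosine decay over the remaining steps
    progress = (step - warmup_step) / (max_step - warmup_step)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

At each training step you would then set optimizer.param_groups[i]['lr'] = lr_at(step) before calling optimizer.step(). Increasing warmup_step and lowering peak_lr are the two knobs suggested above.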
I tried div_val=1 and a smaller learning rate, and the model is converging. But it seems to be overfitting: the train pplx is ~23 after 45k steps, while the eval pplx is around 210.