transformer-xl
Penn Treebank and WikiText-2 architectures
Hello!
Could you please provide hyperparameters for training models with close-to-SOTA perplexity on PTB and WT2 (if you experimented with the latter, since it has a corresponding option in the data utils)? Am I right that the two changes I need to make to the released code are adding variational dropout and the ASGD optimizer? If you have code that implements the necessary changes, that would be great.
Thanks
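For reference, here is a minimal sketch of variational dropout (one dropout mask shared across all time steps, as in Gal & Ghahramani / AWD-LSTM), assuming the PyTorch version of the codebase. The module name `VariationalDropout` and the tensor layout `(seq_len, batch, hidden)` are assumptions, not part of the released code:

```python
import torch
import torch.nn as nn

class VariationalDropout(nn.Module):
    """Dropout that samples one mask per (batch, hidden) slice and
    reuses it at every time step, instead of resampling per step."""
    def __init__(self, p=0.5):
        super().__init__()
        self.p = p

    def forward(self, x):
        # x is assumed to be (seq_len, batch, hidden)
        if not self.training or self.p == 0.0:
            return x
        # One mask broadcast over the time dimension
        mask = x.new_empty(1, x.size(1), x.size(2)).bernoulli_(1 - self.p)
        return x * mask / (1 - self.p)
```

For the optimizer, PyTorch ships `torch.optim.ASGD(model.parameters(), lr=..., t0=0, lambd=0.0)`; the AWD-LSTM recipe switches from SGD to ASGD when validation perplexity stops improving (NT-ASGD), but that switching logic is not shown here.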
Did you find hyperparameters for PTB? I only reached 68 test perplexity without variational dropout and weight averaging, though with only 14M parameters.