Optimus
Hyper-parameters to reproduce language modelling results
Thank you for this great repo!
I was trying to use it for language modeling, but among the checkpoints you provide I couldn't find any model that performed well in terms of perplexity. I measure perplexity on your SNLI test set with code/examples/big_ae/run_lm_vae_training.py by passing --do_eval (and not --do_train). This yields a very high KL (~2000) for all of the provided checkpoints.
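For reference, this is how I am interpreting the reported numbers: I assume the perplexity is the usual ELBO-based upper bound, in which case the KL term dominates it. The function below is my own sketch of that computation, not code from the repo:

```python
import math

def elbo_perplexity(rec_nll: float, kl: float, num_tokens: int) -> float:
    """Perplexity from the ELBO upper bound on the negative log-likelihood:
    exp((reconstruction NLL + KL) / number of tokens), everything in nats."""
    return math.exp((rec_nll + kl) / num_tokens)

# If the reported KL (~2000) is per sentence, then even with a perfect
# reconstruction term the bound explodes for short SNLI sentences:
print(elbo_perplexity(rec_nll=0.0, kl=2000.0, num_tokens=15))  # astronomically large
```

So unless I am misreading the eval output, a KL in the thousands makes the perplexity bound meaningless, which is why I suspect I am using the wrong checkpoints or hyper-parameters.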
I tried fine-tuning a Wikipedia-pretrained checkpoint on SNLI with your script, but I only got the following results (see the sketch after the list for how I understand beta and r0 to enter the objective):
- with high beta (1.0) and low r0 (0.1): perplexity on the order of 30, KL around 10, and mutual information ~0.2
- with low beta (0.5) and high r0 (0.5): perplexity on the order of 1000, KL around 75, and mutual information ~1.5
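For what it's worth, here is how I understand beta and r0 to be used during training: a cyclical KL-annealing schedule in which r0 is the fraction of each cycle where beta is held at zero. This is my own rough reimplementation from reading the paper, so the actual schedule in your code may differ:

```python
def cyclical_beta(step: int, total_steps: int, n_cycle: int = 4,
                  beta_max: float = 1.0, r0: float = 0.5,
                  r_increase: float = 0.25) -> float:
    """Rough sketch of a cyclical KL-annealing schedule (not the repo's code):
    within each cycle, beta is 0 for the first r0 fraction of steps, ramps
    linearly to beta_max over the next r_increase fraction, then stays there."""
    period = total_steps / n_cycle
    pos = (step % period) / period  # position within the current cycle, in [0, 1)
    if pos < r0:
        return 0.0
    if pos < r0 + r_increase:
        return beta_max * (pos - r0) / r_increase
    return beta_max

# Per-batch loss, as I understand it: reconstruction + beta(step) * KL.
# A larger r0 keeps beta at 0 for longer, which matches the higher KL and
# mutual information (and worse perplexity) I observe in the second setting.
```

If r0 means something else in your script, please correct me.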
I can't seem to get both low perplexity and high mutual information at the same time. Could you provide a language modeling checkpoint, or just specify the hyper-parameters and the Wikipedia-pretrained model used to produce the results in the paper?
Thank you very much for your help!