Optimus
Hyper-parameters to reproduce language modelling results
Thank you for this great repo!
I was trying to use it for language modeling, but among the checkpoints you provide I couldn't find any model that performed well in terms of perplexity. I measure perplexity on your SNLI test set with code/examples/big_ae/run_lm_vae_training.py by passing --do_eval (and not --do_train). This yields a very high KL (~2000) for all of the provided checkpoints.
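For reference, this is how I am interpreting the reported numbers: I assume the perplexity is the usual ELBO-based upper bound, in which case the KL term dominates it. The function below is my own sketch of that computation, not code from the repo:

```python
import math

def elbo_perplexity(rec_nll: float, kl: float, num_tokens: int) -> float:
    """Perplexity from the ELBO upper bound on the negative log-likelihood:
    exp((reconstruction NLL + KL) / number of tokens), everything in nats."""
    return math.exp((rec_nll + kl) / num_tokens)

# If the reported KL (~2000) is per sentence, then even with a perfect
# reconstruction term the bound explodes for short SNLI sentences:
print(elbo_perplexity(rec_nll=0.0, kl=2000.0, num_tokens=15))  # astronomically large
```

So unless I am misreading the eval output, a KL in the thousands makes the perplexity bound meaningless, which is why I suspect I am using the wrong checkpoints or hyper-parameters.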
I tried fine-tuning a Wikipedia-pretrained checkpoint on SNLI with your script, but I only got the following results (see the sketch after the list for how I understand beta and r0 to enter the objective):
- with high beta (1.0) and low r0 (0.1): perplexity on the order of 30, KL around 10, and mutual information ~0.2
- with low beta (0.5) and high r0 (0.5): perplexity on the order of 1000, KL around 75, and mutual information ~1.5
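For what it's worth, here is how I understand beta and r0 to be used during training: a cyclical KL-annealing schedule in which r0 is the fraction of each cycle where beta is held at zero. This is my own rough reimplementation from reading the paper, so the actual schedule in your code may differ:

```python
def cyclical_beta(step: int, total_steps: int, n_cycle: int = 4,
                  beta_max: float = 1.0, r0: float = 0.5,
                  r_increase: float = 0.25) -> float:
    """Rough sketch of a cyclical KL-annealing schedule (not the repo's code):
    within each cycle, beta is 0 for the first r0 fraction of steps, ramps
    linearly to beta_max over the next r_increase fraction, then stays there."""
    period = total_steps / n_cycle
    pos = (step % period) / period  # position within the current cycle, in [0, 1)
    if pos < r0:
        return 0.0
    if pos < r0 + r_increase:
        return beta_max * (pos - r0) / r_increase
    return beta_max

# Per-batch loss, as I understand it: reconstruction + beta(step) * KL.
# A larger r0 keeps beta at 0 for longer, which matches the higher KL and
# mutual information (and worse perplexity) I observe in the second setting.
```

If r0 means something else in your script, please correct me.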
I can't seem to get both low perplexity and high mutual information at the same time. Could you provide a language modeling checkpoint, or just specify the hyper-parameters and the Wikipedia-pretrained model used to produce the results in the paper?
Thank you very much for your help!