transfer-learning-conv-ai
Questions about ppl when using gpt2
Hi! I ran into some problems when running the ConvAI2 evaluation scripts:
I first trained a model from OpenAI GPT. I increased the number of gradient accumulation steps because I only have one GPU.
python train.py --model_checkpoint /path/to/pretrained/gpt \
--gradient_accumulation_steps=32 --lm_coef=2.0 --max_history=2 \
--n_epochs=1 --num_candidates=4 --personality_permutations=2 \
--train_batch_size=2 --valid_batch_size=2
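(For context, my understanding is that gradient accumulation just delays the optimizer step so the effective batch size stays the same on a single GPU; a minimal PyTorch sketch with purely illustrative names, not the actual train.py code, would be:)

import torch

def train_one_epoch(model, loader, optimizer, accumulation_steps=32):
    model.train()
    optimizer.zero_grad()
    for step, batch in enumerate(loader):
        # assuming the model returns the LM loss as its first output
        loss = model(**batch)[0]
        # scale the loss so gradients average over the virtual batch
        (loss / accumulation_steps).backward()
        if (step + 1) % accumulation_steps == 0:
            # one update per effective batch of batch_size * accumulation_steps
            optimizer.step()
            optimizer.zero_grad()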
This gives the following ConvAI2 evaluation results:
Final Hits@1: 0.761
FINAL F1: 0.1659
FINAL PPL: 20.7
Then I trained from GPT-2 small with the same configuration:
python train.py --model_checkpoint /path/to/pretrained/gpt2 \
--gradient_accumulation_steps=32 --lm_coef=2.0 --max_history=2 \
--n_epochs=1 --num_candidates=4 --personality_permutations=2 \
--train_batch_size=2 --valid_batch_size=2
and the evaluation results are:
Final Hits@1: 0.737
FINAL F1: 0.1643
FINAL PPL: 178.9
The command I used to run convai_evaluation.py is:
python convai_evaluation.py --eval_type ppl --model_checkpoint /path/to/finetuned/model
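As I understand it, the reported PPL should just be the exponential of the mean per-token LM cross-entropy over the validation set, roughly like the sketch below (a hypothetical helper, not the actual code of convai_evaluation.py; I assume padded label positions are marked with -100):

import math
import torch
import torch.nn.functional as F

def perplexity(model, loader, device="cuda", pad_label=-100):
    model.eval()
    total_nll, total_tokens = 0.0, 0
    with torch.no_grad():
        for input_ids, lm_labels in loader:
            input_ids = input_ids.to(device)
            lm_labels = lm_labels.to(device)
            # assuming the model returns LM logits as its first output
            logits = model(input_ids)[0]
            # shift so each position predicts the next token; skip padded labels
            nll = F.cross_entropy(
                logits[:, :-1].reshape(-1, logits.size(-1)),
                lm_labels[:, 1:].reshape(-1),
                ignore_index=pad_label,
                reduction="sum",
            )
            total_nll += nll.item()
            total_tokens += (lm_labels[:, 1:] != pad_label).sum().item()
    return math.exp(total_nll / max(total_tokens, 1))

Since PPL is an exponential of the average loss, even a modest loss difference blows the number up, so I want to rule out an evaluation or tokenization mismatch first.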
The PPL of GPT-2 is strangely high.
Is there anything that needs to be modified when testing a fine-tuned GPT-2 model with convai_evaluation.py?
I'm also curious about the best test results and hyperparameters you obtained when fine-tuning from GPT-2. Thank you!
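(In case it matters, my guess at how the model and tokenizer classes should be picked when loading a GPT-2 checkpoint for evaluation is sketched below; the class names are from the transformers library, and I'm not sure this matches what convai_evaluation.py actually does:)

from transformers import (GPT2LMHeadModel, GPT2Tokenizer,
                          OpenAIGPTLMHeadModel, OpenAIGPTTokenizer)

def load_for_eval(checkpoint_path, is_gpt2):
    # GPT and GPT-2 use different BPE vocabularies, so pairing the wrong
    # tokenizer with a fine-tuned checkpoint would inflate perplexity
    model_cls = GPT2LMHeadModel if is_gpt2 else OpenAIGPTLMHeadModel
    tokenizer_cls = GPT2Tokenizer if is_gpt2 else OpenAIGPTTokenizer
    tokenizer = tokenizer_cls.from_pretrained(checkpoint_path)
    model = model_cls.from_pretrained(checkpoint_path)
    return model, tokenizer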
Did you find out how to achieve better results? I have the same problem with GPT-2, which leads to a final PPL of 133.