japanese-dialog-transformers
Could you release the fine-tuning scripts from your experiments?
Thank you for your excellent work. I have one question about fine-tuning your pre-trained model.
I am trying to reproduce the results in your paper, but I am not able to get the same perplexity on JEmpatheticDialogues and JPersonaChat.
My fine-tuned model gets 30.87 ppl, which is much worse than the 21.32 ppl reported for your fine-tuned model on JPersonaChat. I also tried several seeds, but I could not improve the perplexity.
Here is my command for fine-tuning on JPersonaChat. (Before fine-tuning, I tokenized and processed the dataset using tokenized.sh and fairseq-preprocess.)
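For context, my fairseq-preprocess step was roughly of the following shape (a sketch: the source/target suffixes, dictionary path, and worker count are placeholders for my local setup, not values taken from your repository):

fairseq-preprocess \
    --source-lang src --target-lang dst \
    --trainpref $TOK_DIR/train \
    --validpref $TOK_DIR/valid \
    --destdir $DATA_DIR \
    --srcdict $SPM_DICT --tgtdict $SPM_DICT \
    --workers 4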
export CUDA_VISIBLE_DEVICES=0,1
fairseq-train $DATA_DIR \
--arch transformer \
--finetune-from-model $PRETRAINED_MODEL \
--task translation \
--save-dir $SAVE_DIR \
--dropout 0.1 \
--encoder-normalize-before \
--encoder-embed-dim 1920 \
--encoder-ffn-embed-dim 7680 \
--encoder-layers 2 \
--decoder-normalize-before \
--decoder-embed-dim 1920 \
--decoder-ffn-embed-dim 7680 \
--decoder-layers 24 \
--criterion cross_entropy \
--batch-size 4 \
--lr 0.0001 \
--max-update 3000 \
--warmup-updates 3000 \
--optimizer adafactor \
--best-checkpoint-metric loss \
--keep-best-checkpoints 1 \
--tensorboard-logdir $LOG_DIR \
--update-freq 32 \
--keep-interval-updates 5 \
--seed 0 \
--ddp-backend no_c10d
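For reference, a fairseq-validate command along these lines can be used to check the validation perplexity of the resulting checkpoint (a sketch: the checkpoint path, subset name, and batch size here are placeholders, not exact values I used):

fairseq-validate $DATA_DIR \
    --task translation \
    --path $SAVE_DIR/checkpoint_best.pt \
    --valid-subset valid \
    --batch-size 4
# fairseq logs "ppl" as 2 ** (base-2 cross-entropy loss); this is the number I compare with the paper.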
Could you share the fine-tuning recipes used in your experiments? (Sorry if I have overlooked them somewhere.) I look forward to hearing from you.