
Could you release the fine-tuning scripts used in your experiments?


Thank you for your excellent work. I have one question about fine-tuning your pre-trained model.

I am trying to reproduce the results in your paper, but I am not able to match the reported perplexity on JEmpatheticDialogues and JPersonaChat.

My fine-tuned model reaches 30.87 ppl on JPersonaChat, which is much worse than the 21.32 ppl reported for your fine-tuned model. I also tried several random seeds, but the perplexity did not improve.

Before fine-tuning, I tokenized the dataset with tokenized.sh and binarized it with fairseq-preprocess, roughly as in the sketch below (the src/dst language suffixes, file prefixes, and dictionary path are placeholders rather than my exact names):
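
# NOTE: "src"/"dst", $TOKENIZED_DIR, and $PRETRAINED_DICT are placeholders.
# I reused the dictionary distributed with the pretrained checkpoint so that
# the binarized data matches the pretrained model's vocabulary.
fairseq-preprocess \
    --source-lang src --target-lang dst \
    --trainpref $TOKENIZED_DIR/train \
    --validpref $TOKENIZED_DIR/valid \
    --destdir $DATA_DIR \
    --srcdict $PRETRAINED_DICT \
    --tgtdict $PRETRAINED_DICT \
    --workers 8

Here is my code for fine-tuning on JPersonaChat: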

export CUDA_VISIBLE_DEVICES=0,1
fairseq-train $DATA_DIR \
    --arch transformer \
    --finetune-from-model $PRETRAINED_MODEL \
    --task translation \
    --save-dir $SAVE_DIR \
    --dropout 0.1 \
    --encoder-normalize-before \
    --encoder-embed-dim 1920 \
    --encoder-ffn-embed-dim 7680 \
    --encoder-layers 2 \
    --decoder-normalize-before \
    --decoder-embed-dim 1920 \
    --decoder-ffn-embed-dim 7680 \
    --decoder-layers 24 \
    --criterion cross_entropy \
    --batch-size 4 \
    --lr 0.0001 \
    --max-update 3000 \
    --warmup-updates 3000 \
    --optimizer adafactor \
    --best-checkpoint-metric loss \
    --keep-best-checkpoints 1 \
    --tensorboard-logdir $LOG_DIR \
    --update-freq 32 \
    --keep-interval-updates 5 \
    --seed 0 \
    --ddp-backend no_c10d 
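
For reference, I read off perplexity on the validation split roughly like this (a sketch; I am assuming fairseq-validate over the same task and binarized data is an appropriate way to obtain ppl for comparison):

# checkpoint_best.pt is the checkpoint selected by --best-checkpoint-metric loss
fairseq-validate $DATA_DIR \
    --task translation \
    --path $SAVE_DIR/checkpoint_best.pt \
    --valid-subset valid \
    --batch-size 8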

Could you share the fine-tuning recipes you used in your experiments? (Sorry if I have overlooked them somewhere.) I look forward to hearing from you.

meguruin · Dec 12 '22 06:12