
Fine-tune PLATO-2

sdai654416 opened this issue 2 years ago

  1. I downloaded the 24L model and ran the fine-tuning script `bash ./scripts/local/job.sh ./projects/PLATO-2/finetune/24L_train.conf`, but the loss is NaN from the very beginning of fine-tuning. Am I missing any steps?

  2. If I run the pre-training script, stage 1 does not store anything in `output/`. I assume stages 2.1 and 2.2 require stage 1's output, right? How do I save stage 1's checkpoints?

Thanks!

sdai654416 avatar Apr 12 '22 21:04 sdai654416

You can change the AMP setting in `knover/core/model.py`: https://github.com/PaddlePaddle/Knover/blame/develop/knover/core/model.py#L165

"custom_white_list": ["gelu"],

It seems that the older models need fp16 disabled for softmax / layer_norm (i.e., keep those ops off the white list so they run in fp32). Thanks for the feedback!
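To illustrate the suggestion above, here is a small sketch of how AMP white/black lists decide which dtype an op runs in. The dict keys match the snippet above, but the helper function and list contents are illustrative, not the actual Knover or Paddle implementation: numerically sensitive ops (softmax, layer_norm) are forced back to fp32 via the black list, while gelu stays in fp16.

```python
# Illustrative sketch (not the exact Knover code): AMP op lists and
# the dtype each op would effectively run in under them.
amp_lists = {
    "custom_white_list": ["gelu"],                    # safe in fp16
    "custom_black_list": ["softmax", "layer_norm"],   # keep in fp32
}

def effective_dtype(op_name, lists):
    """Return the dtype an op would run in under these AMP lists.

    Black list wins over white list; anything unlisted falls back to
    the framework's default AMP policy.
    """
    if op_name in lists["custom_black_list"]:
        return "float32"
    if op_name in lists["custom_white_list"]:
        return "float16"
    return "default"
```

With the old 24L models, moving `softmax` and `layer_norm` to the black list avoids the fp16 overflow that produces a NaN loss at step 0.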

sserdoubleh avatar Apr 13 '22 03:04 sserdoubleh

As for your second question, I think the pre-training dataset is too small relative to `save_steps`: training finishes before the first save step is reached, so no checkpoint is written to `output/`. You can lower `save_steps` in `projects/PLATO-2/pretrain/24L_train_stage-1.conf`.
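For example, the change might look like the fragment below. The exact variable name and value in your Knover version may differ; the point is that `save_steps` must be no larger than the total number of training steps, or training ends before anything is saved.

```shell
# Hypothetical fragment of 24L_train_stage-1.conf: lower save_steps so
# a checkpoint is written before the small pre-training set is exhausted.
save_steps=100
```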

py703703 avatar May 03 '22 10:05 py703703