DeepSpeedExamples
Why is the PPL so high at the beginning of Step 1 (SFT)?
https://github.com/microsoft/DeepSpeedExamples/blob/737c6740bec38b77a24a59135b6481a53d566b38/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/training_log_output/opt-1.3b-globalBatchSize128.log#L4
Why is the PPL here ~4k when we are starting from a pretrained model?
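For context on the magnitude: perplexity is the exponential of the mean per-token cross-entropy loss, so even a modestly elevated loss produces an enormous PPL. A minimal sketch (the loss values below are illustrative, not taken from the linked log):

```python
import math

def perplexity(mean_nll: float) -> float:
    """Perplexity is exp of the mean token-level negative log-likelihood (in nats)."""
    return math.exp(mean_nll)

# A well-trained LM might sit around loss ~2 nats -> PPL ~7.4
print(perplexity(2.0))

# A loss of ~8.3 nats already maps to PPL ~4000, so a "4k" PPL
# corresponds to only a few extra nats of loss at the first step.
print(perplexity(8.3))
```

This is why an initial PPL spike looks dramatic even when the underlying loss gap is small in absolute terms.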