deep-learning-pytorch-huggingface
DPOTrainer loss goes down to 0.0 during training, but at the end it reports train_loss as 0.15 - the loss during training and the loss reported at the end differ substantially
As shown below, during training the DPOTrainer loss goes down to 0.0, while at the end it reports train_loss as 0.06. In another run with a different dataset, the training loss likewise drops to 0.0 before the end of the epoch, and the eval loss follows at 2.14e-6.
I am using a different dataset, but other than that the implementation is mostly the same as https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/scripts/dpo/run_dpo.py
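For context, a minimal sketch of the kind of setup I mean is below. It follows the structure of the referenced run_dpo.py but is not my exact script: the model ID, dataset, and hyperparameters are placeholders, and it assumes a recent TRL version that provides DPOConfig (older versions pass the tokenizer via `tokenizer=`, newer ones via `processing_class=`).

```python
# Minimal DPO training sketch (placeholder model, dataset, and hyperparameters;
# not the exact configuration from the run described above).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# Preference dataset with "prompt", "chosen", "rejected" columns (placeholder).
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

args = DPOConfig(
    output_dir="dpo-debug",
    per_device_train_batch_size=4,
    num_train_epochs=1,
    logging_steps=10,
    beta=0.1,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,        # reference model is created implicitly from `model`
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,   # `processing_class=tokenizer` in newer TRL releases
)
trainer.train()
# The per-step loss logged here drops to 0.0 well before the epoch ends,
# yet the final train_loss reported in the summary is much higher.
```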