deep-learning-pytorch-huggingface
DPOTrainer loss goes down to 0.0 during training, but at the end it reports train_loss as 0.15 - the loss during training and the loss reported at the end differ substantially
As shown below, during training the DPOTrainer loss goes down to 0.0, while at the end it reports train_loss as 0.06. In another run with a different dataset, the training loss likewise drops to 0.0 before the end of the epoch, and the eval loss follows at 2.14e-6.
I am using a different dataset, but other than that the implementation is mostly the same as https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/scripts/dpo/run_dpo.py
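For context, a minimal sketch of the kind of setup I mean is below. It follows the structure of the referenced run_dpo.py but is not my exact script: the model ID, dataset, and hyperparameters are placeholders, and it assumes a recent TRL version that provides DPOConfig (older versions pass the tokenizer via `tokenizer=`, newer ones via `processing_class=`).

```python
# Minimal DPO training sketch (placeholder model, dataset, and hyperparameters;
# not the exact configuration from the run described above).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# Preference dataset with "prompt", "chosen", "rejected" columns (placeholder).
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

args = DPOConfig(
    output_dir="dpo-debug",
    per_device_train_batch_size=4,
    num_train_epochs=1,
    logging_steps=10,
    beta=0.1,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,        # reference model is created implicitly from `model`
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,   # `processing_class=tokenizer` in newer TRL releases
)
trainer.train()
# The per-step loss logged here drops to 0.0 well before the epoch ends,
# yet the final train_loss reported in the summary is much higher.
```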