alignment-handbook
Weird DPO loss
Hi, I would like to draw some attention to issue #38.
The DPO-LoRA training loss (red line) drops abruptly at the beginning of each epoch, which seems strange. (I trained a LoRA model with a global batch size of 64 using multi_gpu acceleration on 8 GPUs and a learning rate of 1e-4; everything else followed the suggested settings.)
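For reference, here is a minimal sketch of the kind of LoRA DPO setup I mean, assuming a recent TRL (with `DPOConfig`) and PEFT. The model name, dataset, and LoRA hyperparameters are placeholders, not my exact recipe; only the learning rate and the batch-size arithmetic (8 GPUs × 8 per device × 1 accumulation step = global batch size 64) come from this post.

```python
# Hypothetical sketch of a LoRA DPO run, NOT the exact config from this issue.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Placeholder base model; substitute your own SFT checkpoint.
model_name = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Example LoRA settings; the values here are illustrative only.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# 8 GPUs x 8 per-device x 1 accumulation step = global batch size 64.
training_args = DPOConfig(
    output_dir="dpo-lora-debug",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=1,
    learning_rate=1e-4,
    num_train_epochs=3,
    beta=0.1,
    logging_steps=10,
)

# Placeholder preference dataset with prompt/chosen/rejected columns.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```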
Meanwhile, full-parameter fine-tuning (official settings) shows no such problem.
I don't know whether this is normal; I suspect it is a bug related to the LoRA model. Is there any explanation? Has anyone encountered the same issue? If your rerun's loss is normal, could you share your configs?