alignment-handbook
Weird DPO loss
Hi, I would like to draw some attention to issue #38.
The DPO-LoRA training loss (red line) drops abruptly at the beginning of each epoch, which seems strange. (I trained a LoRA model with a global batch size of 64 using multi_gpu acceleration on 8 GPUs and a learning rate of 1e-4; everything else followed the suggested settings.)
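For reference, here is a minimal sketch of the kind of LoRA DPO setup I mean, assuming a recent TRL (with `DPOConfig`) and PEFT. The model name, dataset, and LoRA hyperparameters are placeholders, not my exact recipe; only the learning rate and the batch-size arithmetic (8 GPUs × 8 per device × 1 accumulation step = global batch size 64) come from this post.

```python
# Hypothetical sketch of a LoRA DPO run, NOT the exact config from this issue.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Placeholder base model; substitute your own SFT checkpoint.
model_name = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Example LoRA settings; the values here are illustrative only.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# 8 GPUs x 8 per-device x 1 accumulation step = global batch size 64.
training_args = DPOConfig(
    output_dir="dpo-lora-debug",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=1,
    learning_rate=1e-4,
    num_train_epochs=3,
    beta=0.1,
    logging_steps=10,
)

# Placeholder preference dataset with prompt/chosen/rejected columns.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```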
Meanwhile, full-parameter fine-tuning (official settings) shows no such problem.
I don't know whether this is normal; I suspect it is a bug related to the LoRA model. Is there any explanation? Has anyone encountered the same issue? If your rerun's loss is normal, could you share your configs?