
Weird DPO loss

Open · ChenDRAG opened this issue · 1 comment

Hi, I would like to draw some attention to issue #38.

The DPO-LoRA training loss (red line) drops abruptly at the beginning of each epoch, which seems weird. (I trained the LoRA model with a global batch size of 64, multi_gpu acceleration on 8 GPUs, a learning rate of 1e-4, and otherwise the suggested settings.)
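For reference, here is a minimal sketch (not the alignment-handbook recipe itself) of a DPO + LoRA setup with roughly the hyperparameters described above: global batch size 64 (8 GPUs × per-device batch 8) and learning rate 1e-4. The model name, toy dataset, and LoRA target modules are placeholders, and exact `DPOTrainer` argument names vary across `trl` versions.

```python
# Sketch of a DPO + LoRA run approximating the reported setup; names and
# hyperparameters marked below are assumptions, not the official recipe.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "alignment-handbook/zephyr-7b-sft-lora"  # placeholder for the SFT checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Toy preference data; the real run uses a binarized preference dataset
# with "prompt" / "chosen" / "rejected" fields.
train_dataset = Dataset.from_dict({
    "prompt": ["What is DPO?"],
    "chosen": ["Direct Preference Optimization fine-tunes on preference pairs."],
    "rejected": ["I don't know."],
})

peft_config = LoraConfig(
    r=16, lora_alpha=16, lora_dropout=0.05,          # assumed LoRA ranks/dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed for a Mistral-style model
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="dpo-lora-sketch",
    per_device_train_batch_size=8,   # 8 GPUs x 8 = global batch size 64
    gradient_accumulation_steps=1,
    learning_rate=1e-4,
    num_train_epochs=3,
    logging_steps=10,
)

trainer = DPOTrainer(
    model,
    ref_model=None,       # with a PEFT adapter, the frozen base model serves as the reference
    args=training_args,
    beta=0.1,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```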

Meanwhile, full-parameter fine-tuning (official settings) shows no such problem.

[Figure: training loss curves; the DPO-LoRA loss (red) drops sharply at each epoch boundary, while the full-parameter fine-tuning loss does not.]

I don't know whether this is normal, and I assume it is a bug associated with the LoRA model. Is there any explanation? Has anyone encountered the same issue? If your rerun loss looks normal, could you share your configs?

ChenDRAG · Nov 24 '23