torchtune
Loss becomes NaN during finetuning when turning on optimizer_in_bwd=True
I am running full finetuning, and the loss becomes NaN immediately at the 2nd iteration when I turn on optimizer_in_bwd. When I turn off optimizer_in_bwd, training runs smoothly and the loss goes down.
Are there any suggestions on what might be broken with optimizer_in_bwd? Thanks a lot.
Can you provide a command to reproduce this?
This only happens when running the custom model. I tried to reproduce it with llama3.2, but that works fine with optimizer_in_bwd. Do you know what we need to handle specially for optimizer_in_bwd to work?
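For reference, my understanding of what the flag changes is roughly the pattern below: instead of a single global optimizer.step(), each parameter gets its own optimizer that is stepped from a hook during backward. This is only a minimal sketch in plain PyTorch (not torchtune's actual recipe code; setup_optimizer_in_backward is a hypothetical helper), to illustrate what the custom model has to be compatible with:

```python
import torch

def setup_optimizer_in_backward(model: torch.nn.Module, lr: float = 2e-5) -> None:
    """Sketch of the optimizer-in-backward pattern: one optimizer per
    parameter, stepped from a hook as soon as that parameter's gradient
    has been accumulated, so the gradient can be freed immediately."""
    optimizers = {
        param: torch.optim.AdamW([param], lr=lr)
        for param in model.parameters()
        if param.requires_grad
    }

    def step_hook(param: torch.Tensor) -> None:
        # Runs during backward, right after .grad is ready for this parameter.
        optimizers[param].step()
        optimizers[param].zero_grad()

    for param in optimizers:
        param.register_post_accumulate_grad_hook(step_hook)
```

If that is roughly what happens, then loss.backward() both computes gradients and applies the updates, so there is no later point where gradients can be clipped or inspected globally. I suspect a custom model that relies on something like global gradient clipping or post-backward gradient manipulation could diverge on this path even though the regular optimizer path is fine.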