torchtune
Loss becomes NaN during finetuning when turning on optimizer_in_bwd=True
I am running full finetuning, and the loss becomes NaN immediately at the 2nd iteration when I turn on optimizer_in_bwd. When I turn off optimizer_in_bwd, training runs smoothly and the loss goes down.
Are there any suggestions on what might be broken with optimizer_in_bwd? Thanks a lot.
Can you provide a command to reproduce this?
This only happens when running the custom model. I tried to reproduce it with llama3.2, but that works fine with optimizer_in_bwd. Do you know what we need to handle specially for optimizer_in_bwd to work?
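For reference, my understanding of what the flag changes is roughly the pattern below: instead of a single global optimizer.step(), each parameter gets its own optimizer that is stepped from a hook during backward. This is only a minimal sketch in plain PyTorch (not torchtune's actual recipe code; setup_optimizer_in_backward is a hypothetical helper), to illustrate what the custom model has to be compatible with:

```python
import torch

def setup_optimizer_in_backward(model: torch.nn.Module, lr: float = 2e-5) -> None:
    """Sketch of the optimizer-in-backward pattern: one optimizer per
    parameter, stepped from a hook as soon as that parameter's gradient
    has been accumulated, so the gradient can be freed immediately."""
    optimizers = {
        param: torch.optim.AdamW([param], lr=lr)
        for param in model.parameters()
        if param.requires_grad
    }

    def step_hook(param: torch.Tensor) -> None:
        # Runs during backward, right after .grad is ready for this parameter.
        optimizers[param].step()
        optimizers[param].zero_grad()

    for param in optimizers:
        param.register_post_accumulate_grad_hook(step_hook)
```

If that is roughly what happens, then loss.backward() both computes gradients and applies the updates, so there is no later point where gradients can be clipped or inspected globally. I suspect a custom model that relies on something like global gradient clipping or post-backward gradient manipulation could diverge on this path even though the regular optimizer path is fine.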