acisseJZhong

Showing 2 issues by acisseJZhong

I am running full finetuning, and the loss becomes NaN immediately at the 2nd iteration when optimizer_in_bwd is turned on. With optimizer_in_bwd turned off, training runs smoothly and the loss...

# [RFC] MOE design in Torchtune

## Background

This RFC proposes adding MOE support to Torchtune. We want to design it in a general way so that components can be...
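As a point of reference for what such components might look like, here is a minimal token-choice MoE layer (a learned router selecting the top-k of several expert MLPs per token). This is a generic sketch under my own assumptions about naming and shapes, not the design the RFC itself proposes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Hypothetical token-choice MoE: route each token to top_k of num_experts."""

    def __init__(self, dim: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [tokens, dim] -> routing probabilities per expert: [tokens, num_experts]
        scores = F.softmax(self.router(x), dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)  # both [tokens, top_k]
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e  # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

layer = MoELayer(dim=8)
y = layer(torch.randn(5, 8))  # output keeps the input shape [tokens, dim]
```

A general design would presumably factor the router, the expert container, and the dispatch/combine logic into separate swappable components; the loop-over-experts dispatch here is the simplest correct form, not an efficient one.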

Labels: CLA Signed, rfc