acisseJZhong

Showing 2 issues by acisseJZhong

I am running full finetuning, and the loss becomes NaN immediately at the 2nd iteration when optimizer_in_bwd is turned on. With optimizer_in_bwd turned off, training runs smoothly and the loss...

# [RFC] MOE design in Torchtune

## Background

This RFC proposes adding MOE support to Torchtune. We want to design it in a general way so that components can be...
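As a point of reference for what such components might look like, here is a minimal token-choice MoE layer (a learned router selecting the top-k of several expert MLPs per token). This is a generic sketch under my own assumptions about naming and shapes, not the design the RFC itself proposes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Hypothetical token-choice MoE: route each token to top_k of num_experts."""

    def __init__(self, dim: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [tokens, dim] -> routing probabilities per expert: [tokens, num_experts]
        scores = F.softmax(self.router(x), dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)  # both [tokens, top_k]
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e  # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

layer = MoELayer(dim=8)
y = layer(torch.randn(5, 8))  # output keeps the input shape [tokens, dim]
```

A general design would presumably factor the router, the expert container, and the dispatch/combine logic into separate swappable components; the loop-over-experts dispatch here is the simplest correct form, not an efficient one.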

Labels: CLA Signed, rfc