acisseJZhong

2 comments by acisseJZhong

> Can you provide a command to reproduce this? This only happens when running the custom model. I tried to reproduce it with llama3.2, but it works with optimizer_in_bwd. Do you...

> I would try to either make the experts entirely routing-agnostic (not sure if this is possible; based on your code, it seems to affect the forward quite significantly),...