acisseJZhong
Results: 2 comments of acisseJZhong
> Can you provide a command to reproduce this? This only happens when running the custom model. I tried to reproduce it with llama3.2, but it works with optimizer_in_bwd. Do you...
> I would try to either make the experts entirely routing-agnostic (not sure if this is possible; based on your code, it seems to affect the forward quite significantly),...