Why block the gradients of smoe gate network?

Open KennyNH opened this issue 11 months ago • 0 comments

https://github.com/spcl/smoe/blob/249ef673d1929a23e5fe7c2628e1299b8c1c2e42/smoe/models/smoe_routing.py#L116

Why is "smoe_config.block_gate_grad" set to "True" so that "grad_routing_weights=None", which cuts off the gradients to the gating network? With those gradients blocked, how are the routing parameters in "SpatialLatentTensorGate2d" optimized?
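To make the concern concrete, here is a minimal, dependency-free sketch of what blocking that gradient would seem to imply. It is not the smoe code: the function name "gate_logit_grads", the two-expert setup, and the squared-error loss are all hypothetical, and only the "block_gate_grad" / "grad_routing_weights" names are taken from the linked file. It assumes the routing weights are a softmax over gate logits.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_logit_grads(logits, expert_outputs, target, block_gate_grad):
    """Return dLoss/dlogits for loss = 0.5 * (sum_i w_i * e_i - target)**2.

    Hypothetical stand-in for a routing backward pass: when block_gate_grad
    is True, the gradient w.r.t. the routing weights is None, so nothing
    reaches the gate logits through this path.
    """
    weights = softmax(logits)
    y = sum(w * e for w, e in zip(weights, expert_outputs))
    dloss_dy = y - target

    # Gradient of the loss w.r.t. each routing weight, or None when blocked.
    grad_routing_weights = (
        None if block_gate_grad else [dloss_dy * e for e in expert_outputs]
    )
    if grad_routing_weights is None:
        return [0.0] * len(logits)

    # Softmax backward: dL/dz_j = w_j * (g_j - sum_i w_i * g_i).
    dot = sum(w * g for w, g in zip(weights, grad_routing_weights))
    return [w * (g - dot) for w, g in zip(weights, grad_routing_weights)]


blocked = gate_logit_grads([0.2, -0.1], [1.0, 3.0], 2.5, block_gate_grad=True)
unblocked = gate_logit_grads([0.2, -0.1], [1.0, 3.0], 2.5, block_gate_grad=False)
print("blocked:  ", blocked)
print("unblocked:", unblocked)
```

In this toy setting the blocked case gives the gate logits a zero gradient from the mixture output, so the gate parameters would only change if some other term (for example a separate loss or another path through the network) still reaches them. Whether such a path exists in smoe is exactly what I am asking.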

Feb 29 '24 14:02 KennyNH
