fastmoe
how to use balance loss?
How do I apply the balance loss? Could you add it to the 'transformer-xl' example?
Sorry for the late reply.
The `BaseGate` module has methods including `set_loss`, `get_loss` and `has_loss`. A customized gate (or any of the gates in FastMoE that have a balance loss) calls `self.set_loss` to store the loss value in the module. That value can then be added to the final loss by calling `get_loss` on the gate modules (e.g. by adding them inside the `get_loss` function in Megatron-LM).
We will add this to our documentation.