Does the load-balancing loss help the training loss converge?
https://github.com/tensorflow/mesh/blob/fbf7b1e547e8b8cb134e81e1cd350c312c0b5a16/mesh_tensorflow/transformer/moe.py#L935
I tried the load-balancing loss in my project and found that it does not help the training loss converge.
Does it only balance the load across experts without helping convergence, or might it even slightly hurt the model?
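
For reference, here is a minimal sketch of what I mean by the load-balancing loss. It is not the code from `moe.py`. It is a plain-Python approximation of the common top-1 formulation (number of experts times the sum over experts of "fraction of tokens routed to the expert" times "mean router probability for the expert"). The names `load_balance_loss`, `router_logits`, `balanced` and `collapsed` are my own and only illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's expert logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def load_balance_loss(router_logits):
    """Auxiliary loss for top-1 routing (illustrative, not the mesh API).

    router_logits: list of per-token lists, one logit per expert.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]

    # Fraction of tokens whose highest-probability expert is i.
    counts = [0] * num_experts
    for p in probs:
        counts[p.index(max(p))] += 1
    fraction = [c / num_tokens for c in counts]

    # Mean router probability assigned to expert i over all tokens.
    mean_prob = [
        sum(p[i] for p in probs) / num_tokens for i in range(num_experts)
    ]

    return num_experts * sum(f * q for f, q in zip(fraction, mean_prob))


# Four tokens, four experts, each token prefers a different expert.
balanced = load_balance_loss(
    [[2.0, 0.0, 0.0, 0.0],
     [0.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 2.0, 0.0],
     [0.0, 0.0, 0.0, 2.0]]
)

# Four tokens that all prefer expert 0.
collapsed = load_balance_loss([[2.0, 0.0, 0.0, 0.0]] * 4)

print(balanced, collapsed)
```

In this sketch the term depends only on the routing decisions, not on the task targets: it is smaller when tokens are spread evenly across experts and larger when routing collapses onto one expert. In my setup it is added to the task loss with a small weight, which is why I am asking whether it should be expected to affect convergence at all.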