Issues by Ziteng Wang (1 result)
**Describe the bug**
Load balancing loss is accumulated twice when using activation checkpointing.

**To Reproduce**
Train from scratch with and without `--moe-layer-recompute`, setting `--moe-router-load-balancing-type aux_loss`.

**Expected behavior**
Load balancing loss...
stale
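
The report above does not show where the double counting happens. One plausible mechanism, offered only as an assumption, is that the auxiliary loss is recorded as a side effect of the layer's forward function, and activation checkpointing runs that forward function a second time during the backward pass. The toy sketch below simulates this in plain Python; `AuxLossTracker`, `moe_forward`, and `run_step` are hypothetical names for illustration and are not taken from the actual codebase.

```python
# Hypothetical illustration only; none of these names come from the real code.

class AuxLossTracker:
    """Accumulates an auxiliary loss as a side effect of forward()."""

    def __init__(self):
        self.total = 0.0
        self.calls = 0

    def add(self, value):
        self.total += value
        self.calls += 1


def moe_forward(x, tracker):
    """Toy MoE layer: returns an output and records a stand-in aux loss."""
    aux_loss = 0.5  # placeholder for the real load balancing loss
    tracker.add(aux_loss)
    return 2.0 * x


def run_step(x, tracker, recompute):
    """Simulate one training step.

    With recompute=True, the forward function is executed a second time,
    mimicking how activation checkpointing rebuilds activations in backward.
    """
    y = moe_forward(x, tracker)      # normal forward pass
    if recompute:
        moe_forward(x, tracker)      # recomputation during backward
    return y


plain = AuxLossTracker()
run_step(1.0, plain, recompute=False)

ckpt = AuxLossTracker()
run_step(1.0, ckpt, recompute=True)

print(plain.total, ckpt.total)
```

If this is indeed the cause, a typical remedy would be to skip the accumulation during the recomputation pass, though the issue text does not confirm either the cause or the fix.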