Ziteng Wang

Results: 1 issue by Ziteng Wang

**Describe the bug** Load balancing loss is accumulated twice when using activation checkpointing.

**To Reproduce** Train from scratch with and without `--moe-layer-recompute`, setting `--moe-router-load-balancing-type aux_loss`.

**Expected behavior** Load balancing loss...
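The report is truncated, so it does not state the root cause. One plausible mechanism, offered here only as an assumption, is that the MoE layer records its auxiliary (load-balancing) loss as a side effect of its forward pass. Activation checkpointing re-executes that forward pass during backward to rebuild activations, so the loss gets recorded a second time. The toy sketch below reproduces that pattern in plain Python. `aux_loss_tracker`, `moe_layer_forward`, `train_step`, and the `recomputing`/`guard` flags are all hypothetical names, not Megatron-LM APIs.

```python
# Hypothetical illustration only: none of these names come from Megatron-LM.
aux_loss_tracker = []  # global accumulator for per-step auxiliary losses


def moe_layer_forward(x, recomputing=False, guard=False):
    """Toy MoE layer: returns its output and records an aux (load-balancing) loss."""
    aux_loss = 0.01 * sum(v * v for v in x)
    # Bug pattern: without the guard, the loss is recorded every time forward
    # runs, including the re-execution triggered by activation recomputation.
    if not (guard and recomputing):
        aux_loss_tracker.append(aux_loss)
    return [2.0 * v for v in x]


def train_step(x, recompute, guard=False):
    """Runs one toy step and returns the total accumulated aux loss."""
    aux_loss_tracker.clear()
    moe_layer_forward(x)  # original forward pass
    if recompute:
        # Activation checkpointing: forward is re-run during backward.
        moe_layer_forward(x, recomputing=True, guard=guard)
    return sum(aux_loss_tracker)


x = [1.0, 2.0]
print(train_step(x, recompute=False))             # aux loss counted once
print(train_step(x, recompute=True))              # counted twice (the bug)
print(train_step(x, recompute=True, guard=True))  # guarded: counted once
```

In this sketch, skipping accumulation during recomputation is one possible fix. Another would be to record the loss only on the original forward pass. Which approach fits the real code base is not established by the report.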

Label: `stale`