Xiangyu Li
### What does this PR do?

When training MoE-family models with Megatron as the backend, enabling Expert Parallelism (EP) may cause load imbalance across experts, which makes the `update_actor` step...