[ENHANCEMENT] Global Batch Load Balancing for MoE Models
Describe the solution you'd like: Implement load balancing at the global-batch level.
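To make the request concrete, here is a minimal Python sketch contrasting micro-batch and global-batch load balancing. It assumes the Switch-Transformer-style auxiliary loss `num_experts * sum_i(f_i * P_i)`, where `f_i` is the fraction of routing decisions sent to expert `i` and `P_i` is the mean router probability for expert `i`. It also assumes that the global-batch variant differs only in computing `f_i` over every micro-batch in the global batch, while `P_i` stays local. In a real distributed run, that would mean an all-reduce of expert counts across data-parallel ranks and accumulation across micro-batches; here it is simulated by concatenation. All function names and the data layout are hypothetical and are not Megatron-LM APIs.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout for illustration only:
# (per-token expert assignments, per-token router probabilities).
MicroBatch = Tuple[List[List[int]], List[List[float]]]


def expert_fraction(assignments: Sequence[Sequence[int]], num_experts: int) -> List[float]:
    """f_i: share of all (token, expert) routing decisions that went to expert i."""
    counts = [0] * num_experts
    for token_experts in assignments:  # works for top-k: each token lists k experts
        for e in token_experts:
            counts[e] += 1
    total = sum(counts)
    return [c / total for c in counts]


def mean_router_prob(probs: Sequence[Sequence[float]], num_experts: int) -> List[float]:
    """P_i: router probability for expert i, averaged over the given tokens."""
    n = len(probs)
    return [sum(p[i] for p in probs) / n for i in range(num_experts)]


def aux_loss(f: Sequence[float], p: Sequence[float], num_experts: int) -> float:
    """Switch-style auxiliary loss: num_experts * sum_i f_i * P_i."""
    return num_experts * sum(fi * pi for fi, pi in zip(f, p))


def micro_batch_balance_loss(micro_batches: Sequence[MicroBatch], num_experts: int) -> float:
    """f_i and P_i both come from each micro-batch; per-micro-batch losses are averaged."""
    losses = []
    for assignments, probs in micro_batches:
        f = expert_fraction(assignments, num_experts)
        p = mean_router_prob(probs, num_experts)
        losses.append(aux_loss(f, p, num_experts))
    return sum(losses) / len(losses)


def global_batch_balance_loss(micro_batches: Sequence[MicroBatch], num_experts: int) -> float:
    """f_i is computed once over the whole global batch; P_i stays per micro-batch.

    Concatenating assignments stands in for an all-reduce of expert counts
    across ranks and micro-batches in a distributed trainer (assumption).
    """
    all_assignments = [a for assignments, _ in micro_batches for a in assignments]
    f_global = expert_fraction(all_assignments, num_experts)
    losses = []
    for _, probs in micro_batches:
        p = mean_router_prob(probs, num_experts)
        losses.append(aux_loss(f_global, p, num_experts))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Two "domains": micro-batch A routes every token to expert 0, B to expert 1.
    batches: List[MicroBatch] = [
        ([[0], [0]], [[0.9, 0.1], [0.9, 0.1]]),
        ([[1], [1]], [[0.1, 0.9], [0.1, 0.9]]),
    ]
    print(micro_batch_balance_loss(batches, 2))   # about 1.8
    print(global_batch_balance_loss(batches, 2))  # about 1.0
```

In this toy case each micro-batch is routed entirely to one expert, so the micro-batch loss penalizes both (1.8). The global batch as a whole is balanced, however, so the global-batch loss is only 1.0. This illustrates one plausible reading of why the global variant leaves room for experts to specialize on domains that cluster within a micro-batch.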
Benefits (based on the paper):
- Improves pre-training perplexity by ~0.1
- Increases benchmark scores by ~2 points
- Enables interpretable domain specialization of experts
Thanks! Let us take a deeper look. @Victarry
We will add this feature to MCore v0.13. The ETA is the end of this month.
Thank you for reviewing and accepting this feature request! I greatly appreciate your comprehensive support and the ongoing development and maintenance of Megatron-LM.
Marking as stale. No activity in 60 days.
Finished in commit 72d23540d0358ae24a41ff289d1461b094a770fa.
Thanks very much, I’ll try it out right away!