Megatron-LM

Enhance expert load balancing logging

Open yanring opened this issue 1 month ago • 0 comments

Is your feature request related to a problem? Please describe.

Cross-layer statistics: Compute a scalar balance metric per layer, then log percentiles (p50, p75, p90, p95) across all layers.

Within-layer expert distribution: For each layer, log statistics of how tokens are distributed across experts (min, max, std, percentiles of tokens_per_expert).
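To make the two requested views concrete, here is a minimal sketch in plain Python of what could be logged. It is not existing Megatron-LM code. The issue does not say which scalar balance metric to use, so the max-over-mean ratio below is an assumption (1.0 means perfectly balanced), and the function names `percentile`, `layer_balance`, `within_layer_stats`, and `cross_layer_stats` are hypothetical. The within-layer percentiles (p50, p95) are also an assumed choice.

```python
import statistics
from typing import Dict, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Linearly interpolated percentile of values, with q in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile() needs at least one value")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def layer_balance(tokens_per_expert: Sequence[int]) -> float:
    """Assumed balance metric: busiest expert's load divided by the mean load."""
    mean = sum(tokens_per_expert) / len(tokens_per_expert)
    return max(tokens_per_expert) / mean if mean > 0 else 0.0


def within_layer_stats(tokens_per_expert: Sequence[int]) -> Dict[str, float]:
    """Within-layer view: how tokens are spread across the experts of one layer."""
    return {
        "min": float(min(tokens_per_expert)),
        "max": float(max(tokens_per_expert)),
        "std": statistics.pstdev(tokens_per_expert),  # population std
        "p50": percentile(tokens_per_expert, 50),
        "p95": percentile(tokens_per_expert, 95),
    }


def cross_layer_stats(per_layer_counts: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Cross-layer view: percentiles of the per-layer balance metric."""
    balances = [layer_balance(counts) for counts in per_layer_counts]
    return {f"p{q}": percentile(balances, q) for q in (50, 75, 90, 95)}


# Toy input: 3 layers with 4 experts each (made-up counts, not real measurements).
counts = [[4, 4, 4, 4], [8, 4, 2, 2], [16, 0, 0, 0]]
cross = cross_layer_stats(counts)
per_layer = [within_layer_stats(c) for c in counts]
```

In a real training loop the counts would presumably come from the router's per-expert token counts for each MoE layer, and the resulting dictionaries would be passed to whatever logger is in use. Logging a handful of percentiles keeps the output size constant regardless of the number of layers or experts.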

Describe the solution you'd like A clear and concise description of what you want to happen.

Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

Additional context Add any other context or screenshots about the feature request here.

yanring, Nov 25 '25 15:11