Enhance expert load balancing logging

Open yanring opened this issue 1 month ago • 0 comments

Is your feature request related to a problem? Please describe.

- Cross-layer statistics: compute a scalar balance metric per layer, then log percentiles (p50, p75, p90, p95) of that metric across all layers.
- Within-layer expert distribution: for each layer, log statistics of how tokens are distributed across experts (min, max, std, and percentiles of tokens_per_expert).
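To make the two requested views concrete, here is a minimal sketch in plain Python. It is not taken from the project's code. The function names (`layer_balance`, `within_layer_stats`, `cross_layer_stats`) are hypothetical, and the balance metric (max load divided by mean load) is an assumption, since the issue does not say which scalar metric to use. The input is assumed to be one `tokens_per_expert` list per layer, and `std` is computed as a population standard deviation.

```python
import statistics
from typing import Dict, Sequence

# Percentiles named in the issue.
PERCENTILES = (50, 75, 90, 95)


def percentile(values: Sequence[float], p: float) -> float:
    """Percentile with linear interpolation between the two closest ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile() requires at least one value")
    rank = (len(ordered) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def layer_balance(tokens_per_expert: Sequence[float]) -> float:
    """Assumed scalar metric: max load / mean load (1.0 means perfectly balanced)."""
    mean = statistics.fmean(tokens_per_expert)
    # A layer that received no tokens is treated as balanced.
    return max(tokens_per_expert) / mean if mean > 0 else 1.0


def within_layer_stats(tokens_per_expert: Sequence[float]) -> Dict[str, float]:
    """min, max, std and percentiles of tokens_per_expert for a single layer."""
    stats = {
        "min": float(min(tokens_per_expert)),
        "max": float(max(tokens_per_expert)),
        "std": statistics.pstdev(tokens_per_expert),  # population std (assumption)
    }
    for p in PERCENTILES:
        stats[f"p{p}"] = percentile(tokens_per_expert, p)
    return stats


def cross_layer_stats(tokens_per_layer: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Percentiles of the per-layer balance metric across all layers."""
    balances = [layer_balance(tokens) for tokens in tokens_per_layer]
    return {f"p{p}": percentile(balances, p) for p in PERCENTILES}
```

With this shape, a logger could emit one `cross_layer_stats` dictionary per step and one `within_layer_stats` dictionary per layer. In a real training run the counts would likely arrive as tensors and need to be reduced across ranks first, which this sketch leaves out.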

Nov 25 '25 15:11 yanring