Track expert selection metrics

Open Ryp opened this issue 6 months ago • 2 comments

Purpose

The goal is to track expert load imbalance and expose those metrics in Prometheus.

How to use

Use VLLM_COLLECT_EXPERT_USAGE_HISTOGRAM=1 to enable this feature. Make sure that PROMETHEUS_MULTIPROC_DIR is set to get proper metrics!

The moe_expert_selection metric will then be available in Prometheus at runtime.
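
As a rough illustration of the setup described above, here is a minimal Python sketch. It assumes the two environment variables and the metric name given in this description; the helper names `build_server_env` and `filter_metric_lines` are hypothetical and not part of vLLM, and the labels and sample format of the metric are not specified here, so the filter only matches on the name prefix.

```python
import os
import tempfile


def build_server_env(base_env=None, multiproc_dir=None):
    """Return a copy of the environment with the two variables from this PR set.

    Hypothetical helper: it only prepares an environment mapping and does not
    start any server itself.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Opt in to expert usage collection (disabled by default per the PR).
    env["VLLM_COLLECT_EXPERT_USAGE_HISTOGRAM"] = "1"
    # Keep an existing PROMETHEUS_MULTIPROC_DIR; otherwise use the given
    # directory or create a fresh temporary one.
    if not env.get("PROMETHEUS_MULTIPROC_DIR"):
        env["PROMETHEUS_MULTIPROC_DIR"] = multiproc_dir or tempfile.mkdtemp(
            prefix="prom_multiproc_"
        )
    return env


def filter_metric_lines(exposition_text, metric_name="moe_expert_selection"):
    """Keep the sample lines of a Prometheus text exposition for one metric.

    Comment lines (# HELP / # TYPE) are dropped because they start with '#'.
    """
    return [
        line
        for line in exposition_text.splitlines()
        if line.startswith(metric_name)
    ]
```

The returned mapping could be passed as `env=` to whatever process launches the server, and `filter_metric_lines` could be applied to the text scraped from the server's Prometheus endpoint to check that the metric is actually being exported.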

Performance considerations

Expect at most 2% end-to-end overhead when this is enabled; the GPU-side cost is negligible. Note that this PR does not enable anything by default, so performance is unchanged unless you opt in.

Ryp avatar Jun 20 '25 16:06 Ryp

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

github-actions[bot] avatar Jun 20 '25 16:06 github-actions[bot]

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @Ryp.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify[bot] avatar Jun 20 '25 16:06 mergify[bot]

We just merged EPLB from @abmfy (cc @WoosukKwon). Please rebase; we would love to expose these core metrics!

simon-mo avatar Jul 01 '25 04:07 simon-mo

Ready for review!

Ryp avatar Jul 04 '25 15:07 Ryp

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @Ryp.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify[bot] avatar Jul 08 '25 15:07 mergify[bot]

Sorry for the delay in reviewing. After https://github.com/vllm-project/vllm/pull/20562 is merged, this could be enabled using that config instead of with an environment variable.

hmellor avatar Jul 29 '25 13:07 hmellor

@Ryp Hi, can you provide an example of what this metric looks like when requesting the /metrics API?

lengrongfu avatar Jul 30 '25 06:07 lengrongfu

Does this method support TP > 1? When TP > 1, is communication required to aggregate the expert load ("heat") across ranks? Can it record the load of physical experts? If physical expert load is recorded, it may also be necessary to map physical experts to logical experts.

baxingpiaochong avatar Aug 04 '25 06:08 baxingpiaochong

Hello, is this still active? I believe EPLB could leverage the expert selection metrics collection from this PR to avoid duplicating efforts. Should we consider refactoring it to make it compatible with EPLB?

abmfy avatar Aug 14 '25 23:08 abmfy

> Hello, is this still active? I believe EPLB could leverage the expert selection metrics collection from this PR to avoid duplicating efforts. Should we consider refactoring it to make it compatible with EPLB?

Hello, I'm also interested in EPLB metrics. Does this PR currently support TP > 1? When TP > 1, an all-gather communication is required.
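
To make the aggregation question concrete, here is a small sketch in plain Python. It assumes that each rank holds its own partial per-physical-expert counts and that a physical-to-logical expert mapping is available; neither assumption is confirmed by this PR, and the function names are hypothetical. The element-wise sum stands in for what a collective (for example an all-gather followed by a sum) would compute across ranks.

```python
def aggregate_rank_counts(per_rank_counts):
    """Element-wise sum of per-rank physical-expert counts.

    per_rank_counts: list of lists, one inner list per rank, each of the
    same length (number of physical experts). This is an assumed layout.
    """
    if not per_rank_counts:
        return []
    num_physical = len(per_rank_counts[0])
    if any(len(counts) != num_physical for counts in per_rank_counts):
        raise ValueError("all ranks must report the same number of physical experts")
    return [sum(column) for column in zip(*per_rank_counts)]


def physical_to_logical_counts(physical_counts, physical_to_logical, num_logical):
    """Fold physical-expert counts into logical-expert counts.

    physical_to_logical[i] is the logical expert served by physical expert i
    (several physical experts may replicate one logical expert).
    """
    logical = [0] * num_logical
    for phys_idx, count in enumerate(physical_counts):
        logical[physical_to_logical[phys_idx]] += count
    return logical


# Two ranks, four physical experts; physical experts 0 and 3 are replicas
# of logical expert 0.
totals = aggregate_rank_counts([[1, 2, 3, 4], [4, 3, 2, 1]])
logical_totals = physical_to_logical_counts(totals, [0, 1, 2, 0], num_logical=3)
```

Recording physical counts first and folding them afterwards keeps both views available: the physical view shows per-replica load, the logical view shows routing skew.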

baxingpiaochong avatar Aug 15 '25 02:08 baxingpiaochong

@mickaelseznec will take over this PR; further updates will come from him. Thanks!

Ryp avatar Aug 15 '25 07:08 Ryp

Expert selection tracking has moved to https://github.com/vllm-project/vllm/pull/27105.

PatrykSaffer avatar Oct 29 '25 08:10 PatrykSaffer