[Bugfix] Fix deepseekv3 grouped topk error

Open · Chen-XiaoBing opened this pull request 10 months ago · 4 comments

Fix the logic of the grouped top-k computation in fused MoE.

There is a slight difference between vLLM's grouped_topk and the official implementation. When the newly introduced bias term (e_score_correction_bias in vLLM) is not None, we should first take the top-2 scores within each group and use their sum to select the top-k groups. You can also check the corresponding logic in DeepSeek-V3's official inference code.
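For illustration, here is a minimal PyTorch sketch of the selection logic described above. The function and argument names (grouped_topk_sketch, num_expert_group, topk_group) are hypothetical stand-ins mirroring vLLM's grouped_topk; this is not the exact patch:

```python
from typing import Optional

import torch


def grouped_topk_sketch(
    scores: torch.Tensor,  # [num_tokens, num_experts] router scores
    e_score_correction_bias: Optional[torch.Tensor],  # [num_experts] or None
    num_expert_group: int,
    topk_group: int,
    topk: int,
):
    num_tokens, num_experts = scores.shape
    experts_per_group = num_experts // num_expert_group

    if e_score_correction_bias is not None:
        # The bias only influences group/expert *selection*, not the weights.
        selection_scores = scores + e_score_correction_bias.unsqueeze(0)
        # The fix: rank each group by the SUM of its top-2 expert scores,
        # matching DeepSeek-V3's official inference code.
        group_scores = (
            selection_scores.view(num_tokens, num_expert_group, experts_per_group)
            .topk(2, dim=-1)
            .values.sum(dim=-1)
        )
    else:
        selection_scores = scores
        # Without the bias term, each group is ranked by its single best expert.
        group_scores = selection_scores.view(
            num_tokens, num_expert_group, experts_per_group
        ).max(dim=-1).values

    # Keep only the experts that belong to the top-k groups.
    group_idx = torch.topk(group_scores, k=topk_group, dim=-1).indices
    group_mask = torch.zeros_like(group_scores)
    group_mask.scatter_(1, group_idx, 1)
    score_mask = (
        group_mask.unsqueeze(-1)
        .expand(num_tokens, num_expert_group, experts_per_group)
        .reshape(num_tokens, num_experts)
        .bool()
    )
    masked_scores = selection_scores.masked_fill(~score_mask, float("-inf"))

    # Expert choice uses the (possibly biased) selection scores, but the
    # routing weights come from the original scores. Renormalization of
    # topk_weights is omitted for brevity.
    topk_ids = torch.topk(masked_scores, k=topk, dim=-1).indices
    topk_weights = scores.gather(1, topk_ids)
    return topk_weights, topk_ids
```

Before this change, the per-group max was used in both branches; with the bias term present, summing the top-2 scores can select a different set of groups whenever a group's strength comes from more than one expert.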

Chen-XiaoBing · Feb 18 '25

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they run only the fastcheck CI, which executes a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

github-actions[bot] · Feb 18 '25

Thank you for your contribution!

I ran a quick eval on GSM8K with DeepSeek-R1 and see that this PR appears to slightly regress performance; however, the change is within stderr.

```
(vllm) ➜  vllm git:(main) lm_eval --model vllm --model_args pretrained=/home/vllm-dev/DeepSeek-R1,tensor_parallel_size=8 --trust_remote_code --tasks gsm8k --num_fewshot 5 --batch_size auto
vllm (pretrained=/home/vllm-dev/DeepSeek-R1,tensor_parallel_size=8,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.956|±  |0.0056|
|     |       |strict-match    |     5|exact_match|↑  |0.956|±  |0.0056|
```

```
(vllm) ➜  vllm git:(fix-dsv3-grouped-topk) lm_eval --model vllm --model_args pretrained=/home/vllm-dev/DeepSeek-R1,tensor_parallel_size=8 --trust_remote_code --tasks gsm8k --num_fewshot 5 --batch_size auto
vllm (pretrained=/home/vllm-dev/DeepSeek-R1,tensor_parallel_size=8,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9507|±  | 0.006|
|     |       |strict-match    |     5|exact_match|↑  |0.9507|±  | 0.006|
```
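As a rough sanity check of the "within stderr" claim, a back-of-the-envelope comparison (my own arithmetic, assuming the two runs are independent):

```python
import math

# 0.956 (main) vs 0.9507 (this PR), with stderrs 0.0056 and 0.006
diff = 0.956 - 0.9507                          # ~0.0053 absolute drop
combined_se = math.sqrt(0.0056**2 + 0.006**2)  # ~0.0082
print(diff < combined_se)                      # True -> within one combined stderr
```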

I will try to run harder evals to determine whether this improves accuracy. Do you have an example of DeepSeek performing badly on main because of this issue?

mgoin · Feb 18 '25

@mgoin, we might need to test the V3 model, as it is trained more for GSM and MMLU.

simon-mo · Feb 18 '25

Here is MMLU for R1, with similar small drops in each category. I'll try to get V3 going on another machine.

```
(vllm) ➜  vllm git:(main) lm_eval --model vllm --model_args pretrained=/home/vllm-dev/DeepSeek-R1,tensor_parallel_size=8,max_model_len=2048,gpu_memory_utilization=0.8 --trust_remote_code --tasks mmlu --batch_size 16
vllm (pretrained=/home/vllm-dev/DeepSeek-R1,tensor_parallel_size=8,max_model_len=2048,gpu_memory_utilization=0.8,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 16

|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.8514|±  |0.0029|
| - humanities     |      2|none  |      |acc   |↑  |0.7845|±  |0.0057|
| - other          |      2|none  |      |acc   |↑  |0.8822|±  |0.0055|
| - social sciences|      2|none  |      |acc   |↑  |0.9227|±  |0.0047|
| - stem           |      2|none  |      |acc   |↑  |0.8513|±  |0.0062|
```

```
(vllm) ➜  vllm git:(fix-dsv3-grouped-topk) lm_eval --model vllm --model_args pretrained=/home/vllm-dev/DeepSeek-R1,tensor_parallel_size=8,max_model_len=2048,gpu_memory_utilization=0.8 --trust_remote_code --tasks mmlu --batch_size 16
vllm (pretrained=/home/vllm-dev/DeepSeek-R1,tensor_parallel_size=8,max_model_len=2048,gpu_memory_utilization=0.8,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 16
|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.8492|±  |0.0029|
| - humanities     |      2|none  |      |acc   |↑  |0.7824|±  |0.0057|
| - other          |      2|none  |      |acc   |↑  |0.8806|±  |0.0055|
| - social sciences|      2|none  |      |acc   |↑  |0.9194|±  |0.0048|
| - stem           |      2|none  |      |acc   |↑  |0.8493|±  |0.0062|
```

mgoin · Feb 18 '25

I'll release v0.7.3 after this PR is merged.

simon-mo · Feb 20 '25

@simon-mo Some checks in the CI pipeline have failed. Would you kindly assist with merging the code?

Chen-XiaoBing · Feb 20 '25