Expert Parallelism (EP) Support for DeepSeek V3/R1

Open sleepcoo opened this issue 1 week ago • 0 comments

Motivation

Add Expert Parallelism (EP) support for DeepSeek V3/R1.

Modifications

  • The grouped GEMM operator supports FP8.
  • DeepSeek V3 parameter loading is supported.
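
For readers unfamiliar with EP, the sketch below illustrates the general idea of placing whole experts on different ranks and grouping routed tokens by the rank that owns their expert. It is not SGLang's implementation: the contiguous expert-to-rank layout, the function names, and the routed-expert count of 256 are assumptions made for illustration only.

```python
from collections import defaultdict

NUM_ROUTED_EXPERTS = 256  # assumed routed-expert count per MoE layer
EP_SIZE = 8               # matches the EP=8 setup used in this issue


def experts_per_rank(num_experts=NUM_ROUTED_EXPERTS, ep_size=EP_SIZE):
    """Number of whole experts each EP rank holds (hypothetical even split)."""
    if num_experts % ep_size != 0:
        raise ValueError("num_experts must be divisible by ep_size")
    return num_experts // ep_size


def rank_of_expert(expert_id, num_experts=NUM_ROUTED_EXPERTS, ep_size=EP_SIZE):
    """Map an expert id to its owning rank, assuming a contiguous layout."""
    if not 0 <= expert_id < num_experts:
        raise ValueError("expert_id out of range")
    return expert_id // experts_per_rank(num_experts, ep_size)


def group_tokens_by_rank(topk_ids, num_experts=NUM_ROUTED_EXPERTS, ep_size=EP_SIZE):
    """Group (token_idx, expert_id) pairs by the rank that owns the expert.

    topk_ids holds, for each token, the list of expert ids chosen by the router.
    """
    groups = defaultdict(list)
    for token_idx, expert_ids in enumerate(topk_ids):
        for expert_id in expert_ids:
            rank = rank_of_expert(expert_id, num_experts, ep_size)
            groups[rank].append((token_idx, expert_id))
    return dict(groups)


if __name__ == "__main__":
    # Three tokens, each routed to two experts (toy routing decisions).
    routing = [[0, 33], [31, 255], [32, 64]]
    for rank, pairs in sorted(group_tokens_by_rank(routing).items()):
        print(f"rank {rank}: {pairs}")
```

Under these assumptions each rank owns 32 experts, and each rank can then run one grouped GEMM over the tokens assigned to its local experts.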

Performance

Throughput improved by approximately 5% with EP=8 compared to TP=8 on a single machine with 8 H200 GPUs.

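| H200*8 | Input token throughput (tok/s) | Output token throughput (tok/s) |
|--------|--------------------------------|---------------------------------|
| EP=8   | 677.82                         | 1468.48                         |
| TP=8   | 647.60                         | 1403.02                         |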
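
As a quick check of the reported gain, the relative improvement can be computed directly from the throughput figures above. This is a small arithmetic sketch; the dictionary layout and the helper name `relative_gain_percent` are illustrative and not part of SGLang.

```python
# Throughput figures copied from the results above: (input tok/s, output tok/s).
results = {
    "EP=8": (677.82, 1468.48),
    "TP=8": (647.60, 1403.02),
}


def relative_gain_percent(new, base):
    """Percentage improvement of `new` over `base`."""
    return (new - base) / base * 100.0


input_gain = relative_gain_percent(results["EP=8"][0], results["TP=8"][0])
output_gain = relative_gain_percent(results["EP=8"][1], results["TP=8"][1])

if __name__ == "__main__":
    print(f"input throughput gain:  {input_gain:.2f}%")
    print(f"output throughput gain: {output_gain:.2f}%")
```

Both gains come out at roughly 4.7%, consistent with the "approximately 5%" figure.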

Test commands (TP=8 baseline server, EP=8 server with --enable-ep-moe, then the benchmark client):

python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3 --tp 8 --trust-remote-code
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3 --tp 8 --trust-remote-code --enable-ep-moe
python3 -m sglang.bench_serving --backend sglang --dataset-name random --num-prompts 600 --random-input 180 --random-output 400 --request-rate 40

sleepcoo, Feb 16 '25 06:02