Expert Parallelism (EP) Support for DeepSeek V3/R1
Motivation
This PR adds Expert Parallelism (EP) support for DeepSeek V3/R1.
Modifications
- The grouped GEMM operator now supports FP8.
- DeepSeek V3 parameter loading is now supported.
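To illustrate what expert parallelism implies for parameter loading, here is a minimal sketch in which routed experts are assigned to EP ranks in contiguous blocks, so each rank only needs the weights of its own experts. The contiguous layout, the helper names `experts_for_rank` and `should_load`, and the example weight names are assumptions made for illustration, not the code in this PR. The count of 256 routed experts per MoE layer is taken from the published DeepSeek V3 configuration.

```python
import re

NUM_ROUTED_EXPERTS = 256  # routed experts per MoE layer in DeepSeek V3
EP_SIZE = 8               # matches the EP=8 setup benchmarked in this PR

# Assumed checkpoint naming: "...mlp.experts.<id>.<proj>.weight"
_EXPERT_RE = re.compile(r"\.experts\.(\d+)\.")


def experts_for_rank(rank, ep_size=EP_SIZE, num_experts=NUM_ROUTED_EXPERTS):
    """Return the expert ids owned by `rank` (hypothetical contiguous split)."""
    if num_experts % ep_size != 0:
        raise ValueError("num_experts must be divisible by ep_size")
    per_rank = num_experts // ep_size
    start = rank * per_rank
    return range(start, start + per_rank)


def should_load(weight_name, rank):
    """Decide whether `rank` needs the given checkpoint tensor.

    Expert weights are kept only on the rank that owns the expert.
    Non-expert weights are not filtered by this sketch.
    """
    match = _EXPERT_RE.search(weight_name)
    if match is None:
        return True
    return int(match.group(1)) in experts_for_rank(rank)
```

With these assumptions, each of the 8 ranks owns 32 experts, and a tensor such as `model.layers.3.mlp.experts.40.gate_proj.weight` would be loaded only on rank 1.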
Performance
On a single machine with 8 H200 GPUs, enabling EP (EP=8) improves both input and output token throughput by approximately 5% over the TP=8 baseline.
| H200 × 8 | Input token throughput (tok/s) | Output token throughput (tok/s) |
|---|---|---|
| EP=8 | 677.82 | 1468.48 |
| TP=8 | 647.60 | 1403.02 |
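The relative gain can be checked directly from the table. The short script below is only a convenience calculation over the reported numbers; the helper name `relative_gain` is introduced here for illustration.

```python
# Throughput figures copied from the table above (tok/s).
results = {
    "EP=8": {"input": 677.82, "output": 1468.48},
    "TP=8": {"input": 647.60, "output": 1403.02},
}


def relative_gain(new, base):
    """Return the percentage improvement of `new` over `base`."""
    return (new / base - 1.0) * 100.0


# Compare EP=8 against the TP=8 baseline for each metric.
gains = {
    metric: relative_gain(results["EP=8"][metric], results["TP=8"][metric])
    for metric in ("input", "output")
}

for metric, gain in gains.items():
    print(f"{metric} throughput gain: {gain:.2f}%")
```

Both metrics come out a little under 5%, consistent with the summary above.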
Test commands
# Baseline server (TP=8)
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3 --tp 8 --trust-remote-code
# EP server (EP=8, enabled with --enable-ep-moe)
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3 --tp 8 --trust-remote-code --enable-ep-moe
# Benchmark client, run against each server in turn
python3 -m sglang.bench_serving --backend sglang --dataset-name random --num-prompts 600 --random-input 180 --random-output 400 --request-rate 40