
[Performance]: When deploying DeepSeek-V3 on 8*H20 (96GB), the maximum model length only reaches 6500 with vLLM, but SGLang can achieve 163840.

Open · zhaotyer opened this issue 3 weeks ago • 8 comments

Proposal to improve performance

No response

Report of performance regression

No response

Misc discussion on performance

vLLM command:

```
python3 -m vllm.entrypoints.openai.api_server --model ${model_path} --port 8108 --max-model-len 6500 --gpu-memory-utilization 0.98 --tensor-parallel-size 8 --trust-remote-code --quantization fp8
```

SGLang command:

```
python3 -m sglang.launch_server --model-path ${model_path} --tp 8 --trust-remote-code --mem-fraction-static 0.9 --host 0.0.0.0 --port 50050 --served-model-name atom
```
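One plausible explanation for a gap this large (not confirmed in this issue) is the KV-cache layout: SGLang caches the compressed MLA latent for DeepSeek models, whereas a vLLM build that predates MLA support would cache fully materialized key/value heads, which for DeepSeek-V3 is dramatically larger per token. Below is a back-of-the-envelope sizing sketch; the model dimensions are taken from DeepSeek-V3's public config.json, but the assumption about what each server actually stores is mine:

```python
# Rough per-token KV-cache sizing for DeepSeek-V3 under two caching strategies.
# Model dimensions from DeepSeek-V3's config.json; caching-strategy choices
# below are illustrative assumptions, not confirmed behavior of either server.

num_layers = 61          # num_hidden_layers
num_heads = 128          # num_attention_heads
qk_rope_head_dim = 64
qk_nope_head_dim = 128
v_head_dim = 128
kv_lora_rank = 512

bytes_per_elem = 2       # bf16 KV cache; use 1 for an fp8 cache

# (a) Caching the compressed MLA latent: one (kv_lora_rank + rope) vector
#     per token per layer, shared across all heads.
mla_bytes_per_token = num_layers * (kv_lora_rank + qk_rope_head_dim) * bytes_per_elem

# (b) Caching fully materialized K and V heads (MHA-style cache):
#     K head dim = nope + rope, plus V head dim, for every head.
mha_bytes_per_token = num_layers * num_heads * (
    (qk_nope_head_dim + qk_rope_head_dim) + v_head_dim
) * bytes_per_elem

print(f"MLA latent cache : {mla_bytes_per_token / 1024:.1f} KiB/token")
print(f"Full MHA cache   : {mha_bytes_per_token / 1024:.1f} KiB/token")
print(f"Ratio            : {mha_bytes_per_token / mla_bytes_per_token:.0f}x")
```

With these assumptions the materialized cache is roughly 70x larger per token than the MLA latent, which is the right order of magnitude to turn a 163840-token budget into a few thousand tokens. The differing memory flags (`--gpu-memory-utilization 0.98` vs `--mem-fraction-static 0.9`) affect the budget too, but only by a few percent, so they cannot explain a 25x gap on their own.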

Your current environment (if you think it is necessary)

The output of `python collect_env.py`

Before submitting a new issue...

  • [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

zhaotyer · Feb 07 '25 11:02