
[Usage]: There is a significant difference in memory usage between vllm and sglang when deploying deepseek-v2 fp8.

Open fengyang95 opened this issue 1 year ago • 0 comments

Your current environment

The output of `python collect_env.py`

How would you like to use vllm

I deployed the FP8-quantized DeepSeek-V2.5 model with both vllm and sglang, and found that vllm's GPU memory consumption is significantly higher than sglang's (in particular, much less memory remains available for the KV cache). Is this mainly due to the difference between MHA and MLA?
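The size of the gap is plausible if one engine materializes full per-head key/value tensors (MHA-style caching) while the other caches only MLA's compressed latent plus the decoupled RoPE key. A rough back-of-the-envelope sketch below compares per-token KV-cache bytes under the two schemes; all model dimensions (layer count, head count, head size, `kv_lora_rank`, `rope_head_dim`) are illustrative placeholders, not values read from either engine or from the actual DeepSeek-V2.5 checkpoint:

```python
# Back-of-the-envelope KV-cache sizing: full MHA-style caching vs MLA's
# compressed cache. All dimensions are assumed placeholders; check the
# checkpoint's config.json for the real values.

def mha_kv_bytes_per_token(num_layers: int, num_kv_heads: int,
                           head_dim: int, dtype_bytes: int) -> int:
    # Standard attention caches a full K and a full V vector
    # per KV head, per layer, per token.
    return num_layers * 2 * num_kv_heads * head_dim * dtype_bytes

def mla_kv_bytes_per_token(num_layers: int, kv_lora_rank: int,
                           rope_head_dim: int, dtype_bytes: int) -> int:
    # MLA caches one shared compressed latent plus the decoupled
    # RoPE key per layer, per token -- independent of head count.
    return num_layers * (kv_lora_rank + rope_head_dim) * dtype_bytes

layers = 60  # assumed layer count
mha = mha_kv_bytes_per_token(layers, num_kv_heads=128, head_dim=128, dtype_bytes=1)
mla = mla_kv_bytes_per_token(layers, kv_lora_rank=512, rope_head_dim=64, dtype_bytes=1)
print(f"MHA-style: {mha} B/token, MLA: {mla} B/token, ratio {mha / mla:.1f}x")
```

With these (assumed) numbers the MHA-style cache is tens of times larger per token, which would directly explain why far less memory is left for the KV cache in the engine that does not exploit the compressed MLA representation.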

Before submitting a new issue...

  • [X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

fengyang95 · Sep 13 '24 09:09