
[Feature]: MLA Support

Status: Open · opened by chengtbf

🚀 The feature, motivation and pitch

DeepSeek-V2 introduces MLA (Multi-head Latent Attention), which uses low-rank joint compression of keys and values to eliminate the inference-time key-value cache bottleneck, enabling efficient inference.
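The low-rank compression idea can be sketched roughly as follows. This is an illustrative NumPy toy, not vLLM code; the projection names (`W_dkv`, `W_uk`, `W_uv`) and dimensions are assumptions loosely following the DeepSeek-V2 report's notation:

```python
# Hypothetical sketch of MLA-style low-rank KV compression (not vLLM's API).
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, n_heads, d_head = 64, 8, 4, 16

# Down-projection: compress each token's hidden state into a small latent
# vector, which is the only thing the KV cache needs to store.
W_dkv = rng.standard_normal((d_model, d_latent)) * 0.02
# Up-projections: reconstruct per-head keys and values from the latent.
W_uk = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02
W_uv = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02

def compress(h):
    """Cache only a d_latent-dim latent per token instead of full K/V."""
    return h @ W_dkv  # (seq, d_latent)

def expand(c_kv):
    """Recover per-head keys and values from the cached latent."""
    k = (c_kv @ W_uk).reshape(-1, n_heads, d_head)
    v = (c_kv @ W_uv).reshape(-1, n_heads, d_head)
    return k, v

h = rng.standard_normal((10, d_model))   # 10 tokens of hidden states
c_kv = compress(h)                       # cached: 10 x 8 floats
k, v = expand(c_kv)                      # each (10, 4, 16)
full = 2 * n_heads * d_head              # per-token floats for a plain KV cache
print(c_kv.shape, k.shape, full // d_latent)
```

With these toy sizes the cache stores 8 floats per token instead of 128, a 16x reduction, at the cost of the two up-projection matmuls at attention time.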


Can vLLM support MLA for accelerated inference?

```bibtex
@misc{deepseek-v2,
  author = {DeepSeek-AI},
  title  = {DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model},
  year   = {2024},
  note   = {GitHub repository},
  url    = {https://github.com/deepseek-ai/deepseek-v2}
}
```

https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat
https://github.com/deepseek-ai/DeepSeek-V2/blob/main/deepseek-v2-tech-report.pdf

Alternatives

No response

Additional context

No response

chengtbf avatar May 06 '24 15:05 chengtbf

mark

datalee avatar May 07 '24 00:05 datalee

mark

obitoquilt avatar May 20 '24 01:05 obitoquilt

mark

chenrui17 avatar Jun 12 '24 09:06 chenrui17

mark

lengyueyang avatar Jul 10 '24 09:07 lengyueyang

mark

RanchiZhao avatar Jul 11 '24 04:07 RanchiZhao

mark

Jiayi-Pan avatar Jul 11 '24 22:07 Jiayi-Pan

mark

quwu0820 avatar Jul 13 '24 07:07 quwu0820

mark

qianchen94 avatar Jul 16 '24 09:07 qianchen94

mark

lumosity4tpj avatar Aug 15 '24 06:08 lumosity4tpj

ref https://github.com/vllm-project/vllm/pull/4650#issuecomment-2297051077

zhyncs avatar Aug 19 '24 17:08 zhyncs