
[Feature] Support DeepSeek-V2 Model

Open ispobock opened this issue 1 year ago • 5 comments

Motivation

@lvhan028 @grimoire @lzhangzz Do you have plans to support the DeepSeek-V2 model?

Related resources

No response

Additional context

No response

ispobock avatar May 08 '24 02:05 ispobock

Support for DeepSeek-V2 might take a long time.

  • The MoE model has not been optimized yet. I am working on a fused kernel similar to vLLM's.
  • The model has so many weights that loading it and distributing it across ranks takes too much time. I need to find a new way to load TP models, which might lead to another refactor.
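For context on the first bullet, here is a minimal NumPy sketch of the naive MoE forward pass that a fused kernel replaces: one small matmul per (token, expert) pair, which a fused implementation batches into grouped GEMMs. All dimensions are hypothetical, not DeepSeek-V2's real config.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only.
num_tokens, hidden, num_experts, top_k = 8, 16, 4, 2

x = rng.standard_normal((num_tokens, hidden))
gate_w = rng.standard_normal((hidden, num_experts))
expert_w = rng.standard_normal((num_experts, hidden, hidden))

# Router: softmax over expert logits, keep the top-k experts per token.
logits = x @ gate_w
probs = np.exp(logits - logits.max(-1, keepdims=True))
probs /= probs.sum(-1, keepdims=True)
topk_idx = np.argsort(-probs, axis=-1)[:, :top_k]

# Naive loop: one matmul per (token, expert) pair. A fused kernel
# instead groups tokens by expert and runs batched GEMMs on the GPU.
out = np.zeros_like(x)
for t in range(num_tokens):
    for e in topk_idx[t]:
        out[t] += probs[t, e] * (x[t] @ expert_w[e])

print(out.shape)  # (8, 16)
```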

grimoire avatar May 09 '24 11:05 grimoire

> Support for DeepSeek-V2 might take a long time.

In addition, DeepSeek-V2 uses MLA (Multi-head Latent Attention) instead of the traditional MHA or GQA. Achieving the best performance with it will also take some time.
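To see why MLA matters for an inference engine, compare the per-token KV-cache footprint: MHA caches full K and V for every head, GQA shares KV heads across query heads, and MLA caches only a single compressed latent vector plus a small decoupled RoPE key. The dimensions below are illustrative assumptions, not DeepSeek-V2's actual config.

```python
# Rough per-token KV-cache footprint (in elements, not bytes).
# All dimensions are hypothetical, chosen only for illustration.
num_heads, head_dim = 32, 128
num_kv_heads = 8         # GQA: fewer KV heads shared across query heads
d_latent = 512           # MLA: dimension of the compressed KV latent
d_rope = 64              # MLA: decoupled RoPE key dimension

mha = 2 * num_heads * head_dim     # full K and V for every head
gqa = 2 * num_kv_heads * head_dim  # K and V for the shared KV heads only
mla = d_latent + d_rope            # one shared latent + one RoPE key

print(mha, gqa, mla)  # 8192 2048 576
```

The latent must be up-projected back to per-head keys and values at attention time, which is why existing MHA/GQA kernels cannot serve MLA unchanged.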

zhyncs avatar May 10 '24 02:05 zhyncs

Hi Genius Li @lzhangzz Due to its excellent performance and effectiveness, DeepSeek-V2 has become the new SOTA model. Does TurboMind have plans to support it? I know that supporting this in TurboMind is far more complex than in PyTorch, but if possible, the performance would be very impressive.

zhyncs avatar May 10 '24 05:05 zhyncs

Mark! I am looking forward to the deepseek-v2 model being integrated into the TurboMind engine. DeepSeek-V2's performance is excellent!

Vincent131499 avatar May 11 '24 10:05 Vincent131499

@Vincent131499 We conducted an internal survey of MLA. Implementing it as described in the paper requires kernel modifications, which involve a significant amount of engineering work. If you are interested, you can refer to the blog for the derivation process.
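One identity from that kind of derivation explains the kernel work: the key up-projection can be "absorbed" into the query, so attention scores are computed directly against the cached latent without materializing per-head keys. A minimal NumPy check of the identity, with hypothetical dimensions and names:

```python
import numpy as np

rng = np.random.default_rng(1)
d_head, d_latent = 16, 32  # hypothetical sizes for illustration

q = rng.standard_normal(d_head)                  # a query vector
c = rng.standard_normal(d_latent)                # cached latent KV vector
W_uk = rng.standard_normal((d_head, d_latent))   # up-projects latent to a key

# Standard form: reconstruct the key from the latent, then dot with the query.
score_standard = q @ (W_uk @ c)

# Absorbed form: fold W_uk into the query once, then attend over latents
# directly -- this is what requires a modified attention kernel.
score_absorbed = (W_uk.T @ q) @ c

assert np.allclose(score_standard, score_absorbed)
print("scores match")
```

The absorbed form avoids expanding the latent cache into full per-head keys at every decode step, but an MHA kernel that expects per-head K tensors cannot exploit it as-is.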

zhyncs avatar May 16 '24 06:05 zhyncs