[Feature] Support DeepSeek-V2 Model
### Motivation

@lvhan028 @grimoire @lzhangzz Do you have plans to support the DeepSeek-V2 model?

### Related resources

_No response_

### Additional context

_No response_
Support of DeepSeek-V2 might take a long time:

- The MoE module has not been optimized yet. I am working on a fused kernel like vLLM's (see the sketch after this list).
- The model has so many weights that loading it and distributing it across ranks takes too much time. I need to find a new way to load the TP (tensor-parallel) model, which might lead to another refactor.
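To illustrate what a fused kernel replaces, here is a naive top-k MoE forward pass. This is a toy sketch with made-up shapes, not LMDeploy or vLLM code; the point is that the Python loop over experts launches many small GEMMs, which a fused kernel collapses into grouped computation over sorted token-expert pairs.

```python
import torch
import torch.nn.functional as F

def naive_moe(x, gate_w, expert_ws, top_k=2):
    # x: [n_tokens, hidden]; gate_w: [hidden, n_experts]
    # expert_ws: list of [hidden, hidden] toy expert weights
    probs = F.softmax(x @ gate_w, dim=-1)        # router probabilities
    topk_p, topk_i = probs.topk(top_k, dim=-1)   # per-token expert choices
    out = torch.zeros_like(x)
    for e, w in enumerate(expert_ws):            # per-expert loop: many small GEMMs
        tok, slot = (topk_i == e).nonzero(as_tuple=True)
        if tok.numel():                          # weight each expert output by its gate prob
            out[tok] += topk_p[tok, slot, None] * (x[tok] @ w)
    return out

# toy usage
x = torch.randn(16, 64)
y = naive_moe(x, torch.randn(64, 8), [torch.randn(64, 64) for _ in range(8)])
```

A fused implementation instead sorts tokens by assigned expert and runs the expert GEMMs in one batched launch, avoiding the per-expert kernel-launch and gather/scatter overhead above.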
In addition, DeepSeek-V2 uses MLA (Multi-head Latent Attention) instead of traditional MHA or GQA, so achieving the best performance will also take some time.
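For readers unfamiliar with MLA, here is a minimal sketch of the compression idea from the DeepSeek-V2 paper. The dimensions are illustrative, not DeepSeek-V2's actual configuration, and this omits the decoupled RoPE dimensions the real model adds:

```python
import torch

hidden, n_heads, head_dim, kv_rank = 4096, 32, 128, 512  # toy sizes

w_dkv = torch.randn(hidden, kv_rank)             # shared KV down-projection
w_uk = torch.randn(kv_rank, n_heads * head_dim)  # per-head K up-projection
w_uv = torch.randn(kv_rank, n_heads * head_dim)  # per-head V up-projection

h = torch.randn(hidden)                          # one token's hidden state
c_kv = h @ w_dkv                                 # cache this: kv_rank floats
k = (c_kv @ w_uk).view(n_heads, head_dim)        # rebuilt at attention time
v = (c_kv @ w_uv).view(n_heads, head_dim)

# MHA would cache k and v directly: 2 * 32 * 128 = 8192 floats per token,
# vs. 512 for the latent. Exploiting that saving inside a paged KV cache
# is what requires new attention kernels.
```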
Hi Genius Li @lzhangzz, DeepSeek-V2 has become the new SOTA model thanks to its excellent performance and effectiveness. Does TurboMind plan to support it? I know that supporting it in TurboMind is far more complex than in PyTorch, but if possible, the performance would be very impressive.
Mark! I am looking forward to the DeepSeek-V2 model being integrated into the TurboMind engine. The performance of DeepSeek-V2 is excellent!
@Vincent131499 We conducted an internal survey on MLA. Implementing it according to the method in the paper requires kernel modifications, which is a significant amount of engineering work. If you are interested, you can refer to the blog for the derivation process.
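For reference, the core of that derivation in condensed form (my own paraphrase, not quoted from the blog; $W^{UQ}$ and $W^{UK}$ denote a head's query/key up-projection matrices, $c^Q_t$ and $c^{KV}_s$ the query and KV latents, and RoPE is ignored):

```math
q_t^\top k_s = \left(W^{UQ} c^Q_t\right)^\top \left(W^{UK} c^{KV}_s\right) = \left(c^Q_t\right)^\top \left(\left(W^{UQ}\right)^\top W^{UK}\right) c^{KV}_s
```

So $(W^{UQ})^\top W^{UK}$ can be absorbed into the query projection and attention can operate directly on the cached latents $c^{KV}$. Supporting this cache layout, plus the decoupled RoPE dimensions (position rotation does not commute with the absorbed low-rank projections), is what forces the kernel changes.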