[Feature] Expert parallelism support

Open · chongli-uw opened this issue 1 year ago

Checklist

  • [x] 1. If the issue you raised is not a feature request but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • [x] 2. Please use English, otherwise it will be closed.

Motivation

Hi team, first of all, thanks so much for such a great project. I am wondering: is there a plan to support expert parallelism (EP) for MoE models?

Related resources

https://nvidia.github.io/TensorRT-LLM/advanced/expert-parallelism.html
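For readers unfamiliar with the term: expert parallelism shards the experts of an MoE layer across devices, so each device stores and runs only a subset of experts, and tokens are routed to whichever device owns their selected expert. Below is a minimal single-process sketch of the routing idea; all names are illustrative, not SGLang or TensorRT-LLM APIs.

```python
# Minimal sketch of MoE routing under expert parallelism (top-1 routing
# for brevity). In a real EP setup each rank would hold only its own
# slice of `experts`; here the per-rank loop just marks that boundary.
import torch
import torch.nn as nn

num_experts, ep_size, hidden = 8, 2, 16
experts_per_rank = num_experts // ep_size
experts = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(num_experts))
gate = nn.Linear(hidden, num_experts)

def moe_forward(x: torch.Tensor) -> torch.Tensor:
    topk_ids = gate(x).argmax(dim=-1)          # route each token to one expert
    out = torch.zeros_like(x)
    for rank in range(ep_size):                # conceptually: one device each
        lo = rank * experts_per_rank
        for e in range(lo, lo + experts_per_rank):
            mask = topk_ids == e               # tokens owned by expert e
            if mask.any():
                out[mask] = experts[e](x[mask])
    return out

print(moe_forward(torch.randn(5, hidden)).shape)  # torch.Size([5, 16])
```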

chongli-uw avatar Sep 16 '24 06:09 chongli-uw

https://github.com/sgl-project/sglang/blob/441c22db8cbcb005b5f005b991e8aa1a65d79bb6/python/sglang/srt/models/mixtral_quant.py#L86-L150

this is an early example
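For context, the linked region implements Mixtral's MoE block with plain per-expert MLPs: each tensor-parallel rank instantiates only its slice of the expert list, computes a weighted partial sum over its local experts, and the partial sums are combined with an all-reduce. A simplified, single-process rendition of that pattern follows (paraphrased, not the file's exact contents):

```python
# Paraphrased sketch of the expert-sharding pattern in the linked
# mixtral_quant.py region. Rank `tp_rank` materializes only its shard
# of experts; the returned partial sum would then be all-reduced
# across TP ranks.
import numpy as np
import torch
import torch.nn as nn

num_experts, tp_size, tp_rank, hidden = 8, 2, 0, 16
expert_indices = np.array_split(range(num_experts), tp_size)[tp_rank].tolist()
experts = nn.ModuleList(
    nn.Linear(hidden, hidden) if i in expert_indices else nn.Identity()
    for i in range(num_experts)          # non-local slots stay empty stubs
)
gate = nn.Linear(hidden, num_experts)

def moe_partial(x: torch.Tensor) -> torch.Tensor:
    weights = gate(x).softmax(dim=-1)    # [tokens, num_experts]
    out = torch.zeros_like(x)
    for i in expert_indices:             # local experts only
        out += weights[:, i:i + 1] * experts[i](x)
    return out  # the real code all-reduces this partial sum across TP ranks

print(moe_partial(torch.randn(4, hidden)).shape)  # torch.Size([4, 16])
```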

merrymercy avatar Sep 22 '24 11:09 merrymercy

@merrymercy Hi, has any progress been made on this issue? The example you linked above uses plain MLPs rather than FusedMoE. How can we enable expert parallelism for the current Mixtral/DeepSeek-V2 models now that they use FusedMoE? Do you have a modified example?

liangzelang avatar Nov 14 '24 06:11 liangzelang

related #1970

merrymercy avatar Nov 14 '24 19:11 merrymercy

@merrymercy I see that #1970 is mainly about TP and DP. I noticed that the SGLang Q4 roadmap (#1487) mentions supporting this feature.

liangzelang avatar Nov 15 '24 06:11 liangzelang

@liangzelang DP has already been merged (only for DeepSeek right now; see https://github.com/sgl-project/sglang/pull/1970), and EP will be supported soon. cc @ispobock

zhyncs avatar Nov 17 '24 16:11 zhyncs

@zhyncs Is there any support for MoE-EP yet? I have implemented MoE-EP myself.

xiaobochen123 avatar Nov 21 '24 08:11 xiaobochen123

@xiaobochen123 We are going to implement it with a DP + EP approach for throughput gains. DP attention is already implemented. Before we start on EP, some updates to the MoE codebase are needed.

I am interested in what kind of MoE-EP you implemented and which codebase you used. How large are the performance gains compared to TP?
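For background on the dispatch step such a DP + EP design needs: before the expert computation, tokens are exchanged across ranks so that each rank receives exactly the tokens routed to its resident experts. A hedged sketch using torch.distributed follows; it is illustrative only, not SGLang's implementation, and assumes a process group initialized under torchrun with tokens pre-sorted by destination rank.

```python
# Hedged sketch of EP token dispatch: exchange row counts first, then a
# variable-split all-to-all (not SGLang's actual kernels). Run under
# torchrun with an initialized process group.
import torch
import torch.distributed as dist

def ep_dispatch(tokens: torch.Tensor, send_counts: torch.Tensor) -> torch.Tensor:
    """tokens: [n, hidden], sorted by destination rank.
    send_counts: int64 tensor of length world_size; send_counts[r] is the
    number of rows destined for rank r."""
    recv_counts = torch.empty_like(send_counts)
    dist.all_to_all_single(recv_counts, send_counts)   # exchange sizes first
    received = tokens.new_empty(int(recv_counts.sum()), tokens.shape[1])
    dist.all_to_all_single(
        received, tokens,
        output_split_sizes=recv_counts.tolist(),
        input_split_sizes=send_counts.tolist(),
    )
    return received  # rows for this rank's local experts; reversed after MoE
```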

ispobock avatar Nov 21 '24 15:11 ispobock

Done by https://github.com/sgl-project/sglang/pull/2203.
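(For anyone landing here later: if memory serves, that PR added an expert-parallel MoE layer gated behind a server flag along the lines of `--enable-ep-moe`; check `python -m sglang.launch_server --help` on your SGLang version, since flags may have changed since then.)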

merrymercy avatar Dec 28 '24 03:12 merrymercy