[FEATURE]: Expert Parallel for qwen/deepseek

Open Guodanding opened this issue 11 months ago • 4 comments

Describe the feature

Hello, is there an existing expert-parallel (EP) implementation for the newer MoE models, such as Qwen and DeepSeek?

Jan 12 '25 14:01 Guodanding

We need FP8 training for DeepSeek-MoE.

Feb 19 '25 01:02 shiyongde

EP for DeepSeek V3 is implemented; see our latest blog.

Feb 20 '25 04:02 ver217

> need FP8 training deepseek-MOE

The FP8 GEMM kernel released in DeepSeek's GitHub repo is currently, in some cases, less efficient than the BF16 GEMM provided by cuBLAS. We will release the blockwise FP8 training feature once we resolve the efficiency issue.

Feb 20 '25 04:02 ver217