[ROCM MOE] Enable ROCM AITER Block MOE For DeepSeek R1/V3
Motivation
This PR introduces the aiter (https://github.com/ROCm/aiter) fused block MOE kernel on ROCm. To enable the feature, set the environment variable `SGLANG_ROCM_AITER_BLOCK_MOE=1`.
The new MOE kernel brings a 10~30% throughput uplift across different input/output sequence lengths (ISL/OSL).
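As a usage sketch, the feature can be enabled per launch via the environment variable (the model path and flags below are illustrative, not part of this PR):

```shell
# Hypothetical launch example: enable the aiter block MOE kernel for one run.
SGLANG_ROCM_AITER_BLOCK_MOE=1 python3 -m sglang.launch_server \
    --model-path deepseek-ai/DeepSeek-R1 \
    --tp 8
```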
Prerequisite
clone

```shell
git clone --recursive https://github.com/ROCm/aiter.git
# or, inside an existing checkout:
git submodule sync && git submodule update --init --recursive
```
install into python

```shell
# run under the aiter root dir
python3 setup.py develop
```
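To sanity-check the install (a minimal sketch, assuming the package is importable as `aiter`):

```shell
python3 -c "import aiter; print(aiter.__file__)"
```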
Usage
The aiter JIT compiler builds each operator the first time it is called; no ahead-of-time kernel compilation step is required.
Modifications
- Add the block-scale aiter MOE path in fused_moe_triton/fused_moe.py
- Add weight shuffling in fused_moe_triton/layers
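The weight-shuffle step above reorders expert weights into the tiled layout the kernel consumes. The snippet below is a purely illustrative NumPy sketch of the general idea (regrouping a 2-D weight into contiguous tiles); the actual layout is defined by aiter's shuffle routine and will differ:

```python
import numpy as np

def shuffle_weight_sketch(w: np.ndarray, tile=(16, 16)) -> np.ndarray:
    """Illustrative only: regroup a 2-D weight matrix into contiguous
    (tile_rows x tile_cols) blocks, the general idea behind making a
    weight layout kernel-friendly. Not the real aiter layout."""
    rows, cols = w.shape
    tr, tc = tile
    assert rows % tr == 0 and cols % tc == 0
    # View as (rows//tr, tr, cols//tc, tc), then make each tile contiguous.
    return (w.reshape(rows // tr, tr, cols // tc, tc)
             .transpose(0, 2, 1, 3)
             .copy())

w = np.arange(64, dtype=np.float32).reshape(8, 8)
shuffled = shuffle_weight_sketch(w, tile=(4, 4))
print(shuffled.shape)  # (2, 2, 4, 4): a 2x2 grid of 4x4 tiles
```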
Checklist
- [ ] Format your code according to the Code Formatting with Pre-Commit.
- [ ] Add unit tests as outlined in the Running Unit Tests.
- [ ] Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
- [ ] Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
- [ ] For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
- [ ] Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.