[ROCM MOE] Enable ROCM AITER Block MOE For DeepSeek R1/V3

Open BruceXcluding opened this issue 10 months ago • 0 comments

Motivation

This PR introduces the aiter (https://github.com/ROCm/aiter) fused block MoE kernel on ROCm. To enable it, set the environment variable SGLANG_ROCM_AITER_BLOCK_MOE=1.

The new MoE kernel brings a 10% to 30% uplift, depending on the isl/osl (input/output sequence length) combination.
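As a rough illustration of how such an opt-in flag is typically consumed, here is a minimal sketch. Only the variable name SGLANG_ROCM_AITER_BLOCK_MOE comes from this description; the helper names and the backend labels are hypothetical and are not the code in this PR.

```python
import os


def aiter_block_moe_enabled(env=None) -> bool:
    """Hypothetical helper: True only when the flag is set to exactly "1"."""
    env = os.environ if env is None else env
    return env.get("SGLANG_ROCM_AITER_BLOCK_MOE", "0") == "1"


def select_moe_backend(env=None) -> str:
    """Return an illustrative label for the MoE path that would be taken.

    The labels are assumptions made for this sketch, not identifiers from sglang.
    """
    if aiter_block_moe_enabled(env):
        return "aiter_block_moe"
    return "triton_fused_moe"
```

Passing a plain dict instead of reading os.environ directly keeps the gate easy to test; with the flag unset, the sketch falls back to the non-aiter path.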

Prerequisite

Clone: git clone --recursive https://github.com/ROCm/aiter.git (or, if the repository was already cloned without submodules, run git submodule sync ; git submodule update --init --recursive)

Install into Python: from the aiter root directory, run python3 setup.py develop

Usage

No extra build step is needed: aiter's JIT compiler compiles an operator when it is called.

Modifications

  • Add the block-scale aiter MoE path in fused_moe_triton/fused_moe.py
  • Add weight shuffling in fused_moe_triton/layers
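
For readers unfamiliar with the term, "block scale" is taken here to mean block-wise quantization, where each tile of the weight matrix shares one scale factor. The pure-Python sketch below shows only that arithmetic on a toy matrix. The function name, the tiny 2x2 block size, and the data are assumptions for illustration; the aiter kernel applies the scales inside a fused GPU kernel, and the weight shuffle added by this PR is not modeled.

```python
import math


def dequantize_block_scaled(weight, scales, block_n, block_k):
    """Multiply each weight element by the scale of the block it falls in.

    weight: N x K list of rows holding quantized values (as floats here).
    scales: ceil(N / block_n) x ceil(K / block_k) list of rows.
    """
    n, k = len(weight), len(weight[0])
    assert len(scales) == math.ceil(n / block_n)
    assert len(scales[0]) == math.ceil(k / block_k)
    return [
        [weight[i][j] * scales[i // block_n][j // block_k] for j in range(k)]
        for i in range(n)
    ]


w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
s = [[0.5, 2.0]]  # one scale per 2x2 block
print(dequantize_block_scaled(w, s, block_n=2, block_k=2))
# → [[0.5, 1.0, 6.0, 8.0], [2.5, 3.0, 14.0, 16.0]]
```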

Checklist

  • [ ] Format your code according to the Code Formatting with Pre-Commit.
  • [ ] Add unit tests as outlined in the Running Unit Tests.
  • [ ] Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
  • [ ] Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
  • [ ] For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
  • [ ] Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.

BruceXcluding, Feb 22 '25 12:02