sglang
[CPU] Support chunk_gated_delta_rule kernel for Qwen3-Next
Motivation
This PR adds a CPU chunk_gated_delta_rule kernel for Qwen3-Next.
Test Plan:
test/srt/cpu/test_mamba.py -k test_chunk_gated_delta_rule
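For readers unfamiliar with the operation, here is a minimal pure-Python sketch of the sequential (non-chunked) gated delta rule recurrence for a single head, assuming the standard Gated DeltaNet formulation. The function name `gated_delta_rule_reference`, the argument layout, and the `d_k ** -0.5` query scaling are assumptions for illustration; this is not the kernel added in this PR, which processes the sequence in chunks and may differ in signature, layout, and normalization.

```python
import math


def gated_delta_rule_reference(q, k, v, g, beta):
    """Sequential gated delta rule for one head (hypothetical reference helper).

    q, k : lists of T vectors of length d_k
    v    : list of T vectors of length d_v
    g    : list of T log-decay scalars (decay = exp(g_t))
    beta : list of T write-strength scalars
    Returns a list of T output vectors of length d_v.
    """
    d_k, d_v = len(k[0]), len(v[0])
    scale = d_k ** -0.5  # assumed query scaling
    state = [[0.0] * d_v for _ in range(d_k)]  # d_k x d_v memory matrix
    outputs = []
    for q_t, k_t, v_t, g_t, beta_t in zip(q, k, v, g, beta):
        # 1. Gate: decay the whole memory.
        decay = math.exp(g_t)
        state = [[decay * s for s in row] for row in state]
        # 2. Value the decayed memory currently returns for key k_t.
        pred = [sum(k_t[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
        # 3. Delta: correction toward the new value, scaled by beta.
        delta = [beta_t * (v_t[j] - pred[j]) for j in range(d_v)]
        # 4. Rank-1 update of the memory with the outer product k_t * delta.
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k_t[i] * delta[j]
        # 5. Read the updated memory with the scaled query.
        outputs.append(
            [scale * sum(q_t[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
        )
    return outputs
```

With no decay (`g = 0`), `beta = 1`, and a unit-norm key, writing a second value under the same key fully replaces the first one, which is the "delta" behaviour. A chunked kernel is expected to produce the same outputs while handling several time steps per block.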
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist
- [ ] Format your code according to the Format code with pre-commit.
- [ ] Add unit tests according to the Run and add unit tests.
- [ ] Update documentation according to Write documentations.
- [ ] Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
@Valentine233 how much does this kernel contribute in e2e benchmarks right now?
This kernel accounts for about 13.67% of end-to-end time in the Qwen3-Next prefill phase, with BS=1, 1k input length, and TP=2 on GNR.
@Valentine233 you need to update https://github.com/sgl-project/sglang/blob/main/test/srt/run_suite.py#L493-L510 so that CI actually runs the test.
@Valentine233 please update this check util according to https://github.com/sgl-project/sglang/pull/12324#discussion_r2516428644
fix CI fails.
@mingfeima The CI failures do not seem to be related to this PR. I have rebased several times, but the failures persist.
@Valentine233 Hi, could you please fix the lint errors? I will help you merge this PR.
Thanks @FlamingoPg, the previous lint issue has been fixed. The current lint issue is unrelated to this PR; it comes from test/srt/test_priority_scheduling.py.
Hi @FlamingoPg, I have rebased again. There is no related CI issue now.