Ma Mingfei comments

93 comments by Ma Mingfei

Add intel_amx backend for Radix Attention

Put a note on intel_amx behavior:
* device set to cpu, attention backend not set: use intel_amx if the hardware supports it.
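The default described in the note can be sketched as below. This is a minimal illustration, not sglang's actual code. The function `select_attention_backend` and its parameters are hypothetical, AMX support is passed in as a flag instead of being detected, and returning `None` when AMX is unavailable is an assumption, since the note does not say what the fallback is.

```python
from typing import Optional


def select_attention_backend(
    device: str,
    attention_backend: Optional[str],
    amx_supported: bool,
) -> Optional[str]:
    """Hypothetical helper illustrating the intel_amx default rule."""
    # An explicitly requested attention backend is always respected.
    if attention_backend is not None:
        return attention_backend
    # device=cpu with no attention backend set: use intel_amx if the CPU supports AMX.
    if device == "cpu" and amx_supported:
        return "intel_amx"
    # Assumption: otherwise defer to the server's other defaults (not covered by the note).
    return None


if __name__ == "__main__":
    print(select_attention_backend("cpu", None, amx_supported=True))   # intel_amx
    print(select_attention_backend("cpu", None, amx_supported=False))  # None
```

Keeping the explicit-backend check first means the AMX default only applies when the user has not made a choice.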

Add intel_amx backend for Radix Attention

@yanbing-j please rebase, since https://github.com/sgl-project/sglang/pull/6115 has landed.

Add intel_amx backend for Radix Attention

> Hi @yanbing-j @mingfeima Please consider using two separate PRs for this change: one for the sgl-kernel related changes and another for the integration of the Python part. After the...

Add intel_amx backend for Radix Attention

Replaced by https://github.com/sgl-project/sglang/pull/6405 and https://github.com/sgl-project/sglang/pull/6408.

Decouple int4 weight with serialized format

@malfet Hi, we have modified the int4 packed weight logic in gpt-fast and also in torch: https://github.com/pytorch/pytorch/pull/129940. Could you please help review? @yanbing-j could you also help evaluate how much...

[CPU] add mamba fla kernels for Qwen3-next

Move all 3 kernels into the same file `sgl-kernel/csrc/cpu/mamba/fla.cpp`, i.e. we leave 2 files in /csrc/cpu/mamba: `conv.cpp` and `fla.cpp` (flash linear attention). Refer to https://github.com/sgl-project/sglang/pull/12309.

[CPU] Support chunk_gated_delta_rule kernel for Qwen3-Next

@Valentine233 how much does this kernel contribute to the e2e benchmarks right now?

[CPU] Support chunk_gated_delta_rule kernel for Qwen3-Next

@Valentine233 you need to update `https://github.com/sgl-project/sglang/blob/main/test/srt/run_suite.py#L493-L510` so that CI actually launches the test.

[CPU] Support chunk_gated_delta_rule kernel for Qwen3-Next

@Valentine233 please update this check util according to https://github.com/sgl-project/sglang/pull/12324#discussion_r2516428644.

[CPU] Support chunk_gated_delta_rule kernel for Qwen3-Next

Please fix the CI failures.