Ma Mingfei

Results 93 comments of Ma Mingfei

Put a note on the intel_amx behavior:
* device set to cpu, attn not set: use intel_amx if the hardware supports it.
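A minimal sketch of the default described in this comment, assuming a Linux host where AMX support shows up as the `amx_tile` flag in `/proc/cpuinfo`. The helper names `cpu_has_amx` and `resolve_attention_backend` are hypothetical and are not taken from the sglang code base. What happens when AMX is missing is not stated in the comment, so the sketch returns `None` and leaves that choice to the caller.

```python
from typing import Optional


def cpu_has_amx(cpuinfo_text: str) -> bool:
    """Return True if /proc/cpuinfo-style text lists the amx_tile flag."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Everything after the first ':' is a space-separated flag list.
            return "amx_tile" in line.partition(":")[2].split()
    return False


def resolve_attention_backend(
    device: str, attention_backend: Optional[str], has_amx: bool
) -> Optional[str]:
    """Hypothetical resolver for the default attention backend."""
    # An explicit user choice always wins.
    if attention_backend is not None:
        return attention_backend
    # device set to cpu, attn not set: use intel_amx if hardware supports it.
    if device == "cpu" and has_amx:
        return "intel_amx"
    # Assumption: no default is picked here; the caller decides the fallback.
    return None
```

On a real machine the text could come from `open("/proc/cpuinfo").read()`; passing it in as a string keeps the detection easy to test.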

@yanbing-j please rebase, as https://github.com/sgl-project/sglang/pull/6115 has landed.

> Hi @yanbing-j @mingfeima Please consider using two separate PRs for this change: one for the sgl-kernel related changes and another for the integration of the Python part. After the...

Replaced with https://github.com/sgl-project/sglang/pull/6405 and https://github.com/sgl-project/sglang/pull/6408

@malfet Hi, we have modified the int4 packed weight logic from gpt-fast and also from torch: https://github.com/pytorch/pytorch/pull/129940. Could you please help review? @yanbing-j could you also help evaluate how much...

Move all 3 kernels into the same file, `sgl-kernel/csrc/cpu/mamba/fla.cpp`, i.e. we leave 2 files in /csrc/cpu/mamba: `conv.cpp` and `fla.cpp` (flash linear attention). Refer to https://github.com/sgl-project/sglang/pull/12309

@Valentine233 how much does this kernel contribute in e2e benchmarks right now?

@Valentine233 we need to update `https://github.com/sgl-project/sglang/blob/main/test/srt/run_suite.py#L493-L510` so that CI actually launches the test.

@Valentine233 please update this check util according to https://github.com/sgl-project/sglang/pull/12324#discussion_r2516428644