xu-yfei

Results: 9 comments of xu-yfei

> In _forward_flashmla_sparse(...), pad q’s head dimension to the required multiple (64 on SM90, 128 on SM100+)

@YAMY1234 On Hxx devices, padding the head dimension may perform poorly. Could...
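For context, a minimal sketch of what such padding could look like (the helper name is hypothetical, not sglang's actual code; the 64/128 multiples are taken from the quoted comment):

```python
import torch
import torch.nn.functional as F

def pad_q_head_dim(q: torch.Tensor, multiple: int) -> torch.Tensor:
    """Hypothetical helper: pad q's last (head) dimension up to the next
    multiple required by the kernel (64 on SM90, 128 on SM100+ per the
    quoted comment)."""
    head_dim = q.shape[-1]
    padded = (head_dim + multiple - 1) // multiple * multiple
    if padded == head_dim:
        return q
    # F.pad pads dimensions from the last one backwards: (left, right)
    return F.pad(q, (0, padded - head_dim))

# Example: a 576-dim q already satisfies a 64-multiple (SM90), but must
# grow to 640 to satisfy a 128-multiple (SM100+) -- extra wasted compute,
# which is the performance concern raised above.
q = torch.randn(8, 16, 576)
print(pad_q_head_dim(q, 128).shape)  # torch.Size([8, 16, 640])
```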

Some files are not included in the sglang wheel during packaging.

```diff
--- a/python/pyproject.toml
+++ b/python/pyproject.toml
@@ -131,6 +131,9 @@ sglang = "sglang.cli.main:main"
   "srt/mem_cache/storage/hf3fs/hf3fs_utils.cpp",
   "srt/speculative/cpp_ngram/*.cpp",
   "srt/speculative/cpp_ngram/*.h",
+  "jit_kernel/include/sgl_kernel/*.h",
+  "jit_kernel/include/sgl_kernel/*.cuh",
+  "jit_kernel/csrc/*.cuh"
 ]
```
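One way to sanity-check that these globs actually land in the built wheel (the wheel filename below is a placeholder; substitute whatever your build produces):

```python
import zipfile

# List wheel entries matching the newly added jit_kernel globs.
with zipfile.ZipFile("sglang-0.0.0-py3-none-any.whl") as whl:
    hits = [
        name for name in whl.namelist()
        if "jit_kernel/include/sgl_kernel/" in name or "jit_kernel/csrc/" in name
    ]
print("\n".join(hits) or "no jit_kernel headers found in wheel")
```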

These are MLA performance results from early March; the latest numbers may have changed, but the pattern should be similar.

```
flash infer mla    deepseek flash mla    kv_len    batch    h...
```

> Attempted to load this with AWQ, got this error:
>
> ```
> [2025-04-05 10:10:33 DP13 TP13] Scheduler hit an exception: Traceback (most recent call last):
>   File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py",...
> ```

> Could you add DP attention in the benchmarks?

On 8×H20 (96 GB), weight memory usage is 87.19 GB with `--dp-size 4 --enable-dp-attention`, leaving too little memory; see the budget sketch below.
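A back-of-the-envelope calculation of the remaining per-GPU budget (numbers taken from the comment above; nothing else is measured):

```python
# Per-GPU memory budget on H20 with --dp-size 4 --enable-dp-attention.
total_gb = 96.0      # H20 HBM capacity
weights_gb = 87.19   # reported weight memory usage
remaining = total_gb - weights_gb
print(f"left for KV cache, activations, CUDA graphs: {remaining:.2f} GB")
# -> left for KV cache, activations, CUDA graphs: 8.81 GB
```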

> @xu-yfei Hi, I cannot compile your sgl-kernel even after I merged #5000 into the branch `mla_dp`. Could you please help me fix...

Could you provide the specific error in detail?

> @xu-yfei
>
> For example:
>
> ```
> -- Generating done (0.0s)
> -- Build files have been written to: /tmp/sglang/sgl-kernel/build
> *** Building project with Ninja...
> ...
> ```
