Li Hui comments

Results: 42 comments of Li Hui

[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x)

> @lambert0312 let me double check on both platform today. What chips you used ?

A800, thanks @yiakwy-xpu-ml-framework-team

[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x)

> @zhaochenyang20 I have revert commit to [6b08bf5](https://github.com/sgl-project/sglang/commit/6b08bf538bf3a7c69b710dc2c6f160d3f129008d). Once review is done, we could rebase onto main branch to resolve conflicts.
>
> Please let me do rebase merge later...

[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x)

@yiakwy-xpu-ml-framework-team Thanks for the reply. I will try it again according to the steps tomorrow. Logically speaking, it will be built using your patch.

[Bug] Qwen2 Eagle serving error

I start the service using the following command:

```
python3 -m sglang.launch_server \
  --model-path /path/to/Qwen2.5-Coder-7B-Instruct \
  --context-length 16384 \
  --tp 1 \
  --speculative-algorithm EAGLE \
  --speculative-draft-model-path /path/to/EAGLE-Qwen2-7B-Instruct \
  --mem-fraction-static 0.5 \
  --cuda-graph-max-bs 8 \
  --speculative-num-steps 5 \
  --speculative-eagle-topk 8...
```
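For anyone reproducing this from a script, the command can be assembled as an argument list. This is only a sketch: the flags and placeholder paths are copied from the command in the comment, which is truncated after `--speculative-eagle-topk 8`, so any later flags are not included. The `LAUNCH_FLAGS` dictionary and the `build_launch_argv` helper are hypothetical names used for illustration, not part of sglang.

```python
import shlex

# Flags copied from the comment; the original command is truncated after
# --speculative-eagle-topk, so anything that followed is not shown here.
# The /path/to/... values are placeholders, exactly as in the comment.
LAUNCH_FLAGS = {
    "--model-path": "/path/to/Qwen2.5-Coder-7B-Instruct",
    "--context-length": "16384",
    "--tp": "1",
    "--speculative-algorithm": "EAGLE",
    "--speculative-draft-model-path": "/path/to/EAGLE-Qwen2-7B-Instruct",
    "--mem-fraction-static": "0.5",
    "--cuda-graph-max-bs": "8",
    "--speculative-num-steps": "5",
    "--speculative-eagle-topk": "8",
}


def build_launch_argv(flags):
    """Hypothetical helper: turn a flag mapping into an argv list."""
    argv = ["python3", "-m", "sglang.launch_server"]
    for name, value in flags.items():
        argv.extend([name, value])
    return argv


argv = build_launch_argv(LAUNCH_FLAGS)
# Print a shell-quoted form for pasting into a terminal or a bug report.
print(shlex.join(argv))
```

Passing `argv` to `subprocess.run` would start the server without going through a shell, which avoids quoting problems if the model paths contain spaces.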

feat: mtp support dp-attention

After testing, the error is as follows:

```
Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 314, in __init__
    self.capture()
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 405, in capture...
```

feat: mtp support dp-attention

> [lambert0312](https://github.com/lambert0312) The latest commit ([5256543](https://github.com/sgl-project/sglang/pull/6081/commits/5256543d05493646d4faaa73606b1e9498ab6e2c)) has fixed this bug. Thanks!

@u4lr451 Great, it has been verified to work properly, but the speed is much slower than when dp-attention is...

Feature DeepSeek V3/R1 INT8 Quantization (block-wise)

@HandH1998 @laixinn Is torch-compile not supported? When I enable torch-compile, the returned result is garbled, like this:

```
{"id":"2fe19ce57cdb4613bf5e1b718d21ae8b","object":"chat.completion","created":1740622831,"model":"ds3","choices":[{"index":0,"message":{"role":"assistant","content":"�-se-se goodπππ goodπ good goodππ goodπ goodπ goodππ good good goodπ good-seππ...
```