Li Hui
> Oh, this is one of the reasons why it was difficult to set up Prometheus+Grafana metrics collection. The documentation doesn't mention anywhere that a password is required for the...
Anything missing? @zhyncs
DeepSeek MLA is not supported yet, and an error is reported when starting the model:

```
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1849, in run_scheduler_process
    scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
...
```
> Thank you @lambert0312 for pointing this out. Yes, this feature is still at the meta stage and currently only supports MHA- and GQA-style memory pools. I will keep you posted...
> @lambert0312 just FYI, there is a PR from the community supporting MLA with hierarchical caching, which will be merged soon but feel free to check it out: #4009 @xiezhq-hermann...
Any progress on this?
Good work! It works for me. :D
> In #4000 the current default behavior is `separate_reasoning=True`. I still think it's valuable to have the option for clients to request that reasoning _not_ be separated, though the main...
If we want to force the prefix to be generated, wouldn't it be more elegant to set a chat template? I personally think that is better than implementing it in code here,...
> In fact, all that needs to be done is to remove `\n` from the r1 chat template, without needing to modify the code.

Yes.
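For readers following along: conceptually, `separate_reasoning=True` splits the raw model output at the reasoning delimiter so that reasoning and final answer are returned as separate fields. A minimal illustrative sketch, assuming a DeepSeek-R1-style `</think>` delimiter (the helper name `split_reasoning` is hypothetical, not SGLang's actual implementation):

```python
def split_reasoning(text: str, delimiter: str = "</think>"):
    """Hypothetical helper: split raw model output into (reasoning, answer).

    Everything before the delimiter is treated as reasoning content,
    everything after it as the final answer, mirroring the idea behind
    separate_reasoning.
    """
    if delimiter in text:
        reasoning, answer = text.split(delimiter, 1)
        return reasoning.strip(), answer.strip()
    # No delimiter found: treat the whole output as the final answer.
    return None, text.strip()


raw = "The user asks 2+2. Compute it.</think>2 + 2 = 4."
reasoning, answer = split_reasoning(raw)
print(reasoning)  # -> The user asks 2+2. Compute it.
print(answer)     # -> 2 + 2 = 4.
```

This also shows why the chat-template fix above works: if the template itself emits the opening prefix (e.g. without a trailing `\n`), the model is forced to continue inside the reasoning span, and the split still lands at the closing delimiter.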