Li Hui
> Oh, this is one of the reasons why it was difficult to set up Prometheus+Grafana metrics collection. The documentation doesn't mention anywhere that a password is required for the...
Anything missing? @zhyncs
DeepSeek MLA is not supported yet, and an error is reported when starting the model:

```
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1849, in run_scheduler_process
    scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
...
```
> Thank you @lambert0312 for pointing this out. Yes, this feature is still at the meta stage and currently only supports MHA- and GQA-style memory pools. I will keep you posted...
> @lambert0312 just FYI, there is a PR from the community supporting MLA with hierarchical caching, which will be merged soon but feel free to check it out: #4009 @xiezhq-hermann...
Any progress on this?
Good work! It works for me. :D
> In #4000 the current default behavior is `separate_reasoning=True`. I still think it's valuable to have the option for clients to request that reasoning _not_ be separated, though the main...
If we want to force the prefix to be generated, wouldn't it be more elegant to set a chat template? I personally think that is better than implementing it in code here,...
> In fact, all that needs to be done is to remove `\n` from the r1 chat template, without needing to modify the code.

Yes.
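For readers following along: conceptually, `separate_reasoning=True` splits the raw model output at the reasoning delimiter so that reasoning and final answer are returned as separate fields. A minimal illustrative sketch, assuming a DeepSeek-R1-style `</think>` delimiter (the helper name `split_reasoning` is hypothetical, not SGLang's actual implementation):

```python
def split_reasoning(text: str, delimiter: str = "</think>"):
    """Hypothetical helper: split raw model output into (reasoning, answer).

    Everything before the delimiter is treated as reasoning content,
    everything after it as the final answer, mirroring the idea behind
    separate_reasoning.
    """
    if delimiter in text:
        reasoning, answer = text.split(delimiter, 1)
        return reasoning.strip(), answer.strip()
    # No delimiter found: treat the whole output as the final answer.
    return None, text.strip()


raw = "The user asks 2+2. Compute it.</think>2 + 2 = 4."
reasoning, answer = split_reasoning(raw)
print(reasoning)  # -> The user asks 2+2. Compute it.
print(answer)     # -> 2 + 2 = 4.
```

This also shows why the chat-template fix above works: if the template itself emits the opening prefix (e.g. without a trailing `\n`), the model is forced to continue inside the reasoning span, and the split still lands at the closing delimiter.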