lullabies777
> Why vLLM `--enable-prefix-caching` but SGLang `--disable-radix-cache`?

I added it because I noticed the comment ["Disable RadixAttention for prefix caching."](https://github.com/sgl-project/sglang/blob/55f5976b42d736f3dfe2f8f9b91a6536c212744a/python/sglang/srt/server_args.py) in the SGLang server args. I experimented with both options, but the performance remained nearly...
> ```
> # H100 SXM 80G
> # both disable prefix cache
>
> {"backend": "sglang", "dataset_name": "sharegpt", "request_rate": 1, "total_input_tokens": 74403, "total_output_tokens": 61393, "total_output_tokens_retokenized": 53379, "mean_e2e_latency_ms": 1060.0129053120813, "median_e2e_latency_ms": ...
> ```
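For reference, this is roughly how the two flags end up on the launch commands; a sketch only, assuming the OpenAI-compatible server entrypoints on both sides, and the model name and ports are placeholders rather than the exact benchmark setup:

```bash
# vLLM: prefix caching is opt-in, turned on with --enable-prefix-caching
python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2-7B-Instruct \
  --port 8000 \
  --enable-prefix-caching

# SGLang: RadixAttention prefix caching is on by default,
# so the flag goes the other way and turns it off
python -m sglang.launch_server \
  --model-path Qwen/Qwen2-7B-Instruct \
  --port 30000 \
  --disable-radix-cache
```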
> Qwen2 doesn't really support vLLM 0.6's native tool call feature, but I'm working on getting the next version, Qwen2.5, to support it (hopefully that goes smoothly).
>
> That said, this shouldn't affect starting the vLLM server normally (just don't enable vLLM's tool call feature). See https://github.com/QwenLM/Qwen2?tab=readme-ov-file#vllm for how to launch vLLM. (If the launch command is fine but you still see an impact, try upgrading to the latest vLLM or downgrading to 0.5.x.)

I downgraded vLLM to 0.5.5, but I still can't seem to reproduce the tutorial at https://modelscope-agent.readthedocs.io/en/stable/llms/qwen2_tool_calling.html. Do I need to downgrade further? If I want to use auto tool-choice, is downgrading vLLM enough, or is auto tool-choice simply unavailable with a local vLLM deployment? Does SGLang support auto tool-choice? Thanks for the reply!
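For context, the kind of request I'm trying against the local server looks roughly like this; just a sketch, where `localhost:8000`, the model name, and the `get_weather` tool are placeholders and not the exact setup from the tutorial:

```bash
# Hypothetical chat completion request with tool_choice "auto"
# against a local OpenAI-compatible vLLM/SGLang server
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2-7B-Instruct",
    "messages": [{"role": "user", "content": "What is the weather in Beijing?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }],
    "tool_choice": "auto"
  }'
```

This is the shape of request where I'd expect `"tool_choice": "auto"` to take effect, so my question is mainly whether that path is supported at all with a locally deployed Qwen2.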