Liangsheng Yin
There are three types of memory in SGLang: 1. memory for model weights; 2. memory for the KV cache; 3. temporary memory for intermediate computation results. The answer to your question...
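To make the KV-cache portion concrete, here is a rough back-of-the-envelope estimate of how its memory scales with model shape and context length. This is a hedged sketch of the standard formula (key + value per layer per token), not SGLang's internal accounting, and the Llama-7B-like numbers below are just an example configuration.

```python
# Hedged sketch: rough KV-cache size estimate, not SGLang's internal accounting.
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    # Each token stores one key and one value vector (factor 2) in every layer.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes

# Example: a Llama-7B-like config (32 layers, 32 KV heads, head_dim 128, fp16).
per_token = kv_cache_bytes_per_token(32, 32, 128, 2)
print(per_token)          # 524288 bytes (512 KiB) per token
print(per_token * 4096)   # 2147483648 bytes (2 GiB) for a 4096-token context
```

This is why the KV cache, not the weights, usually dominates memory at long context lengths or high batch sizes.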
@alessiodallapiazza You are welcome to submit a PR to add this feature.
It's due to `outlines` API changes; please downgrade `outlines`.
@for-just-we Could you please check whether there is a KV cache leak during inference, or some other runtime error?
@Ja1Zhou Of course, this logic can be implemented without recursion. I'm not sure there would ever be that many nodes on a single path in the radix tree; it's very...
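For illustration, the recursive walk down a radix-tree path can be turned into an explicit loop, so path depth never approaches Python's recursion limit. This is a hedged, simplified sketch: `Node`, `insert`, and `match_prefix` are hypothetical stand-ins, not SGLang's actual radix-tree classes, and edge splitting on partial matches is omitted for brevity.

```python
# Hedged sketch: iterative (non-recursive) radix-tree path traversal.
# `Node` is a hypothetical stand-in for SGLang's radix tree node.
class Node:
    def __init__(self, key=()):
        self.key = tuple(key)   # token-id segment stored on the edge into this node
        self.children = {}      # first token id of child segment -> child Node

def insert(root, tokens):
    """Insert a token sequence iteratively (edge splitting omitted for brevity)."""
    node, i = root, 0
    while i < len(tokens):
        child = node.children.get(tokens[i])
        if child is None:
            node.children[tokens[i]] = Node(tokens[i:])
            return
        i += len(child.key)     # assume the full segment matches in this sketch
        node = child

def match_prefix(root, tokens):
    """Return how many leading tokens are already cached, using a loop."""
    node, i = root, 0
    while i < len(tokens):
        child = node.children.get(tokens[i])
        if child is None or tuple(tokens[i:i + len(child.key)]) != child.key:
            break
        i += len(child.key)
        node = child
    return i

root = Node()
insert(root, [1, 2, 3])
insert(root, [1, 2, 3, 4, 5])
print(match_prefix(root, [1, 2, 3, 4, 5]))  # 5
print(match_prefix(root, [1, 2, 3, 4]))     # 3 (partial edge match stops the walk)
```

The loop carries the same state a recursive call would (current node, current offset), which is all that is needed to avoid recursion here.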
@Luodian 1. We don't support multi-node serving currently; it will be supported in the future. 2. Sorry for the confusion between **tensor parallelism** and **frontend parallelism**. The `parallel=8` means using...
@koalazf99 Yes, the `--tp-size` stands for tensor parallelism, which allows your server to run across multiple GPUs. This is the only configuration required to enable tensor parallelism. However, note that...
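To illustrate the idea behind tensor parallelism, here is a hedged, minimal sketch: each GPU (rank) holds a row shard of a weight matrix, computes its slice of the output, and the slices are concatenated. This is an illustration of the concept only, not SGLang's implementation, and `matvec`/`shard_rows` are hypothetical helper names.

```python
# Hedged sketch of tensor parallelism: shard a weight matrix across ranks.
def matvec(W, x):
    """Plain matrix-vector product over lists (stand-in for a GPU matmul)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def shard_rows(W, tp_size):
    """Split W's output rows evenly across tp_size ranks."""
    n = len(W) // tp_size
    return [W[r * n:(r + 1) * n] for r in range(tp_size)]

W = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]

full = matvec(W, x)                                   # single-device result
shards = shard_rows(W, tp_size=2)                     # one shard per "GPU"
parallel = [y for Wr in shards for y in matvec(Wr, x)]  # concat partial outputs
print(full == parallel)  # True: sharded compute reproduces the full result
```

Each rank only stores and multiplies its shard, which is how `--tp-size` lets a model larger than one GPU's memory run across several GPUs.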
@koalazf99 Yes, data parallelism is not supported yet.
@fisher75 Which specific example are you referring to? If you mean a vision model (e.g., llava) where the few-shot examples are images used as a shared context prefix, with another image appended afterward for inference, then these vision models do not support that right now, because only a single image is currently supported as input.
@fisher75 You can refer to this tree_of_thought benchmark: https://github.com/sgl-project/sglang/blob/cb389c91bcff6ffac4a95a0551a05d67e21ba306/benchmark/tree_of_thought_deep/bench_sglang.py#L41-L70 For the image API, use it directly as shown here: https://github.com/sgl-project/sglang/blob/cb389c91bcff6ffac4a95a0551a05d67e21ba306/examples/quick_start/srt_example_llava.py#L7-L10