Liangsheng Yin
@fisher75 1. To keep a single image in view throughout a multi-turn conversation, just place the image at the very beginning and then use fork and `+=`; we will automatically share the image's prefix. 2. Multiple images are not supported yet.
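A minimal sketch of what this pattern might look like with the sglang frontend API (the prompts, branch questions, and endpoint below are illustrative, not taken from the original thread):

```python
import sglang as sgl

@sgl.function
def image_multi_turn(s, image_file):
    # Place the image at the very beginning; its prefix is then shared
    # automatically by all forked branches.
    s += sgl.user(sgl.image(image_file) + "Here is an image.")
    s += sgl.assistant(sgl.gen("ack", max_tokens=32))

    # Fork the conversation; each branch extends the shared prefix with `+=`.
    forks = s.fork(2)
    forks[0] += sgl.user("Describe the main object in the image.")
    forks[0] += sgl.assistant(sgl.gen("answer", max_tokens=128))
    forks[1] += sgl.user("What colors dominate the image?")
    forks[1] += sgl.assistant(sgl.gen("answer", max_tokens=128))

# Assumes an SGLang server is already running at this endpoint.
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = image_multi_turn.run(image_file="example.jpg")
```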
@LuoKaiGSW Could you please provide more details about this error, such as the GPU model, NVIDIA driver version, and which packages trigger it?
@lucasavila00 This bug seems to be an `outlines` bug, but they are updating their APIs a lot in the upcoming `0.0.35` version, so I will look into it when that release is out.
@lucasavila00 Sorry about that. Since unexpected bugs can be raised while the server is running, we chose not to crash the entire server and instead just print an error...
@tzjtatata Hi, this is the chat template for `llava-v1.6-34b`: https://github.com/sgl-project/sglang/blob/ad1dd74673a2e918a39d869865c1830fb634d150/python/sglang/lang/chat_template.py#L224-L225 https://github.com/sgl-project/sglang/blob/ad1dd74673a2e918a39d869865c1830fb634d150/python/sglang/lang/chat_template.py#L120-L133 As for the bug leading to different outputs, I believe the cause lies somewhere else. Could you please provide...
@tzjtatata 1. The chat template can be determined from this: https://github.com/haotian-liu/LLaVA/blob/7440ec9ee37b0374c6b5548818e89878e38f3353/llava/serve/gradio_web_server.py#L166-L193 Can you try registering these chat templates and a matching function like this? https://github.com/sgl-project/sglang/blob/ad1dd74673a2e918a39d869865c1830fb634d150/python/sglang/lang/chat_template.py 2. Yes, these tokenizers are...
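A rough sketch of what registering a custom chat template and matching function could look like (the template name, role prefixes, and matching rule here are illustrative; the exact field names should follow the `chat_template.py` file linked above):

```python
from sglang.lang.chat_template import (
    ChatTemplate,
    get_chat_template,
    register_chat_template,
    register_chat_template_matching_function,
)

# Register a ChatML-style template for a LLaVA model.
register_chat_template(
    ChatTemplate(
        name="my-llava-template",
        default_system_prompt="Answer the questions.",
        role_prefix_and_suffix={
            "system": ("<|im_start|>system\n", "<|im_end|>\n"),
            "user": ("<|im_start|>user\n", "<|im_end|>\n"),
            "assistant": ("<|im_start|>assistant\n", "<|im_end|>\n"),
        },
        stop_str=("<|im_end|>",),
        image_token=" <image>\n",
    )
)

# Match the template to a model by inspecting the model path.
@register_chat_template_matching_function
def match_my_llava(model_path: str):
    if "llava-v1.6-34b" in model_path.lower():
        return get_chat_template("my-llava-template")
```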
@qeternity Thanks for reporting this problem! We have also noticed that the CPU side is bound by Python's single-thread limit. We will keep working to optimize our performance. As for the...
@Chenghao-Jia The local `jsonschema` version I used is `4.21.1`; could you please try it again with that version?
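For reference, a quick way to check the locally installed version against `4.21.1` (the version the comment above reports working):

```python
from importlib.metadata import version

# Compare the locally installed jsonschema version against 4.21.1.
print(version("jsonschema"))
```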