
SGLang is a fast serving framework for large language models and vision language models.

Results: 722 sglang issues

When I ask the model about an image, behaviour seems to break when I use the _choices_ functionality: it always seems to suggest the first option. I think...
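One plausible source of an "always picks the first option" bias in a choices-style selector is unnormalized scoring: summing per-token log-probabilities penalizes longer options. The sketch below is an illustration of that failure mode only, not sglang's actual implementation; all names and the log-prob values are hypothetical.

```python
def pick_choice(options, token_logprobs, normalize=True):
    """Pick the option with the best log-prob score.

    With normalize=False (raw sums), longer options are penalized
    simply for having more tokens, which can bias selection toward
    short or early options. Averaging per token removes that bias.
    """
    if normalize:
        scores = [sum(lps) / len(lps) for lps in token_logprobs]
    else:
        scores = [sum(lps) for lps in token_logprobs]
    return options[max(range(len(options)), key=scores.__getitem__)]

options = ["cat", "dog"]
# Hypothetical per-token log-probs: "dog" is more likely per token
# but tokenizes into more tokens than "cat".
logprobs = [[-1.0], [-0.5, -0.5, -0.5]]
print(pick_choice(options, logprobs, normalize=False))  # -> cat (biased)
print(pick_choice(options, logprobs, normalize=True))   # -> dog
```

If the real selector behaves like the unnormalized branch, checking how option lengths interact with the scoring would be a reasonable first debugging step.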

I am unable to create a Runtime with sglang as follows `runtime = sgl.Runtime(model_path=MODEL_DIR, tokenizer_path=MODEL_DIR)`. It throws the error below: ```python ImportError: cannot import name '_set_default_torch_dtype' from 'vllm.model_executor.model_loader' (/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/__init__.py) ```...

There have been a number of times where a float field gets an unlimited number of zeros generated. Any idea what could be the cause? I am thinking...
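One common mitigation for runaway digits is to bound the pattern a constrained-generation backend is allowed to match. The digit limits below are illustrative assumptions, not sglang defaults; this is a plain-regex sketch of the idea, not a fix confirmed by the project.

```python
import re

# A bounded float pattern one might hand to a regex-guided decoder:
# at most 10 integer digits and 6 fractional digits, so a stream of
# trailing zeros cannot continue indefinitely.
FLOAT_RE = re.compile(r"[0-9]{1,10}\.[0-9]{1,6}")

def is_valid_float(text: str) -> bool:
    """True if `text` fits the bounded float pattern exactly."""
    return FLOAT_RE.fullmatch(text) is not None

print(is_valid_float("3.14"))          # bounded decimal: accepted
print(is_valid_float("1." + "0" * 50)) # runaway zeros: rejected
```

An unbounded pattern like `[0-9]+\.[0-9]+` would accept the runaway case, which is why the repetition counts matter.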

I just use this command to start the server `CUDA_VISIBLE_DEVICES=0 python -m sglang.launch_server --model-path LLMs/Qwen-14B-Chat --port 30000 --trust-remote-code --stream-interval 1 --enable-flashinfer --schedule-conservativeness 50` and use the following code to test...

How does the radix-attention function call need to be modified in sglang for a model implemented in vLLM, where paged attention takes care of the multi-query and grouped-query architectures?
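For context on the question above: the core idea behind RadixAttention is that requests sharing a token prefix can reuse the same cached KV entries, tracked in a radix/prefix tree. The toy sketch below illustrates only that prefix-matching idea; it is not sglang's implementation, and the node layout is an assumption for illustration.

```python
class RadixNode:
    """One node per token; `cached` stands in for a cached KV block."""
    def __init__(self):
        self.children = {}   # token id -> RadixNode
        self.cached = False

def insert(root, tokens):
    """Record that KV for this token sequence is now cached."""
    node = root
    for t in tokens:
        node = node.children.setdefault(t, RadixNode())
        node.cached = True

def match_prefix(root, tokens):
    """Length of the longest cached prefix of `tokens`."""
    node, n = root, 0
    for t in tokens:
        if t not in node.children:
            break
        node = node.children[t]
        n += 1
    return n

root = RadixNode()
insert(root, [1, 2, 3, 4])               # first request fills the cache
print(match_prefix(root, [1, 2, 3, 9]))  # -> 3 tokens reusable
```

In a paged-attention backend, the matched prefix length would translate into which KV pages can be shared rather than recomputed.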

For a lot of use cases, there is already a pre-defined system + base prompt that is used. Can we define the KV cache for these prompts up front manually?...
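On the idea of defining the cache up front: a prefix cache can be "warmed" by running the fixed system + base prompt once, so later requests hit the cached prefix. The sketch below uses hypothetical helper names and a dict as a stand-in for KV state; it is a conceptual illustration, not an sglang API.

```python
cache = {}  # prompt prefix -> simulated KV state

def prefill(prompt):
    """Stand-in for computing and storing KV for a fixed prompt."""
    cache[prompt] = f"kv({len(prompt)} chars)"

def generate(prompt):
    """Reuse the longest cached prefix of `prompt` if one exists."""
    for prefix in sorted(cache, key=len, reverse=True):
        if prompt.startswith(prefix):
            return f"reused {prefix!r}"
    return "computed from scratch"

prefill("SYSTEM: be terse.\n")                  # warm-up pass
print(generate("SYSTEM: be terse.\nUSER: hi"))  # hits the warmed prefix
print(generate("unrelated prompt"))             # -> computed from scratch
```

With radix-style caching, the same effect can also happen implicitly after the first real request that uses the shared prompt.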

I loaded Llava v1.6 34B on my server ``` export DISABLE_NEST_ASYNCIO=True model=liuhaotian/llava-v1.6-34b tokenizer=liuhaotian/llava-v1.6-34b-tokenizer CUDA_VISIBLE_DEVICES=0,1 python3 -m sglang.launch_server --model-path $model --tokenizer-path $tokenizer --port 30813 --tp 2 ``` It works when I...

Hi folks, where is the code for benchmarking time to first token? I only see the average latency :) Thanks,
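In case it helps, time to first token is usually measured as the wall-clock delay until the first item arrives from a streaming response. The sketch below times a simulated token stream; the stream and its delay are stand-ins, not sglang's benchmark code.

```python
import time

def time_to_first_token(stream):
    """Return (first_token, seconds) for a streaming generator."""
    start = time.perf_counter()
    first = next(stream)
    return first, time.perf_counter() - start

def fake_stream():
    # Simulated token stream with a small prefill delay.
    time.sleep(0.05)
    yield "Hello"
    yield " world"

tok, ttft = time_to_first_token(fake_stream())
print(tok, round(ttft, 3))
```

Against a real server, `fake_stream()` would be replaced by the streamed response iterator, with the timer started just before the request is sent.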

After doing QLoRA with a training library (unsloth) and saving the adapter, is there a way to load the 4-bit bnb model and the un-merged adapter for use with...

Can it support the InternVL multimodal large model, which currently ranks first in the open-source MMMU ranking? [https://github.com/OpenGVLab/InternVL/](https://github.com/OpenGVLab/InternVL/) ![WX20240324-102942@2x](https://github.com/sgl-project/sglang/assets/4583537/2416f85d-5231-4d8c-9255-b598385e6eaa) [MMMU](https://mmmu-benchmark.github.io)