
SGLang is a fast serving framework for large language models and vision language models.

722 sglang issues

Awesome project. We have a paper https://arxiv.org/abs/2310.14034 with really complicated KV caching that I would love to go back and implement in SGLang. I tried to get an example working...

collaboration

Is there any way to truncate text based on tokens? I really like that, as a user, I don't need to think about tokens. But to save memory I would like...

good first issue

Hello, I'm curious whether we can already use sglang as a backend for NVIDIA's Triton Server. Amazing work on the library, btw. Love it!

high priority

Hey, when is support for the Metal backend planned?

Exllamav2 is an excellent quantization method that would make it possible to run big models on consumer GPUs (~24 GB) thanks to fractional quantization. Is this in the cards?

Hi team, I am using `sglang` with a local fine-tuned model (`basemodel_id = cognitivecomputations/dolphin-2.2.1-mistral-7b`) and running inference in a for loop. GPU: 4090, batch_sz=1, tokens_in ~2000, tokens_out ~200 ```...

Import `outlines` instead of copying its code.