
Development Roadmap

Open Ying1123 opened this issue 1 year ago • 13 comments

Function Calling

  • Frontend
    • Add tools argument in sgl.gen. See also guidance tools
  • Backend
    • OpenAI: Translate to their function calling API (https://platform.openai.com/docs/guides/function-calling).
    • Local Models (SGLang)
      1. Use SGLang primitives (regex, select) and constrained decoding to implement a workflow
      2. Directly use models that support function calling (e.g., Gorilla OpenFunctions, https://huggingface.co/jondurbin/bagel-dpo-7b-v0.4#prompting-strategies)
    • Local Models (OpenAI-compatible API)

High-level Pythonic Interface

  • #39

Inference Optimizations

  • Speculative decoding for local models
  • Speculative execution for OpenAI Chat API
    • #48

Structured Decoding

  • Support parallel JSON decoding https://github.com/varunshenoy/super-json-mode/issues/8
  • Support auto parallel decoding https://arxiv.org/abs/2401.06761
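The parallel JSON decoding item can be sketched as follows. This is a toy illustration of the idea only, with a canned stub standing in for the model: when the JSON schema is fixed, each field's value is an independent continuation, so the per-field generations can run concurrently and be assembled into one object at the end.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Toy sketch of parallel JSON decoding: with a fixed schema, each field's
# value is an independent generation, so fields can be decoded concurrently
# and merged at the end. `generate` is a hypothetical stand-in for the model.
SCHEMA = ["name", "age", "city"]

def generate(prompt):
    # Stand-in for a model call, keyed on the field name in the prompt.
    canned = {"name": "Ada", "age": "36", "city": "London"}
    field = prompt.rsplit(":", 1)[0].rsplit(" ", 1)[-1]
    return canned[field]

def decode_json_parallel(context, fields):
    with ThreadPoolExecutor(max_workers=len(fields)) as pool:
        values = pool.map(lambda f: generate(f"{context} {f}:"), fields)
    return dict(zip(fields, values))

obj = decode_json_parallel("Describe the person.", SCHEMA)
print(json.dumps(obj))
```

A real implementation would batch the per-field decodes on the server instead of using client-side threads, but the schema-splitting idea is the same.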

Compiler

  • Support tracing and compiling sgl.fork
  • Support sending a full serialized SGL program to the server

LoRA Support

  • Port multi-LoRA batching and unified memory from S-LoRA

Model Coverage

  • Vision Language Models. Support top-performing models from https://github.com/open-compass/VLMEvalKit
  • Language Models. Port the implementation of popular models from https://github.com/vllm-project/vllm/tree/main/vllm/model_executor/models. (help)

Device Coverage

  • AMD support. Investigate AMD support in Triton and FlashInfer.
  • CPU support. This is better done by adding a llama.cpp backend.

Ying1123 avatar Feb 07 '24 07:02 Ying1123

Are there still plans for a high level pythonic interface? https://github.com/sgl-project/sglang/issues/39#issuecomment-1899351565

AriMKatz avatar Feb 07 '24 16:02 AriMKatz

Are there still plans for a high level pythonic interface? #39 (comment)

Hi @AriMKatz, thanks for the reference. This is very important; I have just added it to the roadmap.

Ying1123 avatar Feb 07 '24 19:02 Ying1123

For the vision model support, is it possible to align with the OpenAI GPT-4V API? https://platform.openai.com/docs/guides/vision

nivibilla avatar Feb 08 '24 23:02 nivibilla

Are there plans for loading models in 8bit or 4bit?

aliencaocao avatar Feb 10 '24 06:02 aliencaocao

For the vision model support, is it possible to align with the OpenAI GPT-4V API? https://platform.openai.com/docs/guides/vision

@nivibilla Yes, it is already aligned with the OpenAI GPT-4V API, see here. You can also find a runnable example of serving it with SkyServe here.
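For reference, the GPT-4V-style format pairs text and image parts inside one user message. A minimal payload might be built like this; the model name, image URL, and localhost endpoint are placeholders for whatever you are serving:

```python
import json

# GPT-4V-style chat payload. "llava", the image URL, and the localhost
# endpoint are placeholders for your own deployment.
ENDPOINT = "http://localhost:30000/v1/chat/completions"

payload = {
    "model": "llava",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/cat.png"}},
        ],
    }],
    "max_tokens": 64,
}

# POST `payload` to ENDPOINT with an HTTP client of your choice, e.g.:
#   requests.post(ENDPOINT, json=payload)
print(json.dumps(payload, indent=2))
```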

Ying1123 avatar Feb 10 '24 11:02 Ying1123

Are there plans for loading models in 8bit or 4bit?

@aliencaocao Thanks for the question! AWQ and GPTQ are already supported, but we do not support automatic dtype conversion yet. You are welcome to submit a PR for that.
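As a sketch, serving a pre-quantized checkpoint could look like the following; the model name is illustrative, and the assumption here is that the quantization method is picked up from the checkpoint's own config rather than a flag:

```shell
# Serve a pre-quantized AWQ checkpoint; the quantization method is read from
# the model's config, so no extra flag should be needed. Model name is
# illustrative only.
python -m sglang.launch_server \
  --model-path TheBloke/Llama-2-7B-Chat-AWQ \
  --port 30000
```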

Ying1123 avatar Feb 10 '24 12:02 Ying1123

Are there plans for loading models in 8bit or 4bit?

@aliencaocao Thanks for the question! AWQ and GPTQ are already supported, but we do not support automatic dtype conversion yet. You are welcome to submit a PR for that.

I'm looking to load LLaVA 1.6 in 8-bit, but it does not seem that the LLaVA series has AWQ or GPTQ quants, or did I miss something here?

EDIT: I see that 1.5 has quants but 1.6 does not yet. Perhaps it's just too new and no one has done a calibration yet.

aliencaocao avatar Feb 10 '24 12:02 aliencaocao

Hi all - is anyone working on the S-LoRA integration currently? I see the branch, but it looks a few months old.

Would love to see this, happy to pick up from existing work or start fresh.

qeternity avatar Apr 01 '24 23:04 qeternity

Hi all - is anyone working on the S-LoRA integration currently? I see the branch, but it looks a few months old.

Would love to see this, happy to pick up from existing work or start fresh.

Hi @qeternity, I was working on it but have been blocked by other affairs. You are welcome to contribute, either continue on the branch or start fresh! I'll be happy to review and collaborate.

Ying1123 avatar Apr 02 '24 05:04 Ying1123

Tool support is very important; it is necessary for many use cases.

Bit0r avatar Apr 02 '24 11:04 Bit0r

Is TinyLlama supported? TinyLlama/TinyLlama-1.1B-Chat-v1.0 generation seems a bit slow...

omri-sap avatar Apr 04 '24 09:04 omri-sap

I see llama.cpp integration is on the roadmap. When will this feature be delivered? It would be very nice to have, since it would allow running local LLMs, such as Llama models, on Mac computers and experimenting with them through the powerful and expressive SGLang.

wille-42 avatar May 06 '24 09:05 wille-42

I'd like to request support for Phi-3-mini.

Gintasz avatar May 08 '24 16:05 Gintasz

Hi all - is anyone working on the S-LoRA integration currently? I see the branch, but it looks a few months old. Would love to see this, happy to pick up from existing work or start fresh.

Hi @qeternity, I was working on it but have been blocked by other affairs. You are welcome to contribute, either continue on the branch or start fresh! I'll be happy to review and collaborate.

Hi, which branch is it? It looks like it may be better to start fresh.

binarycrayon avatar Jun 24 '24 18:06 binarycrayon

I can help by removing vLLM from the dependencies.

zhyncs avatar Jul 16 '24 08:07 zhyncs

Moved to https://github.com/sgl-project/sglang/issues/634

Ying1123 avatar Jul 17 '24 02:07 Ying1123

Is “tools” support ready now?

CedricHwong avatar Nov 04 '24 12:11 CedricHwong