llm-serving topic

sglang

20.3k
Stars
3.5k
Forks
20.3k
Watchers

SGLang is a fast serving framework for large language models and vision language models.

swiftLLM

85
Stars
6
Forks
Watchers

A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).

Awesome-LLMs-ICLR-24

66
Stars
4
Forks
66
Watchers

A comprehensive resource hub compiling all LLM papers accepted at the International Conference on Learning Representations (ICLR) in 2024.

Nanoflow

522
Stars
18
Forks
Watchers

A throughput-oriented high-performance serving framework for LLMs

Z1

66
Stars
2
Forks
66
Watchers

[EMNLP'2025 Industry] Repo for "Z1: Efficient Test-time Scaling with Code"

embeddedllm

46
Stars
2
Forks
46
Watchers

EmbeddedLLM: API server for embedded device deployment. Currently supports CUDA/OpenVINO/IpexLLM/DirectML/CPU.

gpustack

4.1k
Stars
412
Forks
4.1k
Watchers

GPU cluster manager for optimized AI model deployment

kvcached

682
Stars
67
Forks
682
Watchers

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

blackbird

39
Stars
4
Forks
39
Watchers

A high-performance RDMA distributed file system for fast LLM Inference and GPU Training