llm-serving topic
torchpipe
Serving inside PyTorch
sglang
SGLang is a fast serving framework for large language models and vision language models.
swiftLLM
A tiny yet powerful LLM inference system tailored for research purposes. Achieves vLLM-equivalent performance with only 2k lines of code (2% of vLLM).
Awesome-LLMs-ICLR-24
A comprehensive resource hub compiling all LLM papers accepted at the International Conference on Learning Representations (ICLR) in 2024.
Nanoflow
A throughput-oriented high-performance serving framework for LLMs
Z1
[EMNLP'2025 Industry] Repo for "Z1: Efficient Test-time Scaling with Code"
embeddedllm
EmbeddedLLM: API server for embedded device deployment. Currently supports CUDA/OpenVINO/IpexLLM/DirectML/CPU.
gpustack
GPU cluster manager for optimized AI model deployment
kvcached
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
blackbird
A high-performance RDMA distributed file system for fast LLM inference and GPU training