sglang topic
llmaz
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
gpt_server
gpt_server is an open-source framework for production-grade deployment of LLMs, embedding, reranker, ASR, TTS, text-to-image, image-editing, and text-to-video models.
Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
gpustack
GPU cluster manager for optimized AI model deployment
MOSS-TTSD
MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting zero-shot multi-speaker voice cloning, and long-form speech ge...
GPTQModel
LLM quantization (compression) toolkit with hardware acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.
kvcached
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
FlashTTS
Provides high-quality Chinese speech synthesis and voice cloning services based on models such as SparkTTS and OrpheusTTS.
SpecForge
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
InferenceMAX
Open Source Continuous Inference Benchmarking - GB200 NVL72 vs MI355X vs B200 vs H200 vs MI325X & soon™ TPUv6e/v7/Trainium2/3/GB300 NVL72 - DeepSeek 670B MoE, GPTOSS