vllm topic

Repositories tagged with vllm:

llm-server-docs

146 Stars · 13 Forks

Documentation on setting up an LLM server on Debian from scratch, using Ollama/vLLM, Open WebUI, OpenedAI Speech, and ComfyUI.
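
As a quick sanity check once such a server is running, here is a minimal sketch of calling a locally running Ollama instance over its REST API; the port 11434 and the model name are assumptions about your setup, not values from the docs repo.

```python
# Minimal sketch: query a locally running Ollama server over its REST API.
# Host, port, and model name below are assumptions; adjust to your installation.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Say hello in one sentence.", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])  # Ollama returns the completion under "response"
```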

LMCache

4.8k Stars · 526 Forks

Supercharge Your LLM with the Fastest KV Cache Layer
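
A minimal sketch of how LMCache is commonly attached to vLLM as a KV-transfer connector; the connector name, config fields, and model ID are assumptions that vary across vLLM and LMCache releases, so check both projects' docs for your versions.

```python
# Sketch only: wire LMCache into vLLM as a KV-transfer connector.
# "LMCacheConnectorV1" and kv_role="kv_both" are assumptions that differ
# between releases; requires both vllm and lmcache to be installed.
from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

kv_cfg = KVTransferConfig(kv_connector="LMCacheConnectorV1", kv_role="kv_both")
llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct", kv_transfer_config=kv_cfg)

# Repeated prompts sharing a long prefix are where an external KV cache pays off.
out = llm.generate(["Summarize: vLLM serves LLMs with paged attention."],
                   SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```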

Comprehensive, scalable ML inference architecture using Amazon EKS, leveraging Graviton processors for cost-effective CPU-based inference and GPU instances for accelerated inference. Guidance provides...

tiny-llm

3.4k Stars · 230 Forks

A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
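
To ground what building a tiny vLLM involves, the toy below (illustrative only, not course material) shows a single-head attention step that appends each token's key and value to a growing cache instead of recomputing them, which is the mechanism such a serving loop is built around.

```python
# Illustrative toy: single-head attention with a growing KV cache.
# Shapes and the random inputs are stand-ins, not the course's actual code.
import numpy as np

def attend(q, K, V):
    # q: (d,), K/V: (t, d) -> scaled dot-product attention over cached steps
    scores = K @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

d = 16
rng = np.random.default_rng(0)
K_cache, V_cache = np.empty((0, d)), np.empty((0, d))

for step in range(5):
    x = rng.normal(size=d)              # stand-in for the current token's hidden state
    k, v, q = x, x, x                   # real models project x with learned matrices
    K_cache = np.vstack([K_cache, k])   # append instead of recomputing past keys
    V_cache = np.vstack([V_cache, v])
    out = attend(q, K_cache, V_cache)
    print(step, out[:4].round(3))
```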

Chinese-MedQA-Qwen2

34 Stars · 5 Forks

A medical question-answering system based on Qwen2 with SFT and DPO; the project uses LLaMA-Factory for training and fastllm and vllm for inference.
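
For the inference side, a minimal sketch of offline generation with vLLM on a Qwen2-family model; the model ID is a placeholder standing in for the project's SFT/DPO fine-tuned checkpoint.

```python
# Sketch: offline batch inference with vLLM on a Qwen2-family model.
# The model ID is a placeholder, not the project's actual fine-tuned weights.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2-1.5B-Instruct")
params = SamplingParams(temperature=0.2, max_tokens=128)
outputs = llm.generate(["What are common symptoms of iron deficiency?"], params)
print(outputs[0].outputs[0].text)
```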

gpt_server

235 Stars · 21 Forks

gpt_server is an open-source framework for production-grade deployment of LLMs, embedding models, rerankers, ASR, TTS, text-to-image, image editing, and text-to-video.
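
Since the server exposes OpenAI-compatible endpoints, a hedged sketch of requesting embeddings from a self-hosted deployment with the standard openai client; the base URL, API key, and embedding model name are assumptions about a particular deployment.

```python
# Sketch: call an OpenAI-compatible embeddings endpoint on a self-hosted server.
# base_url, api_key, and the model name are assumptions for your deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8082/v1", api_key="EMPTY")
emb = client.embeddings.create(model="bge-m3", input=["common symptoms of pneumonia"])
print(len(emb.data[0].embedding))  # dimensionality of the returned vector
```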

easy-model-deployer

74 Stars · 19 Forks

Deploy open-source LLMs on AWS in minutes — with OpenAI-compatible APIs and a powerful CLI/SDK toolkit.
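
Once a model is deployed behind its OpenAI-compatible API, client code is the usual openai SDK with a custom base URL; the endpoint, key, and model name below are placeholders, not values from the project.

```python
# Sketch: chat completion against an OpenAI-compatible endpoint from a deployment.
# base_url, api_key, and model name are placeholders for your own deployment.
from openai import OpenAI

client = OpenAI(base_url="https://your-endpoint.example.com/v1", api_key="YOUR_KEY")
reply = client.chat.completions.create(
    model="your-deployed-model",
    messages=[{"role": "user", "content": "Give me one sentence about vLLM."}],
)
print(reply.choices[0].message.content)
```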

Mooncake

4.3k Stars · 446 Forks

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

gpustack

4.1k Stars · 412 Forks

GPU cluster manager for optimized AI model deployment

GPTQModel

902 Stars · 130 Forks

LLM quantization (compression) toolkit with hardware acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPUs via Hugging Face, vLLM, and SGLang.
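
As one of the listed backends, vLLM can load GPTQ-quantized checkpoints directly; a minimal sketch follows, with the quantized model ID as a placeholder for any GPTQ checkpoint produced by a toolkit like GPTQModel.

```python
# Sketch: serve a GPTQ-quantized checkpoint with vLLM's offline API.
# The model ID is a placeholder for whatever GPTQ checkpoint you have produced.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4", quantization="gptq")
out = llm.generate(["Explain 4-bit weight quantization in one sentence."],
                   SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```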