vLLM topic
llm-server-docs
Documentation on setting up an LLM server on Debian from scratch, using Ollama/vLLM, Open WebUI, OpenedAI Speech, and ComfyUI.
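Once the pieces from these docs are running, a quick smoke test is to query Ollama's local REST API, which listens on port 11434 by default. A minimal sketch; the model tag "llama3" is a placeholder for whichever model you pulled.

```python
import requests

# Ollama serves a REST API on localhost:11434 by default.
# "llama3" is a placeholder; use any model tag you have pulled locally.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Say hello in one sentence.", "stream": False},
)
print(resp.json()["response"])
```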
LMCache
Supercharge Your LLM with the Fastest KV Cache Layer
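LMCache's core idea is reusing KV caches across requests so shared prompt prefixes are computed once. The idea is visible in miniature with vLLM's built-in prefix caching; this is a hedged sketch of that related mechanism, not LMCache's own connector API, which is configured per the LMCache docs.

```python
from vllm import LLM, SamplingParams

# enable_prefix_caching lets vLLM reuse KV blocks for shared prompt prefixes;
# LMCache generalizes this by persisting and sharing KV caches across requests
# and engines. Model choice here is illustrative.
llm = LLM(model="Qwen/Qwen2-7B-Instruct", enable_prefix_caching=True)

system = "You are a careful assistant. Answer concisely. " * 20  # long shared prefix
prompts = [system + q for q in ("Headache for 3 days.", "Mild fever and cough.")]
outputs = llm.generate(prompts, SamplingParams(max_tokens=64))
```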
guidance-for-scalable-model-inference-and-agentic-ai-on-amazon-eks
Comprehensive, scalable ML inference architecture using Amazon EKS, leveraging Graviton processors for cost-effective CPU-based inference and GPU instances for accelerated inference. Guidance provides...
tiny-llm
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
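The central data structure in any tiny-vLLM-style serving loop is the KV cache: keys and values for past tokens are stored once so each decode step only attends against them. A minimal single-head sketch in NumPy, with all shapes and names invented for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class KVCache:
    """Append-only store of past keys/values for one attention head."""
    def __init__(self):
        self.ks, self.vs = [], []

    def append(self, k, v):
        self.ks.append(k)
        self.vs.append(v)

    def stacked(self):
        return np.stack(self.ks), np.stack(self.vs)  # each (seq_len, d)

def decode_step(q, k, v, cache):
    """Cache this step's k/v, then attend the new query over all positions."""
    cache.append(k, v)
    K, V = cache.stacked()
    weights = softmax(q @ K.T / np.sqrt(q.shape[-1]))
    return weights @ V

# Toy decode loop: four steps with random projections of dimension 8.
d, cache = 8, KVCache()
rng = np.random.default_rng(0)
for _ in range(4):
    q, k, v = rng.normal(size=(3, d))
    out = decode_step(q, k, v, cache)
print(out.shape)  # (8,)
```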
Chinese-MedQA-Qwen2
A medical question-answering system based on Qwen2 with SFT and DPO; the project uses LLaMA-Factory for training and fastllm/vLLM for inference.
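For the vLLM inference side, offline batch generation uses vLLM's standard `LLM`/`SamplingParams` API. A minimal sketch; the model shown is the Qwen2 instruct base, where the project would substitute its fine-tuned medical checkpoint.

```python
from vllm import LLM, SamplingParams

# Placeholder model: the project's SFT+DPO medical checkpoint would go here.
llm = LLM(model="Qwen/Qwen2-7B-Instruct")
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

outputs = llm.generate(
    ["What are common symptoms of iron-deficiency anemia?"], params
)
print(outputs[0].outputs[0].text)
```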
gpt_server
gpt_server is an open-source framework for production-grade deployment of LLMs, embeddings, rerankers, ASR, TTS, text-to-image, image editing, and text-to-video.
easy-model-deployer
Deploy open-source LLMs on AWS in minutes — with OpenAI-compatible APIs and a powerful CLI/SDK toolkit.
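Because the deployed endpoint is OpenAI-compatible, any standard OpenAI client can talk to it. A hedged sketch; the base URL and model ID are assumptions to be replaced with the values your deployment reports.

```python
from openai import OpenAI

# Base URL and model ID are placeholders; substitute the endpoint and model
# name that your deployment reports after it comes up.
client = OpenAI(base_url="http://<your-endpoint>/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen2-7B-Instruct",
    messages=[{"role": "user", "content": "Summarize vLLM in one sentence."}],
)
print(resp.choices[0].message.content)
```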
Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
gpustack
GPU cluster manager for optimized AI model deployment
GPTQModel
LLM quantization (compression) toolkit with hardware-acceleration support for NVIDIA CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.
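The typical GPTQ workflow is load, calibrate, quantize, save. A minimal sketch following the flow in GPTQModel's README; exact argument names may vary across versions, and the calibration texts here are placeholders (real runs use a few hundred representative samples).

```python
from gptqmodel import GPTQModel, QuantizeConfig

# Placeholder calibration set; GPTQ fits quantization scales against it.
calibration = [
    "GPTQ calibrates quantization scales on a small sample of text.",
    "A few hundred representative examples are typical in practice.",
]

cfg = QuantizeConfig(bits=4, group_size=128)           # 4-bit weights, per-group scales
model = GPTQModel.load("Qwen/Qwen2-7B-Instruct", cfg)  # load FP16 weights + quant config
model.quantize(calibration)                            # run GPTQ over calibration data
model.save("qwen2-7b-instruct-gptq-4bit")              # emit the quantized checkpoint
```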