vllm topic
trustgraph
Eliminate hallucinations from your AI agents.
kvcached
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
FlashTTS
Based on SparkTTS, OrpheusTTS, and other models; provides high-quality Chinese speech synthesis and voice cloning.
InferenceMAX
Open-source continuous inference benchmarking: GB200 NVL72 vs MI355X vs B200 vs H200 vs MI325X, with TPUv6e/v7/Trainium2/3/GB300 NVL72 coming soon. Workloads: DeepSeek 670B MoE, GPT-OSS.
olla
High-performance, lightweight proxy and load balancer for LLM infrastructure: intelligent routing, automatic failover, and unified model discovery across local and remote inference backends.
stopwatch
A tool for benchmarking LLMs on Modal
arks
Arks is a cloud-native inference framework running on Kubernetes
blackbird
A high-performance RDMA distributed file system for fast LLM inference and GPU training.
deepseek-v3-r1-deploy-and-benchmarks
Throughput benchmarks for DeepSeek-V3 and R1 671B on 8xH100.