vllm topic
trustgraph
Eliminate hallucinations from your AI agents.
kvcached
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
FlashTTS
Based on SparkTTS, OrpheusTTS, and other models; provides high-quality Chinese speech synthesis and voice cloning.
InferenceMAX
Open-source continuous inference benchmarking: GB200 NVL72 vs MI355X vs B200 vs H200 vs MI325X, with TPUv6e/v7/Trainium2/3/GB300 NVL72 coming soon. Workloads: DeepSeek 670B MoE, GPT-OSS.
olla
High-performance, lightweight proxy and load balancer for LLM infrastructure: intelligent routing, automatic failover, and unified model discovery across local and remote inference backends.
stopwatch
A tool for benchmarking LLMs on Modal
arks
Arks is a cloud-native inference framework running on Kubernetes
blackbird
A high-performance RDMA distributed file system for fast LLM inference and GPU training.
deepseek-v3-r1-deploy-and-benchmarks
Throughput benchmarks for DeepSeek-V3 and R1 671B on 8xH100.