llm-inference-solutions
A collection of inference and serving solutions for large language models (LLMs).
| Name | Org | Description |
|---|---|---|
| vLLM | UC Berkeley | A high-throughput and memory-efficient inference and serving engine for LLMs (usage sketch after the table) |
| Text Generation Inference (TGI) | Hugging Face 🤗 | A toolkit for deploying and serving large language models for text generation (client sketch after the table) |
| llm-engine | Scale AI | Scale's open-source engine for fine-tuning and serving large language models |
| DeepSpeed | Microsoft | DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective |
| OpenLLM | BentoML | Operating LLMs in production |
| LMDeploy | InternLM Team | A toolkit for compressing, deploying, and serving LLMs |
| FlexFlow | CMU, Stanford, UCSD | A distributed deep learning framework that also provides low-latency LLM serving (FlexFlow Serve) |
| CTranslate2 | OpenNMT | Fast inference engine for Transformer models |
| FastChat | lm-sys | An open platform for training, serving, and evaluating large language models; the release repo for Vicuna and Chatbot Arena |
| Triton Inference Server | Nvidia | An optimized inference serving solution for cloud and edge deployments |
| Lepton.AI | lepton.ai | A Pythonic framework to simplify AI service building |
| ScaleLLM | Vectorch | A high-performance inference system for large language models, designed for production environments |
| LoRAX | Predibase | Serve hundreds of fine-tuned LLMs in production for the cost of one |
| TensorRT-LLM | Nvidia | TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines |
| mistral.rs | EricLBuehler | Blazingly fast LLM inference |
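
To give a feel for the entries above, here is a minimal sketch of offline batch inference with vLLM. The model ID and sampling settings are placeholder assumptions; any Hugging Face model supported by vLLM can be substituted.

```python
# Minimal vLLM offline-inference sketch (assumes `pip install vllm` and a GPU).
from vllm import LLM, SamplingParams

prompts = ["What is the capital of France?"]
sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

# Placeholder model ID chosen only for illustration; swap in your own.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```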
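
Similarly, a sketch of querying a running Text Generation Inference server with the `huggingface_hub` client. The server URL, port, and model ID are assumptions; adjust them to your deployment.

```python
# TGI client sketch (assumes `pip install huggingface_hub` and a TGI server
# already running, e.g. started with Docker:
#   docker run --gpus all -p 8080:80 ghcr.io/huggingface/text-generation-inference \
#       --model-id mistralai/Mistral-7B-Instruct-v0.2
# ).
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")  # placeholder server URL
print(client.text_generation("What is the capital of France?", max_new_tokens=64))
```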