llm-inference-solutions
A collection of inference and serving solutions for large language models (LLMs).
| Name | Org | Description |
|---|---|---|
| vLLM | UC Berkeley | A high-throughput and memory-efficient inference and serving engine for LLMs (usage sketch after the table) |
| Text Generation Inference (TGI) | Hugging Face 🤗 | A toolkit for deploying and serving large language models for text generation (client sketch after the table) |
| llm-engine | Scale AI | Scale's open-source engine for fine-tuning and serving large language models |
| DeepSpeed | Microsoft | DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective |
| OpenLLM | BentoML | Operating LLMs in production |
| LMDeploy | InternLM Team | A toolkit for compressing, deploying, and serving LLMs |
| FlexFlow | CMU, Stanford, UCSD | A distributed deep learning framework that also provides low-latency LLM serving (FlexFlow Serve) |
| CTranslate2 | OpenNMT | Fast inference engine for Transformer models |
| FastChat | lm-sys | An open platform for training, serving, and evaluating large language models; the release repo for Vicuna and Chatbot Arena |
| Triton Inference Server | Nvidia | An optimized inference serving solution for cloud and edge deployments |
| Lepton.AI | lepton.ai | A Pythonic framework to simplify AI service building |
| ScaleLLM | Vectorch | A high-performance inference system for large language models, designed for production environments |
| LoRAX | Predibase | Serve hundreds of fine-tuned LLMs in production for the cost of one |
| TensorRT-LLM | Nvidia | TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines |
| mistral.rs | EricLBuehler | Blazingly fast LLM inference |
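
To give a feel for the entries above, here is a minimal sketch of offline batch inference with vLLM. The model ID and sampling settings are placeholder assumptions; any Hugging Face model supported by vLLM can be substituted.

```python
# Minimal vLLM offline-inference sketch (assumes `pip install vllm` and a GPU).
from vllm import LLM, SamplingParams

prompts = ["What is the capital of France?"]
sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

# Placeholder model ID chosen only for illustration; swap in your own.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```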
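
Similarly, a sketch of querying a running Text Generation Inference server with the `huggingface_hub` client. The server URL, port, and model ID are assumptions; adjust them to your deployment.

```python
# TGI client sketch (assumes `pip install huggingface_hub` and a TGI server
# already running, e.g. started with Docker:
#   docker run --gpus all -p 8080:80 ghcr.io/huggingface/text-generation-inference \
#       --model-id mistralai/Mistral-7B-Instruct-v0.2
# ).
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")  # placeholder server URL
print(client.text_generation("What is the capital of France?", max_new_tokens=64))
```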