llm-serving topic
BentoML
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
get-beam
Run GPU inference and training jobs on serverless infrastructure that scales with you.
helix
♾️ Helix is a private GenAI stack for building AI agents with declarative pipelines, knowledge (RAG), API bindings, and first-class testing.
llm-action
This project aims to share the technical principles behind large language models along with hands-on practical experience.
BurstGPT
A ChatGPT (GPT-3.5) & GPT-4 workload trace for optimizing LLM serving systems
happy_vllm
A production-ready REST API for vLLM
Awesome_LLM_System-PaperList
Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focusing mainly on inference.
llm-inference-solutions
A collection of available inference solutions for LLMs
Awesome-LLM-Productization
Awesome-LLM-Productization: a curated list of tools/tricks/news/regulations about AI and Large Language Model (LLM) productization
pratical-llms
A collection of hands-on notebooks for LLM practitioners