llm-serving topic

A list of repositories under the llm-serving topic:

ray

33.2k stars · 5.6k forks · 450 watchers

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

mosec

746 stars · 51 forks

A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute resources.
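
Dynamic batching generally means collecting incoming requests until either a batch-size cap or a short wait deadline is hit, then running them through the model together. A minimal stdlib sketch of that idea (a toy illustration of the technique, not mosec's actual API — the function name and parameters are invented here):

```python
import queue
import time

def dynamic_batcher(q, max_batch=4, max_wait=0.01):
    """Collect up to max_batch items from q, waiting at most max_wait
    seconds after the first item arrives (toy sketch, not mosec's API)."""
    batch = [q.get()]  # block until the first request arrives
    deadline = time.monotonic() + max_wait
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # wait window expired: ship a partial batch
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # no more requests arrived in time
    return batch
```

The trade-off is latency versus throughput: a longer `max_wait` yields fuller batches (better GPU utilization) at the cost of added per-request delay.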

skypilot

6.6k stars · 474 forks · 27 watchers

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.

OpenLLM

9.8k stars · 626 forks · 49 watchers

Run any open-source LLM, such as Llama 3.1 or Gemma, as an OpenAI-compatible API endpoint in the cloud.
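
"OpenAI-compatible" means the server accepts the same request shape as OpenAI's `/v1/chat/completions` endpoint, so existing OpenAI clients work unchanged. A sketch of building such a request with only the standard library; the base URL and model name are assumptions for illustration, not defaults confirmed by the source:

```python
import json
import urllib.request

# Assumed local endpoint and model name; substitute whatever your
# deployment actually serves.
BASE_URL = "http://localhost:3000/v1/chat/completions"

payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "user", "content": "Summarize LLM serving in one sentence."}
    ],
    "max_tokens": 64,
}

request = urllib.request.Request(
    BASE_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request) would send it; omitted here since
# no server is assumed to be running.
```

Because the wire format matches OpenAI's, the official `openai` client library can also be pointed at such a server by overriding its base URL.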

vllm

28.7k stars · 4.3k forks · 241 watchers

A high-throughput and memory-efficient inference and serving engine for LLMs

sugarcane-ai

46 stars · 14 forks

An npm-like package ecosystem for prompts 🤖

superduper

4.7k stars · 450 forks

Superduper: Integrate AI models and machine learning workflows with your database to implement custom AI applications, without moving your data — including streaming inference, scalable model hosting, ...

ialacol

142 stars · 17 forks

🪶 Lightweight OpenAI drop-in replacement for Kubernetes

friendli-client

40 stars · 8 forks

Friendli: the fastest serving engine for generative AI