vllm topic

Repositories tagged with the vllm topic.

inference

Stars: 8.8k · Forks: 767 · Watchers: 8.8k

Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready...
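To illustrate the "single line of code" claim: Xinference exposes an OpenAI-compatible endpoint, so an existing GPT client can be repointed at it by changing only the base URL. A minimal sketch, assuming a locally running Xinference server (the host, port, and model name below are illustrative assumptions):

```python
from openai import OpenAI

# Point the standard OpenAI client at a local Xinference server instead of api.openai.com
# (host, port, and model name are assumptions for a local deployment).
client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="qwen2-instruct",  # whichever model was launched on the server
    messages=[{"role": "user", "content": "Summarize vLLM in one sentence."}],
)
print(response.choices[0].message.content)
```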

BricksLLM

Stars: 880 · Forks: 59

🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI...

ray_vllm_inference

Stars: 49 · Forks: 4

A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
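The integration pattern is roughly: wrap a vLLM engine inside a Ray Serve deployment, so Serve handles HTTP routing and replica scaling while vLLM does batched generation. A rough sketch of that pattern, not the repository's actual code (model name and request shape are assumptions):

```python
from ray import serve
from starlette.requests import Request
from vllm import LLM, SamplingParams

@serve.deployment(num_replicas=1, ray_actor_options={"num_gpus": 1})
class VLLMDeployment:
    def __init__(self):
        # Model name is an illustrative assumption.
        self.llm = LLM(model="facebook/opt-125m")
        self.params = SamplingParams(max_tokens=128)

    async def __call__(self, request: Request) -> dict:
        # Expect a JSON body like {"prompt": "..."} (assumed payload shape).
        prompt = (await request.json())["prompt"]
        outputs = self.llm.generate([prompt], self.params)
        return {"text": outputs[0].outputs[0].text}

app = VLLMDeployment.bind()
# serve.run(app)  # exposes the deployment over HTTP on the Ray cluster
```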

fastassert

Stars: 28 · Forks: 0

Dockerized LLM inference server with constrained output (JSON mode), built on top of vLLM and outlines. Faster, cheaper and without rate limits. Compare the quality and latency to your current LLM API...
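The constrained-output idea is that outlines masks the token distribution during decoding so the model can only emit text that matches a schema. A minimal sketch of JSON-constrained generation using outlines' pre-1.0 interface, not fastassert's actual code (model choice and schema are illustrative):

```python
from pydantic import BaseModel
import outlines

class Ticket(BaseModel):
    title: str
    priority: int

# Load a Hugging Face model through outlines (model name is illustrative).
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

# Build a generator whose output is guaranteed to parse as a Ticket.
generator = outlines.generate.json(model, Ticket)
ticket = generator("Extract a ticket from: 'Login page crashes, urgent.'")
print(ticket)  # a Ticket instance rather than free-form text
```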

DoRA

Stars: 124 · Forks: 4 · Watchers: 124

Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"
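For reference, the decomposition described in the paper: the pretrained weight is split into a magnitude vector and a direction matrix, the low-rank (LoRA) update is applied only to the direction, and the magnitude is trained separately.

```latex
% DoRA: W_0 is the pretrained weight, B A is the low-rank update to the
% direction, m is a learned magnitude vector, and \lVert\cdot\rVert_c
% denotes the column-wise norm.
W' = m \cdot \frac{W_0 + B A}{\lVert W_0 + B A \rVert_c}
```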

llm-inference

Stars: 69 · Forks: 17

llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as UI, RESTful API, auto-scaling, computing resource...

worker-vllm

Stars: 229 · Forks: 87

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
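RunPod serverless workers follow a handler pattern: the SDK invokes a user-supplied function once per queued job, and here that function delegates to a vLLM engine. A rough sketch of the pattern, not the template's actual code (model name and payload fields are assumptions):

```python
import runpod
from vllm import LLM, SamplingParams

# Model name is an illustrative assumption.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")

def handler(job):
    """Called once per job; job['input'] carries the request payload."""
    payload = job["input"]
    params = SamplingParams(max_tokens=payload.get("max_tokens", 256))
    outputs = llm.generate([payload["prompt"]], params)
    return {"text": outputs[0].outputs[0].text}

# Hand the handler to the RunPod serverless runtime.
runpod.serverless.start({"handler": handler})
```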

happy_vllm

Stars: 22 · Forks: 1

A production-ready REST API for vLLM.

ICE-PIXIU

Stars: 15 · Forks: 1

ICE-PIXIU: A Cross-Language Financial Large Language Model Framework

vllm-cn

Stars: 31 · Forks: 1

Demonstrating the remarkable results of vLLM on Chinese large language models.
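The basic vLLM offline-inference pattern such a demo relies on looks roughly like this, a minimal sketch with an assumed Chinese instruction-tuned model:

```python
from vllm import LLM, SamplingParams

# Model name is an illustrative assumption; any Chinese LLM on the Hub works.
llm = LLM(model="Qwen/Qwen1.5-7B-Chat", trust_remote_code=True)
params = SamplingParams(temperature=0.7, max_tokens=128)

# Prompt: "Introduce vLLM in one sentence."
outputs = llm.generate(["用一句话介绍 vLLM。"], params)
print(outputs[0].outputs[0].text)
```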