vllm topic

Repositories tagged with vllm

inference

2.9k Stars · 239 Forks

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any...
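The "single line of code" the description refers to is typically the client's base URL: OpenAI-compatible servers such as Xinference accept the same `/v1/chat/completions` payload as the official API. A minimal sketch, assuming a local server at a placeholder URL (the port and model name below are illustrative, not guaranteed defaults):

```python
import json
from urllib.request import Request

# Assumption: a local Xinference (or other OpenAI-compatible) server.
# Switching providers means changing only this one line.
BASE_URL = "http://localhost:9997/v1"

def chat_request(model: str, prompt: str) -> Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_request("my-local-llm", "Hello!")
print(req.full_url)
```

The request is built but not sent, so the sketch runs without a live server; in a real app you would pass the same base URL to your OpenAI client library instead.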

BricksLLM

768 Stars · 48 Forks

🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI...

ray_vllm_inference

31 Stars · 4 Forks

A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.

fastassert

26 Stars · 0 Forks

Dockerized LLM inference server with constrained output (JSON mode), built on top of vLLM and outlines. Faster, cheaper and without rate limits. Compare the quality and latency to your current LLM API...
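Constrained output means the server restricts generation so the response is guaranteed to parse against a schema, rather than validating after the fact. One way to request this from a vLLM OpenAI-compatible server is its `guided_json` extension field; treat the field name, schema, and model name below as assumptions about the deployment rather than a universal API:

```python
import json

# Hedged sketch: a chat-completion payload asking a vLLM-backed server to
# constrain output to a JSON schema. "guided_json" is a vLLM-specific
# extension; other servers expose constrained decoding differently.
SCHEMA = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "temperature_c": {"type": "number"},
    },
    "required": ["city", "temperature_c"],
}

def guided_payload(model: str, prompt: str) -> dict:
    """Chat-completion payload requesting schema-constrained generation."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "guided_json": SCHEMA,  # vLLM extension field (assumption)
    }

payload = guided_payload("my-model", "Weather in Paris as JSON, please.")
print(json.dumps(payload, indent=2))
```

Because the constraint is enforced during decoding, the client can `json.loads` the response without a retry loop for malformed output.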

DoRA

108 Stars · 2 Forks

Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"
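The technique named in the title can be sketched numerically: decompose a weight matrix into a magnitude vector and a direction matrix, then apply a LoRA-style low-rank update only to the direction. This is a minimal sketch of that idea, not the official implementation; the shapes and the column-wise norm convention are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 16, 2

W0 = rng.normal(size=(d_out, d_in))            # frozen pretrained weight
m = np.linalg.norm(W0, axis=0, keepdims=True)  # magnitude, init to column norms
B = np.zeros((d_out, r))                       # low-rank factors (B starts at zero)
A = rng.normal(size=(r, d_in))

def dora_weight(W0, m, B, A):
    """Merged weight: magnitude times the column-normalized updated direction."""
    V = W0 + B @ A                             # direction with low-rank update
    return m * (V / np.linalg.norm(V, axis=0, keepdims=True))

W = dora_weight(W0, m, B, A)
# With B initialized to zero, the merged weight equals W0 exactly.
print(np.allclose(W, W0))  # True
```

Training would update `m`, `B`, and `A` while `W0` stays frozen, so at initialization the adapted model reproduces the pretrained one.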

llm-inference

43 Stars · 8 Forks

llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, RESTful API, auto-scaling, computing resource...

worker-vllm

162 Stars · 58 Forks

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

happy_vllm

22 Stars · 1 Fork

A production-ready REST API for vLLM

ICE-PIXIU

15 Stars · 0 Forks

ICE-PIXIU: A Cross-Language Financial Large Language Model Framework

vllm-cn

31 Stars · 1 Fork

A demonstration of vLLM's impressive results on Chinese large language models