vllm topic
inference
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any...
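The "single line" in practice is the client's base URL: Xinference exposes an OpenAI-compatible endpoint, so an existing OpenAI client can simply be pointed at the local server. A minimal sketch, assuming a locally running server at the default port and a hypothetical model name:

```python
# Minimal sketch: keep the OpenAI client, swap the base URL to a local
# Xinference server. The URL and model name below are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:9997/v1",  # assumed local Xinference endpoint
    api_key="not-needed-locally",         # placeholder; a local server may ignore it
)

response = client.chat.completions.create(
    model="my-local-llm",  # hypothetical model name registered with Xinference
    messages=[{"role": "user", "content": "Summarize vLLM in one sentence."}],
)
print(response.choices[0].message.content)
```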
BricksLLM
🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI...
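The gateway sits between the application and the upstream provider: an admin call provisions a proxy key with cost and rate limits, and the application then routes its traffic through the gateway using that key. A rough sketch of the provisioning step, with an assumed endpoint path and assumed field names rather than the documented BricksLLM admin API:

```python
# Hedged sketch of per-key limits: provision a gateway key with a spend cap
# and a rate limit. Endpoint path and field names are assumptions for
# illustration, not necessarily BricksLLM's actual admin API.
import requests

admin_url = "http://localhost:8001/api/key-management/keys"  # assumed admin endpoint
key_config = {
    "name": "staging-app",
    "key": "staging-secret-key",   # hypothetical raw key the app will present
    "costLimitInUsd": 25.0,        # assumed field: spend cap for this key
    "rateLimitOverTime": 60,       # assumed field: requests allowed per unit
    "rateLimitUnit": "m",          # assumed field: per minute
}

resp = requests.put(admin_url, json=key_config, timeout=10)
resp.raise_for_status()
print(resp.json())
```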
ray_vllm_inference
A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
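The integration pattern is a Ray Serve deployment that loads a vLLM engine once per replica and answers HTTP requests, so throughput can be scaled by adding replicas. A minimal sketch, assuming the offline `LLM` engine and a placeholder model; the repo's actual request schema and engine wiring may differ:

```python
# Sketch of wrapping vLLM in a Ray Serve deployment. Model name, request
# shape, and resource settings are placeholders.
from ray import serve
from starlette.requests import Request
from vllm import LLM, SamplingParams


@serve.deployment(num_replicas=1, ray_actor_options={"num_gpus": 1})
class VLLMDeployment:
    def __init__(self):
        # Load the model once per replica.
        self.llm = LLM(model="facebook/opt-125m")  # placeholder model
        self.params = SamplingParams(max_tokens=128, temperature=0.7)

    async def __call__(self, request: Request) -> dict:
        body = await request.json()
        outputs = self.llm.generate([body["prompt"]], self.params)
        return {"text": outputs[0].outputs[0].text}


app = VLLMDeployment.bind()
# Deploy with: serve run <module_name>:app
```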
fastassert
Dockerized LLM inference server with constrained output (JSON mode), built on top of vLLM and outlines. Faster, cheaper and without rate limits. Compare the quality and latency to your current LLM API...
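The constrained-output idea is that the server only samples tokens that keep the generation valid against a JSON schema, so responses always parse. A rough sketch of the underlying outlines usage with a Pydantic schema; the API shown follows an earlier released version of outlines with a placeholder model, and fastassert's actual vLLM wiring will differ:

```python
# Hedged sketch of schema-constrained generation with outlines.
from pydantic import BaseModel
import outlines


class Invoice(BaseModel):
    customer: str
    total_usd: float


# Placeholder model; fastassert serves this behind vLLM instead.
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")
generator = outlines.generate.json(model, Invoice)

result = generator("Extract the invoice: ACME Corp owes $1,250.")
print(result)  # an Invoice instance with schema-valid fields
```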
DoRA
Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"
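DoRA decomposes a pretrained weight into a magnitude vector and a direction, applies a LoRA-style low-rank update to the direction, renormalizes it column-wise, and rescales it by a learned magnitude. A hedged PyTorch sketch of that recomposition, as an illustration of the formula rather than the official implementation:

```python
# Sketch of the DoRA weight recomposition: W' = m * (W0 + B @ A) / ||W0 + B @ A||_c
import torch
import torch.nn as nn


class DoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        out_f, in_f = base.weight.shape
        # Frozen pretrained weight W0.
        self.weight = nn.Parameter(base.weight.detach().clone(), requires_grad=False)
        self.bias = base.bias
        # Low-rank direction update; B starts at zero so B @ A is initially zero.
        self.lora_A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_f, rank))
        # Learnable magnitude per column, initialized from W0's column norms.
        self.magnitude = nn.Parameter(self.weight.norm(p=2, dim=0, keepdim=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        directional = self.weight + self.lora_B @ self.lora_A                   # W0 + BA
        directional = directional / directional.norm(p=2, dim=0, keepdim=True)  # column-normalize
        adapted = self.magnitude * directional                                  # rescale by magnitude
        return nn.functional.linear(x, adapted, self.bias)


layer = DoRALinear(nn.Linear(16, 32), rank=4)
print(layer(torch.randn(2, 16)).shape)  # torch.Size([2, 32])
```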
llm-inference
llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, a RESTful API, auto-scaling, computing resource...
worker-vllm
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
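The serverless pattern behind such a worker is a handler function that receives a job payload and returns generated text, invoked by RunPod per request. A bare-bones sketch, assuming a simplified payload shape and the offline vLLM engine; the template's real handler and configuration differ:

```python
# Hedged sketch of a RunPod serverless handler backed by vLLM.
import runpod
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder; the template loads its configured model


def handler(job):
    payload = job["input"]  # assumed payload shape
    params = SamplingParams(max_tokens=payload.get("max_tokens", 128))
    outputs = llm.generate([payload["prompt"]], params)
    return {"text": outputs[0].outputs[0].text}


runpod.serverless.start({"handler": handler})
```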
happy_vllm
A production-ready REST API for vLLM
ICE-PIXIU
ICE-PIXIU: A Cross-Lingual Financial Large Language Model Framework