aibrix

Cost-efficient and pluggable Infrastructure components for GenAI inference

Results: 339 aibrix issues

### 🚀 Feature Description and Motivation Right now, we're using xxhash in https://github.com/aibrix/aibrix/pull/641 for our prefix cache-aware router. We might consider switching to a consistent hash + LSH-based approach, which...

area/gateway
priority/important-soon
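
To make the consistent-hashing idea in the issue above concrete, here is a minimal sketch, not the project's implementation. It assumes requests are keyed on their first block of prompt tokens and that pods are identified by name. `ConsistentHashRing`, `prefix_key`, the `vnodes` count, and the `block_size` of 16 are all hypothetical, and SHA-256 from the standard library stands in for xxhash. The LSH part of the proposal is not covered.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stand-in hash: SHA-256 truncated to 64 bits. The issue mentions xxhash,
    # which is not part of the standard library.
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Hypothetical ring that maps prompt-prefix keys to pod names."""

    def __init__(self, pods, vnodes=64):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (hash, pod) virtual nodes
        for pod in pods:
            self.add(pod)

    def add(self, pod):
        # Each pod gets several virtual nodes to spread load more evenly.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{pod}#{i}"), pod))

    def remove(self, pod):
        self._ring = [(h, p) for h, p in self._ring if p != pod]

    def lookup(self, key):
        if not self._ring:
            raise LookupError("no pods in ring")
        # First virtual node clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


def prefix_key(tokens, block_size=16):
    # Key on the first block of tokens so prompts sharing that prefix
    # produce the same key and therefore land on the same pod.
    return " ".join(map(str, tokens[:block_size]))
```

With a ring like this, removing a pod only remaps the keys that pod owned; every other prefix keeps its pod, which is the property that would preserve prefix-cache locality when pods scale in or out.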

### 🚀 Feature Description and Motivation This issue was found by @gangmuk. Technically, 1. the gateway router fetches the vLLM pods every 50ms, calculates the running/pending/swapped requests, and makes...

area/gateway
area/performance

### 🐛 Describe the bug ![Image](https://github.com/user-attachments/assets/1703e6ce-87e1-43ee-bcb8-3445e677e11c) ### Steps to Reproduce _No response_ ### Expected behavior _No response_ ### Environment _No response_

priority/critical-urgent
area/testing

### 🚀 Feature Description and Motivation We have some initial work here, from v0.1.0 testing: https://github.com/aibrix/aibrix/tree/main/benchmarks However, these scripts are not well polished. Since we did lots of testing...

priority/important-soon
area/benchmark
area/tools

### 🚀 Feature Description and Motivation Preble (https://arxiv.org/abs/2407.00023) did solid work on prefix-cache and load-aware routing. The prefix-cache aware version we are implementing is a little bit different from Preble,...

area/gateway
kind/feature

### 🚀 Feature Description and Motivation In the past, we used Volcano Engine as the primary platform to test aibrix. Now it's time to test against other public cloud providers....

### Summary Having access to the GPU profile used by the GPU optimizer, we propose to add a new routing policy that utilizes performance profiles per input/output token pattern to...

kind/enhancement
priority/important-soon
area/heterogeneous

### 🚀 Feature Description and Motivation There are a few cases for migrating the grpc-ext-proc server to a Python code base. This change is driven by two main factors that would...

kind/enhancement
area/gateway
priority/important-longterm

### 🚀 Feature Description and Motivation Based on the experiments conducted so far, we have identified the following issues that need to be addressed to ensure the GPU optimizer fully...

priority/critical-urgent
kind/feature
area/heterogeneous

### 🐛 Describe the bug ![Image](https://github.com/user-attachments/assets/f00a5795-5ea3-4341-9491-d95d167ae40e) ### Steps to Reproduce deploy the models ``` vllm serve Qwen/Qwen2.5-Coder-7B-Instruct --enable-lora --lora-modules model-1=VERSIL91/10627788-942b-4b44-b5f5-167c4b543f2c model-2=VERSIL91/10627788-942b-4b44-b5f5-167c4b543f2c model-3=VERSIL91/10627788-942b-4b44-b5f5-167c4b543f2c model-4=VERSIL91/10627788-942b-4b44-b5f5-167c4b543f2c --max-lora-rank 64 ``` send the request ```...

area/benchmark