aibrix issues

[router] LSH based prefix cache aware router

6

### 🚀 Feature Description and Motivation

Right now, we're using xxhash in https://github.com/aibrix/aibrix/pull/641 for our prefix cache-aware router. We might consider switching to a consistent hash + LSH-based approach, which...

gaocegege

area/gateway

priority/important-soon

imbalance issues found in least-of-request or any other policies.

7

### 🚀 Feature Description and Motivation

This issue was found by @gangmuk. Technically, 1. the gateway router fetches the vLLM pods every 50ms, calculates the running/pending/swapped requests, and makes...

Jeffwan

area/gateway

area/performance

The current e2e tests are flaky

8

### 🐛 Describe the bug

![Image](https://github.com/user-attachments/assets/1703e6ce-87e1-43ee-bcb8-3445e677e11c)

### Steps to Reproduce

_No response_

### Expected behavior

_No response_

### Environment

_No response_

Jeffwan

priority/critical-urgent

area/testing

Polish benchmark scripts for autoscaling and routing

### 🚀 Feature Description and Motivation

We have some initial work from v0.1.0 testing at https://github.com/aibrix/aibrix/tree/main/benchmarks. However, these scripts are not well polished. Since we did lots of testing...

Jeffwan

priority/important-soon

area/benchmark

area/tools

Implement exact Preble routing algorithm in AIBRix

### 🚀 Feature Description and Motivation

Preble (https://arxiv.org/abs/2407.00023) did solid work on prefix-cache and load-aware routing. The prefix-cache aware version we are implementing is a little bit different from Preble,...

Jeffwan

area/gateway

kind/feature

Testing AIBrix on AWS EKS Cluster

1

### 🚀 Feature Description and Motivation

In the past, we used Volcano Engine as the primary platform to test AIBrix. Now, it's time to test against other public cloud providers....

Jeffwan

[RFC]: Load-aware pattern-based routing policy with profile support

### Summary

Having access to the GPU profile used by the GPU optimizer, we propose to add a new routing policy that utilizes performance profiles per input/output token pattern to...

zhangjyr

kind/enhancement

priority/important-soon

area/heterogeneous

Consider to move grpc-ext-proc Server to Python Code Base

### 🚀 Feature Description and Motivation

There are a few cases for migrating the grpc-ext-proc server to a Python code base. This change is driven by two main factors that would...

Jeffwan

kind/enhancement

area/gateway

priority/important-longterm

[RFC] Deliver stable, feasible, and smooth output for GPU Optimizer

3

### 🚀 Feature Description and Motivation

Based on the experiments conducted so far, we have identified the following issues that need to be addressed to ensure the GPU optimizer fully...

nwangfw

priority/critical-urgent

kind/feature

area/heterogeneous

jinja2.exceptions.UndefinedError: 'bos_token' is undefined

1

### 🐛 Describe the bug

![Image](https://github.com/user-attachments/assets/f00a5795-5ea3-4341-9491-d95d167ae40e)

### Steps to Reproduce

deploy the models

```
vllm serve Qwen/Qwen2.5-Coder-7B-Instruct --enable-lora --lora-modules model-1=VERSIL91/10627788-942b-4b44-b5f5-167c4b543f2c model-2=VERSIL91/10627788-942b-4b44-b5f5-167c4b543f2c model-3=VERSIL91/10627788-942b-4b44-b5f5-167c4b543f2c model-4=VERSIL91/10627788-942b-4b44-b5f5-167c4b543f2c --max-lora-rank 64
```

send the request ...

Jeffwan

area/benchmark
