Jiaxin Shan

Results: 271 issues by Jiaxin Shan

### 🐛 Describe the bug ![Image](https://github.com/user-attachments/assets/e2c652ab-c3e8-44c1-92dc-ca0581064b6c) ### Steps to Reproduce N/A ### Expected behavior N/A ### Environment N/A

priority/important-soon
area/cicd

### 🐛 Describe the bug ![Image](https://github.com/user-attachments/assets/9a65f1b0-e39b-45ae-bd4d-c8762a4e3cbc) ``` Warning FailedMount 25s (x19 over 37m) kubelet (combined from similar events): MountVolume.SetUp failed for volume "kube-api-access-wfmh2" : write /var/lib/kubelet/pods/26af5e70-f9b9-4c12-a9de-dcf2d4e93689/volumes/kubernetes.io~projected/kube-api-access-wfmh2/..2025_02_18_00_11_37.99831085/namespace: no space left on...

kind/bug
priority/critical-urgent
area/kv-cache

### 🚀 Feature Description and Motivation Copy some TODO items from issue https://github.com/aibrix/aibrix/pull/650 WIP items: - [ ] Improving aibrix/benchmarks/generator/client.py with async and streaming mode - [ ] TTFT -...

priority/critical-urgent
area/benchmark
area/performance

### 🚀 Feature Description and Motivation This issue was found by @gangmuk. Technically, 1. the gateway router fetches the vLLM pods every 50ms, calculates the running/pending/swapped requests, and makes...

area/gateway
area/performance

### 🐛 Describe the bug ![Image](https://github.com/user-attachments/assets/1703e6ce-87e1-43ee-bcb8-3445e677e11c) ### Steps to Reproduce _No response_ ### Expected behavior _No response_ ### Environment _No response_

priority/critical-urgent
area/testing

### 🚀 Feature Description and Motivation We have some initial work at https://github.com/aibrix/aibrix/tree/main/benchmarks from v0.1.0 testing. However, these scripts are not well polished. Since we did lots of testing...

priority/important-soon
area/benchmark
area/tools

### 🚀 Feature Description and Motivation Preble (https://arxiv.org/abs/2407.00023) did solid work on prefix-cache and load-aware routing. The prefix-cache aware version we are implementing is a little bit different from Preble,...

area/gateway
kind/feature

### 🚀 Feature Description and Motivation In the past, we used Volcano Engine as the primary platform to test aibrix. Now it's time to test against other public cloud providers....

### 🚀 Feature Description and Motivation There are a few cases for migrating the grpc-ext-proc server to a Python code base. This change is driven by two main factors that would...

kind/enhancement
area/gateway
priority/important-longterm

### 🐛 Describe the bug ![Image](https://github.com/user-attachments/assets/f00a5795-5ea3-4341-9491-d95d167ae40e) ### Steps to Reproduce deploy the models ``` vllm serve Qwen/Qwen2.5-Coder-7B-Instruct --enable-lora --lora-modules model-1=VERSIL91/10627788-942b-4b44-b5f5-167c4b543f2c model-2=VERSIL91/10627788-942b-4b44-b5f5-167c4b543f2c model-3=VERSIL91/10627788-942b-4b44-b5f5-167c4b543f2c model-4=VERSIL91/10627788-942b-4b44-b5f5-167c4b543f2c --max-lora-rank 64 ``` send the request ```...

area/benchmark