Jiaxin Shan

Results 271 issues of Jiaxin Shan

### Summary This RFC proposes making the API Gateway interface within AIBrix compatible with OpenAI. We ran into a few issues in the past few days, and @gaocegege also suggested https://github.com/vllm-project/aibrix/issues/732 earlier...

kind/enhancement
area/gateway
priority/critical-urgent

### 🚀 Feature Description and Motivation AIBrix is composed of multiple controllers, and its current lack of comprehensive monitoring makes it difficult to effectively manage and troubleshoot the system. We at...

priority/important-longterm
kind/feature
area/stability

### 🐛 Describe the bug This is a follow-up issue to https://github.com/vllm-project/aibrix/issues/783 It seems that some specific prompt data fails the test ``` python benchmark_serving.py --backend vllm --model deepseek-ai/deepseek-r1 --trust-remote-code...

kind/bug
priority/important-soon
area/benchmark

### 🐛 Describe the bug ![Image](https://github.com/user-attachments/assets/75742fa7-37f6-4ca1-b03d-0de70f2632a1) ### Steps to Reproduce Use the deepseek-r1-local-volume version ### Expected behavior It should create the pods successfully ### Environment commit: 1f96e9a47aab42cf003ed4bd031701eea754332c24b6b10c691ee5a842eeceb1

kind/bug
area/distributed

### 🚀 Feature Description and Motivation We can consider making the code reusable and moving it to https://github.com/vllm-project/aibrix/tree/main/python/aibrix ![Image](https://github.com/user-attachments/assets/06999554-6f59-48da-a43d-5ff1af9fbc71) This makes the generator/benchmark code reusable. ### Use Case As a user,...

kind/feature
area/benchmark
area/performance

### 🚀 Feature Description and Motivation We're actively evolving AIBrix to support more advanced and production-ready LLM serving capabilities. For v0.4.0 and beyond, our roadmap includes: - Prefill & Decode...

kind/documentation
priority/important-soon

### 🐛 Describe the bug ![Image](https://github.com/user-attachments/assets/aca3c823-507f-4dde-9d04-0adee8f8c2e8) Can we list potential reasons for connection issues? In recent benchmark testing, I have seen a few similar cases. ### Steps to Reproduce Run benchmark,...

### Summary To further optimize large-scale LLM inference workloads, we plan to introduce support for Prefill/Decode (P/D) disaggregation in vLLM. This separation allows prefill and decode stages to run on...

kind/enhancement
area/gateway
priority/critical-urgent
area/disaggregated

### 🚀 Feature Description and Motivation ![Image](https://github.com/user-attachments/assets/d4b2c067-ce70-4e62-97e1-775484ea8ac1) The image size is very large now, and we need to reduce it a little bit. ``` FROM ubuntu:22.04 RUN apt-get update &&...

help wanted
priority/important-longterm
area/cicd
area/installation
area/kv-cache

### Summary The current KVCache offloading framework is built around assumptions from the vLLM v0 architecture. With the release of vLLM v1, which introduces new cache handling semantics, especially the...

priority/important-soon
area/distributed
area/kv-cache