Jiaxin Shan
### Summary

This RFC proposes making the API Gateway interface within AIBrix compatible with OpenAI. We have run into a few issues over the past few days, and @gaocegege also suggested https://github.com/vllm-project/aibrix/issues/732 earlier...
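To make the compatibility goal concrete: a client built on the official OpenAI Python SDK should be able to talk to the gateway by only swapping the base URL. A minimal sketch, assuming a hypothetical gateway address and model name (neither is a confirmed AIBrix default):

```python
# Minimal sketch: pointing the official OpenAI Python SDK at the AIBrix
# gateway. The endpoint URL and model name are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://aibrix-gateway.example.com/v1",  # hypothetical gateway address
    api_key="sk-placeholder",  # the gateway may not require a real key
)

resp = client.chat.completions.create(
    model="deepseek-r1",  # whatever model the gateway routes to
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```

If a stock client like this works unmodified, the gateway is OpenAI-compatible in the sense the RFC proposes.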
### 🚀 Feature Description and Motivation

AIBrix is composed of multiple controllers, and the current lack of comprehensive monitoring makes it difficult to manage and troubleshoot the system effectively. We at...
### 🐛 Describe the bug

This is a follow-up to https://github.com/vllm-project/aibrix/issues/783. It seems some specific prompt data causes the test to fail:

```
python benchmark_serving.py --backend vllm --model deepseek-ai/deepseek-r1 --trust-remote-code...
```
### 🐛 Describe the bug

### Steps to Reproduce

Use the deepseek-r1-local-volume version.

### Expected behavior

It should create the pods successfully.

### Environment

commit: 1f96e9a47aab42cf003ed4bd031701eea754332c24b6b10c691ee5a842eeceb1
### 🚀 Feature Description and Motivation

We can consider moving the generator/benchmark code to https://github.com/vllm-project/aibrix/tree/main/python/aibrix, which would make it reusable across the project; see the sketch after this entry.

### Use Case

As a user,...
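To illustrate the intent: once the benchmark utilities live in the `aibrix` package, a user could import them directly instead of copying scripts. The module path, class, and arguments below are hypothetical placeholders for whatever the refactor actually exposes:

```python
# Hypothetical sketch: none of these names exist yet; they stand in for
# whatever the refactored aibrix package would expose after the move.
from aibrix.benchmark import PromptGenerator  # hypothetical module/class

gen = PromptGenerator(dataset="sharegpt", num_prompts=100)  # hypothetical args
for prompt in gen:
    print(prompt)
```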
### 🚀 Feature Description and Motivation

We're actively evolving AIBrix to support more advanced and production-ready LLM serving capabilities. For v0.4.0 and beyond, our roadmap includes:

- Prefill & Decode...
### 🐛 Describe the bug

Can we list potential reasons for connection issues? In recent benchmark testing, I have seen a few similar cases.

### Steps to Reproduce

Run benchmark,...
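When triaging such failures, one common first step is to make client-side timeouts explicit so that slow responses, refused connections, and resets surface as distinct errors. A minimal diagnostic sketch, assuming a generic HTTP endpoint (the URL is a placeholder):

```python
# Diagnostic sketch: explicit timeouts help distinguish slow responses
# (TimeoutError) from refused/reset connections (ClientConnectorError).
# The URL below is a placeholder, not an AIBrix default.
import asyncio
import aiohttp

async def probe(url: str) -> None:
    timeout = aiohttp.ClientTimeout(connect=5, total=60)
    try:
        async with aiohttp.ClientSession(timeout=timeout) as session:
            async with session.get(url) as resp:
                print("status:", resp.status)
    except aiohttp.ClientConnectorError as e:
        print("connection refused/reset:", e)  # server down, wrong port, network policy
    except asyncio.TimeoutError:
        print("timed out")  # overload, long prefill, proxy limits

asyncio.run(probe("http://localhost:8000/v1/models"))
```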
### Summary

To further optimize large-scale LLM inference workloads, we plan to introduce support for Prefill/Decode (P/D) disaggregation in vLLM. This separation allows prefill and decode stages to run on...
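To make the separation concrete, the request flow under P/D disaggregation is roughly: a router sends the prompt to a prefill worker, which materializes the KV cache once over the whole prompt (compute-bound), and decode then proceeds token by token on a separate worker (memory-bandwidth-bound) that reads that cache. A purely conceptual sketch; all names are hypothetical stand-ins, not the actual vLLM/AIBrix interfaces:

```python
# Conceptual sketch only: PrefillPool/DecodePool and their methods are
# hypothetical; the real components and transfer mechanism will differ.
from dataclasses import dataclass

@dataclass
class KVHandle:
    """Opaque reference to KV-cache blocks produced by prefill."""
    request_id: str
    location: str  # which prefill worker / cache tier holds the blocks

def serve(prompt: str, prefill_pool, decode_pool) -> str:
    # Stage 1: prefill processes the full prompt in one pass and
    # populates the KV cache; it benefits from high-compute instances.
    handle: KVHandle = prefill_pool.run_prefill(prompt)

    # Stage 2: decode generates tokens iteratively on a different pool,
    # pulling the KV cache via the handle rather than recomputing it.
    return decode_pool.run_decode(handle)
```

The point of the split is that each pool can be sized and placed independently, matching hardware to each stage's bottleneck.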
### 🚀 Feature Description and Motivation

The image is very large now; we need to reduce its size.

```
FROM ubuntu:22.04
RUN apt-get update &&...
```
### Summary

The current KVCache offloading framework is built around assumptions from the vLLM v0 architecture. With the release of vLLM v1, which introduces new cache handling semantics, especially the...