v0.3.0 roadmap
🚀 Feature Description and Motivation
I'm creating this issue to track the v0.3.0 items we'd like to work on. We already have a milestone https://github.com/aibrix/aibrix/milestone/9 that tracks all issues, but it contains so many that users who are not actively working on this project might feel overwhelmed.
Let's keep a shorter list of the items users are most interested in.
Features
- Cloud Native Architecture
- [ ] https://github.com/vllm-project/aibrix/issues/710
- [ ] https://github.com/vllm-project/aibrix/issues/846
- Routing
- [ ] https://github.com/vllm-project/aibrix/issues/647
- [ ] https://github.com/vllm-project/aibrix/issues/99
- [ ] https://github.com/vllm-project/aibrix/issues/606
- [ ] https://github.com/vllm-project/aibrix/issues/677
- [ ] https://github.com/vllm-project/aibrix/issues/681
- [ ] https://github.com/vllm-project/aibrix/issues/672
- [x] https://github.com/vllm-project/aibrix/issues/673
- LoRA production use case support
- [ ] https://github.com/vllm-project/aibrix/issues/700
- [ ] https://github.com/vllm-project/aibrix/issues/49
- [ ] https://github.com/vllm-project/aibrix/issues/129
- [ ] https://github.com/vllm-project/aibrix/issues/363
- Model centric deployment
- [ ] https://github.com/vllm-project/aibrix/issues/302
- Batch workloads optimization
- [ ] https://github.com/vllm-project/aibrix/issues/182
- KV Cache
- [ ] upstream prefix cache & external kv cache store interface
- Multi-tenant support
- [ ] https://github.com/vllm-project/aibrix/issues/649
Stability (Bugs/Installation)
- [x] https://github.com/vllm-project/aibrix/issues/847
- [ ] Provide better guidance for running AIBrix on CPU, public clouds, Lambda Cloud, etc.
- [ ] https://github.com/vllm-project/aibrix/issues/658
- [ ] https://github.com/vllm-project/aibrix/issues/845
- [ ] https://github.com/vllm-project/aibrix/issues/696
- [ ] https://github.com/vllm-project/aibrix/issues/684
- [ ] https://github.com/vllm-project/aibrix/issues/683
- [ ] https://github.com/vllm-project/aibrix/issues/651
- [ ] https://github.com/vllm-project/aibrix/issues/636
- [ ] https://github.com/vllm-project/aibrix/issues/593
- [ ] https://github.com/vllm-project/aibrix/issues/452
Benchmark and Performance
- [ ] Provide all benchmarks and setup guidance for reproducing the performance results
- [ ] https://github.com/vllm-project/aibrix/issues/726
- [x] https://github.com/vllm-project/aibrix/issues/722
- [ ] https://github.com/vllm-project/aibrix/issues/666
- [ ] https://github.com/vllm-project/aibrix/issues/643
- [ ] https://github.com/vllm-project/aibrix/issues/90
Docs
- [x] https://github.com/vllm-project/aibrix/issues/754
- [ ] https://github.com/vllm-project/aibrix/issues/732
- [ ] https://github.com/vllm-project/aibrix/issues/685
- [ ] https://github.com/vllm-project/aibrix/issues/644
CI/CD and Developer Productivity
- [ ] https://github.com/vllm-project/aibrix/issues/739
- [ ] https://github.com/vllm-project/aibrix/issues/417
- [ ] https://github.com/vllm-project/aibrix/issues/690
- [ ] https://github.com/vllm-project/aibrix/issues/734
- [ ] https://github.com/vllm-project/aibrix/issues/584
Use Case
Track the v0.3.0 release items
Proposed Solution
No response
I'm wondering whether we can deliver a stable version at some point. Stable here means a workable state: fewer bugs, relatively complete documentation, and good test coverage. We could make it a baseline and add more features on top of it, with feature gates or flags to enable/disable them. I'm asking because I see a lot of inspiring features waiting to be merged, and I have no idea what the long-term plan for evolving them is.
@kerthcet I agree that after v0.2.0, we will have a solid baseline of features, and ensuring production-grade quality should be our top priority. We can discuss this further and align on the next steps as you suggested. The future roadmap should balance new feature development with production readiness to maintain stability while continuing to evolve. We have some internal adoptions as well, and we will try to surface those bugs and lessons learned at the same time.
Are you planning to support [Feature]: Support Ray-free multi-node distributed inference on resource managers like Kubernetes, to simplify the deployment of multi-node inference? I recently discussed this with youkaichao@, and he believes it could be done by implementing a new executor.
Some references: https://github.com/vllm-project/vllm/issues/11400
From an earlier offline talk with @Jeffwan, I think AIBrix leverages Ray for fine-grained orchestration, such as multi-host serving and P/D disaggregated serving, so it may not be in the plan for the long term? Needs @Jeffwan's confirmation. But it's definitely possible for the vLLM project.
Congrats on the launch guys!
@Jeffwan Awesome! Congrats on the open-source!
@gaocegege @kerthcet
We do see lots of users who dislike Ray in distributed serving because of its overhead and debuggability. Supporting a cloud-native way to run vLLM across multiple nodes would be beneficial; I think users should be given options. We created https://github.com/vllm-project/vllm/issues/3902 earlier but didn't get a chance to work on it. If people like it and no one is working on it yet, we will spend some effort on it and also make the corresponding changes to the orchestration layer.
BTW, orchestration for the P&D case will introduce an application router or local cluster scheduler (CLS in the Splitwise paper), which is not exactly the same as the current multi-node approach. If that paradigm can be finalized, the cloud-native way sounds like a plan. If not, I think it remains a potential problem, because every time the paradigm changes, the cloud-native approach needs additional changes.
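For readers less familiar with what a "cloud-native" (Ray-free) multi-node setup implies, here is a minimal, purely illustrative sketch: each worker rendezvouses via torch.distributed using pod-level environment variables that a Kubernetes controller (e.g. a StatefulSet or LeaderWorkerSet) could inject, instead of relying on Ray for placement and wiring. The environment variable usage and the overall structure are assumptions for illustration only, not AIBrix or vLLM code.

```python
# Hypothetical sketch only: how a Ray-free multi-node worker might bootstrap
# itself from environment variables injected by a Kubernetes controller.
# This is NOT AIBrix or vLLM code; the structure is illustrative.
import os

import torch
import torch.distributed as dist


def init_worker_from_env() -> None:
    # A cloud-native launcher would set these per pod instead of relying on
    # Ray to place and wire up workers.
    rank = int(os.environ["RANK"])              # global rank of this pod
    world_size = int(os.environ["WORLD_SIZE"])  # total number of pods
    # MASTER_ADDR / MASTER_PORT would point at the rank-0 pod's service.
    dist.init_process_group(
        backend="nccl" if torch.cuda.is_available() else "gloo",
        init_method="env://",
        rank=rank,
        world_size=world_size,
    )
    print(f"worker {rank}/{world_size} joined the process group")


if __name__ == "__main__":
    init_worker_from_env()
    # A real executor would now load its model shard and serve requests;
    # that part is intentionally omitted here.
    dist.barrier()
    dist.destroy_process_group()
```

The point of the sketch is the design choice under discussion: with this pattern the resource manager, not Ray, owns worker placement, restarts, and networking, which is where the debuggability and overhead concerns above come from.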
We had some discussions about it in production-stack too: https://github.com/vllm-project/production-stack/issues/7#issuecomment-2621768872
/cc @KuntaiDu
I think multi-cluster unified routing and a multi-cluster PodAutoscaler also need to be supported.
@ying2025 Multi-cluster support will come in a future release, along with other cloud GPU features, probably in v0.5.0. If you have urgent requirements, feel free to let me know.
Due to limited bandwidth, I will remove the following items from v0.3.0 and move them to v0.4.0:
- Model centric deployment
- [ ] https://github.com/vllm-project/aibrix/issues/302
- Batch workloads optimization
- [ ] https://github.com/vllm-project/aibrix/issues/182
- [ ] https://github.com/vllm-project/aibrix/issues/90
- LoRA production use case support
- [ ] https://github.com/vllm-project/aibrix/issues/700
- [ ] https://github.com/vllm-project/aibrix/issues/49
- [ ] https://github.com/vllm-project/aibrix/issues/129
- [ ] https://github.com/vllm-project/aibrix/issues/363
v0.3.0 will be published in early May, focusing more on stability, performance improvements (routing, etc.), and enabling the production-grade KV cache pool.
Is there a draft of v0.4.0 roadmap available for preview @Jeffwan ?
@Venkat2811 Not yet. I plan to start the v0.4.0 release process once v0.3.0 approaches release status, likely sometime in early May.
AIBrix v0.3.0 has been officially released! https://github.com/vllm-project/aibrix/releases/tag/v0.3.0 I am closing this issue. We're now starting preparations for v0.4.0. If you have a feature wish list or suggestions, feel free to open a new GitHub issue for discussion or share your thoughts here. https://github.com/vllm-project/aibrix/issues/1098
Going forward, we plan to shorten the release cycle to around 1-1.5 months, with each release focused on a single core feature and its scenarios. Stay tuned and contribute your ideas!