v0.3.0 roadmap
🚀 Feature Description and Motivation
I'm creating this issue to track the v0.3.0 items we'd like to work on. We already have a milestone https://github.com/aibrix/aibrix/milestone/9 that tracks all issues, but it contains so many that users who are not actively working on this project might feel overwhelmed.
Let's keep a shorter list of the items users are most interested in.
Features
- Cloud Native Architecture
- [ ] https://github.com/vllm-project/aibrix/issues/710
- [ ] https://github.com/vllm-project/aibrix/issues/846
- Routing
- [ ] https://github.com/vllm-project/aibrix/issues/647
- [ ] https://github.com/vllm-project/aibrix/issues/99
- [ ] https://github.com/vllm-project/aibrix/issues/606
- [ ] https://github.com/vllm-project/aibrix/issues/677
- [ ] https://github.com/vllm-project/aibrix/issues/681
- [ ] https://github.com/vllm-project/aibrix/issues/672
- [x] https://github.com/vllm-project/aibrix/issues/673
- LoRA production use case support
- [ ] https://github.com/vllm-project/aibrix/issues/700
- [ ] https://github.com/vllm-project/aibrix/issues/49
- [ ] https://github.com/vllm-project/aibrix/issues/129
- [ ] https://github.com/vllm-project/aibrix/issues/363
- Model centric deployment
- [ ] https://github.com/vllm-project/aibrix/issues/302
- Batch workloads optimization
- [ ] https://github.com/vllm-project/aibrix/issues/182
- KV Cache
- [ ] upstream prefix cache & external kv cache store interface
- Multi-tenant support
- [ ] https://github.com/vllm-project/aibrix/issues/649
Stability (Bugs/Installation)
- [x] https://github.com/vllm-project/aibrix/issues/847
- [ ] Provide better guidance for running AIBrix on CPU, public clouds, Lambda Cloud, etc.
- [ ] https://github.com/vllm-project/aibrix/issues/658
- [ ] https://github.com/vllm-project/aibrix/issues/845
- [ ] https://github.com/vllm-project/aibrix/issues/696
- [ ] https://github.com/vllm-project/aibrix/issues/684
- [ ] https://github.com/vllm-project/aibrix/issues/683
- [ ] https://github.com/vllm-project/aibrix/issues/651
- [ ] https://github.com/vllm-project/aibrix/issues/636
- [ ] https://github.com/vllm-project/aibrix/issues/593
- [ ] https://github.com/vllm-project/aibrix/issues/452
Benchmark and Performance
- [ ] Provide all benchmarks and setup guidance for reproducing the performance results
- [ ] https://github.com/vllm-project/aibrix/issues/726
- [x] https://github.com/vllm-project/aibrix/issues/722
- [ ] https://github.com/vllm-project/aibrix/issues/666
- [ ] https://github.com/vllm-project/aibrix/issues/643
- [ ] https://github.com/vllm-project/aibrix/issues/90
Docs
- [x] https://github.com/vllm-project/aibrix/issues/754
- [ ] https://github.com/vllm-project/aibrix/issues/732
- [ ] https://github.com/vllm-project/aibrix/issues/685
- [ ] https://github.com/vllm-project/aibrix/issues/644
CI/CD and Developer Productivity
- [ ] https://github.com/vllm-project/aibrix/issues/739
- [ ] https://github.com/vllm-project/aibrix/issues/417
- [ ] https://github.com/vllm-project/aibrix/issues/690
- [ ] https://github.com/vllm-project/aibrix/issues/734
- [ ] https://github.com/vllm-project/aibrix/issues/584
Use Case
Track the v0.3.0 release items
Proposed Solution
No response
I'm wondering whether we can deliver a stable version at some point. Stable here means a workable state: fewer bugs, relatively complete documentation, and good test coverage. We could make it a baseline and add more features on top of it, with feature gates or flags to enable/disable them. I'm asking because I see a lot of inspiring features waiting to be merged, and I have no idea what the long-term plan for evolving them is.
@kerthcet I agree that after v0.2.0, we will have a solid baseline of features, and ensuring production-grade quality should be our top priority. We can discuss this further and align on the next steps as you suggested. The future roadmap should balance new feature development with production readiness to maintain stability while continuing to evolve. We have some internal adoptions as well, and we will try to surface those bugs and lessons learned at the same time.
Are you planning to support [Feature]: Support Ray-free multi-node distributed inference on resource managers like Kubernetes, to simplify the deployment of multi-node inference? I recently discussed this with youkaichao@, and he believes it could be done by implementing a new executor.
Some references: https://github.com/vllm-project/vllm/issues/11400
From an earlier offline talk with @Jeffwan, I think AIBrix leverages Ray for fine-grained orchestration, such as multi-host serving and P/D disaggregated serving, so it may not be in the plan for the long term? Needs @Jeffwan's confirmation. But it's definitely possible for the vLLM project.
Congrats on the launch guys!
@Jeffwan Awesome! Congrats on the open-source!
@gaocegege @kerthcet
We do see lots of users who dislike Ray in distributed serving because of its overhead and debuggability. Supporting a cloud-native way to run vLLM across multiple nodes would be beneficial; I think users should be given options. We created https://github.com/vllm-project/vllm/issues/3902 earlier but didn't get a chance to work on it. If people like it and no one is working on it yet, we will spend some effort on it and also make the corresponding changes to the orchestration layer.
BTW, orchestration for the P&D case will introduce an application router or local cluster scheduler (CLS in the Splitwise paper), which is not exactly the same as the current multi-node approach. If that paradigm can be finalized, the cloud-native way sounds like a plan. If not, I think it remains a potential problem, because every time the paradigm changes, the cloud-native approach needs additional changes.
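For readers less familiar with what a "cloud-native" (Ray-free) multi-node setup implies, here is a minimal, purely illustrative sketch: each worker rendezvouses via torch.distributed using pod-level environment variables that a Kubernetes controller (e.g. a StatefulSet or LeaderWorkerSet) could inject, instead of relying on Ray for placement and wiring. The environment variable usage and the overall structure are assumptions for illustration only, not AIBrix or vLLM code.

```python
# Hypothetical sketch only: how a Ray-free multi-node worker might bootstrap
# itself from environment variables injected by a Kubernetes controller.
# This is NOT AIBrix or vLLM code; the structure is illustrative.
import os

import torch
import torch.distributed as dist


def init_worker_from_env() -> None:
    # A cloud-native launcher would set these per pod instead of relying on
    # Ray to place and wire up workers.
    rank = int(os.environ["RANK"])              # global rank of this pod
    world_size = int(os.environ["WORLD_SIZE"])  # total number of pods
    # MASTER_ADDR / MASTER_PORT would point at the rank-0 pod's service.
    dist.init_process_group(
        backend="nccl" if torch.cuda.is_available() else "gloo",
        init_method="env://",
        rank=rank,
        world_size=world_size,
    )
    print(f"worker {rank}/{world_size} joined the process group")


if __name__ == "__main__":
    init_worker_from_env()
    # A real executor would now load its model shard and serve requests;
    # that part is intentionally omitted here.
    dist.barrier()
    dist.destroy_process_group()
```

The point of the sketch is the design choice under discussion: with this pattern the resource manager, not Ray, owns worker placement, restarts, and networking, which is where the debuggability and overhead concerns above come from.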
We had some discussions about it in production-stack too: https://github.com/vllm-project/production-stack/issues/7#issuecomment-2621768872
/cc @KuntaiDu
I think multi-cluster unified routing and a multi-cluster PodAutoscaler also need to be supported.
@ying2025 Multi-cluster support will come in a future release, along with other cloud GPU features, probably in v0.5.0. If you have urgent requirements, feel free to let me know.
Due to limited bandwidth, I will remove the following items from v0.3.0 and move them to v0.4.0:
- Model centric deployment
- [ ] https://github.com/vllm-project/aibrix/issues/302
- Batch workloads optimization
- [ ] https://github.com/vllm-project/aibrix/issues/182
- [ ] https://github.com/vllm-project/aibrix/issues/90
- LoRA production use case support
- [ ] https://github.com/vllm-project/aibrix/issues/700
- [ ] https://github.com/vllm-project/aibrix/issues/49
- [ ] https://github.com/vllm-project/aibrix/issues/129
- [ ] https://github.com/vllm-project/aibrix/issues/363
v0.3.0 will be published in early May, focusing more on stability, performance improvements (routing, etc.), and enabling the production-grade KV cache pool.
Is there a draft of v0.4.0 roadmap available for preview @Jeffwan ?
@Venkat2811 Not yet. I plan to start the v0.4.0 release process once v0.3.0 approaches release status, likely sometime in early May.
AIBrix v0.3.0 has been officially released! https://github.com/vllm-project/aibrix/releases/tag/v0.3.0 I am closing this issue. We're now starting preparations for v0.4.0. If you have a feature wish list or suggestions, feel free to open a new GitHub issue for discussion or share your thoughts here. https://github.com/vllm-project/aibrix/issues/1098
Going forward, we plan to shorten the release cycle to around 1-1.5 months, with each release focused on a single core feature and its scenarios. Stay tuned and contribute your ideas!