Is Prefill-Decode disaggregation supported in AIBrix?
🚀 Feature Description and Motivation
Hi experts.
PD disaggregation is very popular nowadays. Is it supported in AIBrix? I couldn't find any description of this feature.
Use Case
Multi-node AI inference for LLMs.
Proposed Solution
No response
I couldn't find a description of PD disaggregation either. The closest mention is in this post:
https://aibrix.github.io/posts/2025-02-20-vllm-control-plane/
It seems that Prefill & Decode (P&D) disaggregation is planned as future work. From the post's "Building the Future of Scalable AI with AIBrix" section:
“Moving forward, we plan to continue exploring the co-design approach by developing initiatives such as standardizing the KV Cache API for use with external KV pools in prefix cache scenarios, plugging AIBrix distributed KV cache pool for Prefill & Decode (P&D) disaggregation, considering roofline-based models to streamline profiling processes in heterogeneous routing, and enhancing distributed orchestration to better support large-scale models like DeepSeek R1 and various offline scenarios.”
@TianTengya Yes. P&D is not our current focus. We are busy with KV cache solutions and plan to fully unblock prefix-cache scenarios first. The next step would be xPyD. I will keep you posted here.
Thanks for the reply.
@Jeffwan Is Prefill & Decode (P&D) disaggregation not yet supported because the controller cannot perceive the topology of the prefill and decode nodes? What functionality needs to be developed to support P&D disaggregation?
@libin817927 We need a better routing solution to balance P and D traffic; otherwise, it's hard to show the benefits. In addition, there are two different P/D approaches: offloading or P2P. It's not just a matter of bringing it up. We also need to tune the performance for entry-level users.
We have already kicked off the work, and this is a top-priority item for v0.4.0.
P/D orchestration and routing are now supported, so we can close this issue. v0.4.0 will provide more samples and documentation.