Is Prefill-Decode disaggregation supported in AIBrix?

Open William12github opened this issue 8 months ago • 7 comments

🚀 Feature Description and Motivation

Hi experts.

PD disaggregation is very popular nowadays. Is it supported in AIBrix? I couldn't find a description of this feature.

Use Case

Multi-node AI inference for LLMs.

Proposed Solution

No response

William12github avatar Apr 09 '25 10:04 William12github

I couldn't find a description of PD disaggregation either.

TianTengya avatar Apr 22 '25 07:04 TianTengya

https://aibrix.github.io/posts/2025-02-20-vllm-control-plane/

It seems that Prefill & Decode (P&D) disaggregation is planned as future work.

“Building the Future of Scalable AI with AIBrix

Moving forward, we plan to continue exploring the co-design approach by developing initiatives such as standardizing the KV Cache API for use with external KV pools in prefix cache scenarios, plugging AIBrix distributed KV cache pool for Prefill & Decode (P&D) disaggregation, considering roofline-based models to streamline profiling processes in heterogeneous routing, and enhancing distributed orchestration to better support large-scale models like DeepSeek R1 and various offline scenarios.”

TianTengya avatar Apr 22 '25 11:04 TianTengya

@TianTengya Yes. P&D is not the focus; we are busy with KV cache solutions and plan to fully unblock prefix-cache scenarios first. The next step would be xPyD. I will keep you posted here.

Jeffwan avatar Apr 25 '25 06:04 Jeffwan

Thanks for the reply.

William12github avatar Apr 25 '25 06:04 William12github

@Jeffwan Is it because the controller cannot yet perceive the topological structure of prefill and decode nodes that Prefill & Decode (P&D) disaggregation is not currently supported? What functions need to be developed to support P&D disaggregation?

libin817927 avatar May 16 '25 08:05 libin817927

@libin817927 We need a better routing solution to balance P and D traffic; otherwise, it's hard to show the benefits. In addition, there are two different P/D approaches: offloading and P2P. It's not just a matter of bringing it up; we also need to tune the performance for entry-level users.
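
To make the routing point concrete, here is a minimal sketch of what "balancing P and D traffic" could look like. It is not AIBrix's router: the `Worker` class, the `route` function, and the least-in-flight policy are all hypothetical and chosen only for illustration. Each request is assigned one prefill worker and one decode worker, picking whichever worker of each role currently has the fewest in-flight requests.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    # Hypothetical worker record; not an AIBrix type.
    name: str
    role: str          # "prefill" or "decode"
    inflight: int = 0  # requests currently assigned to this worker


def pick_least_loaded(workers, role):
    """Return the worker of the given role with the fewest in-flight requests."""
    candidates = [w for w in workers if w.role == role]
    if not candidates:
        raise ValueError(f"no {role} workers available")
    # min() returns the first minimum, so ties go to the earliest worker.
    return min(candidates, key=lambda w: w.inflight)


def route(workers):
    """Assign one prefill and one decode worker to a request and record the load."""
    prefill = pick_least_loaded(workers, "prefill")
    decode = pick_least_loaded(workers, "decode")
    prefill.inflight += 1
    decode.inflight += 1
    return prefill.name, decode.name


# A 2P1D pool: two prefill workers, one decode worker.
pool = [
    Worker("p0", "prefill"),
    Worker("p1", "prefill"),
    Worker("d0", "decode"),
]
first = route(pool)
second = route(pool)
print(first, second)
```

A production router would likely weigh more than request counts, for example prompt length and the cost of moving the KV cache between the chosen pair, which differs between the offloading and P2P approaches.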

Jeffwan avatar May 23 '25 21:05 Jeffwan

We have already kicked off the work, and this is a top-priority item for v0.4.0.

Jeffwan avatar May 23 '25 21:05 Jeffwan

P/D orchestration and routing are now supported. We can close this issue now. v0.4.0 will expose more samples and documentation.

Jeffwan avatar Aug 01 '25 17:08 Jeffwan