aibrix
Support Mooncake P/D conductor algorithm in AIBrix router
🚀 Feature Description and Motivation
In the Mooncake paper (https://www.usenix.org/system/files/fast25-qin.pdf), chapter 4 discusses prefill- and load-aware scheduling algorithms. Let's put some effort into reproducing this work and comparing it with AIBrix's in-house algorithms.
I think that once we move to a KV-centric architecture, we will need to adjust the algorithm somewhat. Let's make it happen.
Use Case
Support new prefix-aware and load-aware policies.
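As a possible baseline for that comparison, a prefix- and load-aware policy can be sketched as picking the prefill instance with the lowest estimated time-to-first-token. The estimate is the work already queued on the instance plus the prompt tokens not covered by its cached prefix. This is a minimal Python sketch under stated assumptions, not Mooncake's or AIBrix's actual algorithm. It assumes a linear per-token prefill cost, a fixed KV block size, and chained block hashes for prefix matching, and it ignores KV cache transfer between instances. `PrefillInstance`, `select_prefill_instance`, `block_hashes`, `BLOCK_SIZE` and `COST_PER_TOKEN_MS` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Set

BLOCK_SIZE = 4  # assumed tokens per KV cache block (kept small for the example)
COST_PER_TOKEN_MS = 0.1  # hypothetical linear prefill cost per uncached token


def block_hashes(tokens: Sequence[int]) -> List[int]:
    """Chain-hash each full block so a hash identifies the whole prefix up to it."""
    hashes, prev = [], 0
    for start in range(0, len(tokens) // BLOCK_SIZE * BLOCK_SIZE, BLOCK_SIZE):
        prev = hash((prev, tuple(tokens[start:start + BLOCK_SIZE])))
        hashes.append(prev)
    return hashes


@dataclass
class PrefillInstance:
    name: str
    queued_tokens: int = 0  # uncached tokens already waiting on this instance
    cached: Set[int] = field(default_factory=set)  # prefix block hashes held locally

    def matched_tokens(self, hashes: List[int]) -> int:
        """Length in tokens of the longest cached prefix of the request."""
        n = 0
        for h in hashes:
            if h not in self.cached:
                break
            n += 1
        return n * BLOCK_SIZE

    def estimated_ttft_ms(self, tokens: Sequence[int], hashes: List[int]) -> float:
        """Queueing delay plus prefill time for tokens not covered by the cache."""
        uncached = len(tokens) - self.matched_tokens(hashes)
        return (self.queued_tokens + uncached) * COST_PER_TOKEN_MS


def select_prefill_instance(tokens: Sequence[int],
                            instances: List[PrefillInstance]) -> PrefillInstance:
    """Pick the instance with the lowest estimated TTFT (prefix- and load-aware)."""
    hashes = block_hashes(tokens)
    best = min(instances, key=lambda inst: inst.estimated_ttft_ms(tokens, hashes))
    # Record the new work and cached blocks so later decisions see updated state.
    best.queued_tokens += len(tokens) - best.matched_tokens(hashes)
    best.cached.update(hashes)
    return best


if __name__ == "__main__":
    prompt = list(range(32))  # 32 tokens = 8 full blocks
    warm = PrefillInstance("warm", queued_tokens=20,
                           cached=set(block_hashes(prompt[:24])))
    cold = PrefillInstance("cold")
    # warm: (20 queued + 8 uncached) * 0.1 = 2.8 ms; cold: 32 * 0.1 = 3.2 ms
    print(select_prefill_instance(prompt, [warm, cold]).name)  # prints: warm
```

The single cost function makes the trade-off explicit. A long cached prefix wins only while the instance's queue stays short enough, so a heavily loaded instance holding the full prefix can still lose to an idle one. A KV-centric variant would add a term for fetching cached blocks from another instance instead of recomputing them.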
Proposed Solution
No response