Implement exact Preble routing algorithm in AIBrix
🚀 Feature Description and Motivation
Preble (https://arxiv.org/abs/2407.00023) did solid work on prefix-cache and load-aware routing.
The prefix-cache aware version we are implementing differs slightly from Preble, although we also borrow some ideas from their metadata design.
- We use hash blocks or linked hash blocks rather than a radix tree. For now, we ignore the efficiency concerns.
- vLLM already maintains a local "logical" tree, so we do not need a radix tree of our own.
- We should consider the "load-aware" part in the new PR. Load cost calculation is essential and should be used together with prefix-cache aware routing. Preble implements load-aware routing in 3 steps: a. historical load cost, b. eviction cost, c. current request cost. We can implement as many load-aware strategies as we can; existing ones such as least-request and least-kv-cache fit under this category and can be reused as well.
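
To make the discussion concrete, here is a minimal Python sketch of how linked hash blocks and a three-part load cost could fit together. It is not Preble's exact algorithm and not the AIBrix implementation: the names `linked_block_hashes`, `Pod`, `load_cost`, and `pick_pod`, the block size of 4 tokens, and the plain sum of the three costs are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per block; hypothetical value for illustration


def linked_block_hashes(tokens, block_size=BLOCK_SIZE):
    """Hash each full block of tokens, chaining in the previous block's hash.

    Because every hash depends on its predecessor, two requests share
    hash i only if they share the whole prefix up to block i.
    """
    hashes = []
    prev = b""
    full = len(tokens) - len(tokens) % block_size  # drop the partial tail block
    for start in range(0, full, block_size):
        block = tokens[start:start + block_size]
        payload = prev + ",".join(map(str, block)).encode()
        prev = hashlib.sha256(payload).digest()
        hashes.append(prev.hex())
    return hashes


@dataclass
class Pod:
    name: str
    cached_blocks: set = field(default_factory=set)  # block hashes believed cached
    historical_cost: float = 0.0  # (a) load from recent requests
    eviction_cost: float = 0.0    # (b) cost of evicting cached blocks


def matched_blocks(pod, hashes):
    """Count leading blocks of the request that the pod already caches."""
    n = 0
    for h in hashes:
        if h not in pod.cached_blocks:
            break
        n += 1
    return n


def load_cost(pod, hashes):
    """Sum the three cost terms; the unweighted sum is an assumption."""
    uncached = len(hashes) - matched_blocks(pod, hashes)
    request_cost = float(uncached)  # (c) assumed proportional to blocks to prefill
    return pod.historical_cost + pod.eviction_cost + request_cost


def pick_pod(pods, tokens):
    """Route to the pod with the lowest combined cost."""
    hashes = linked_block_hashes(tokens)
    return min(pods, key=lambda p: load_cost(p, hashes))
```

In this sketch, a prefix-cache hit lowers the current request cost, while `historical_cost` is the slot where an existing signal such as least-request or least-kv-cache could plausibly be plugged in. How the terms should be weighted is left open.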
Use Case
No response
Proposed Solution
No response