Implement exact Preble routing algorithm in AIBRix

Open Jeffwan opened this issue 10 months ago • 0 comments

🚀 Feature Description and Motivation

Preble (https://arxiv.org/abs/2407.00023) did solid work on prefix-cache and load-aware routing.

The prefix-cache aware routing we are implementing differs slightly from Preble, although we borrow some ideas from their metadata design.

  1. We use hash blocks or linked hash blocks rather than a radix tree. We are ignoring the efficiency issues for now.
  2. vLLM already has a local "logical" tree, so we do not need our own.
  3. We should consider the "load-aware" part in the new PR. Load cost calculation is essential and should be used together with prefix-cache aware routing. Preble implements load awareness in three parts: (a) historical load cost, (b) eviction cost, and (c) current request cost. We can implement as many load-aware strategies as we like; existing ones such as least-request and least-kv-cache fit under this category and can be reused as well.
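
A minimal sketch of how points 1 and 3 could fit together, assuming a fixed block size and a much simpler cost than Preble's. Every name here (`chained_block_hashes`, `Pod`, `route`, `load_weight`) is hypothetical and not part of AIBrix, vLLM, or Preble. The cost is just "unmatched blocks plus weighted running requests", standing in for Preble's historical load, eviction, and current request costs.

```python
import hashlib
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per block; hypothetical value for illustration


def chained_block_hashes(tokens, block_size=BLOCK_SIZE):
    """Hash each full block together with the previous block's hash.

    Because each hash depends on all earlier blocks, two requests share
    hash i only if they share the whole prefix up to block i.
    A trailing partial block is ignored.
    """
    hashes, prev = [], b""
    full = len(tokens) - len(tokens) % block_size
    for i in range(0, full, block_size):
        block = ",".join(map(str, tokens[i:i + block_size])).encode()
        prev = hashlib.sha256(prev + block).digest()
        hashes.append(prev)
    return hashes


@dataclass
class Pod:
    name: str
    cached: set = field(default_factory=set)  # block hashes believed cached
    running_requests: int = 0                 # stand-in for current load


def matched_blocks(pod, hashes):
    """Count the leading blocks that are present in the pod's cache."""
    n = 0
    for h in hashes:
        if h not in pod.cached:
            break
        n += 1
    return n


def route(pods, tokens, load_weight=1.0):
    """Pick the pod with the lowest (unmatched blocks + weighted load)."""
    hashes = chained_block_hashes(tokens)

    def cost(pod):
        unmatched = len(hashes) - matched_blocks(pod, hashes)
        return unmatched + load_weight * pod.running_requests

    best = min(pods, key=cost)  # ties go to the first pod in the list
    # Assumed bookkeeping: the chosen pod now caches these blocks and
    # carries one more running request (never decremented in this sketch).
    best.cached.update(hashes)
    best.running_requests += 1
    return best


pods = [Pod("pod-a"), Pod("pod-b")]
first = route(pods, list(range(8)))        # no cache anywhere, tie
second = route(pods, list(range(12)))      # shares an 8-token prefix
third = route(pods, list(range(100, 108))) # unrelated prompt
print(first.name, second.name, third.name)
```

Swapping `cost` for another function is where strategies such as least-request or least-kv-cache would plug in; the hash-block lookup stays the same.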

Use Case

No response

Proposed Solution

No response

Jeffwan · Feb 11 '25 06:02