Support storm service roleset and roles in AIBrix routing cache
🚀 Feature Description and Motivation
Currently, the router cache only tracks pods. Even for multi-node setups, we fetch only the ray-head pod.
We are now introducing storm services, which support replica mode and pooling mode. To support the P/D (prefill/decode) disaggregation case, we need to refactor the internal cache to support the role and roleset concepts.
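To make the refactor concrete, here is a minimal sketch of a roleset-aware index that the cache could keep alongside its per-pod data. All names (`PodInfo`, `RoleSetIndex`, and the field names) are hypothetical and not taken from the AIBrix codebase. It also assumes each pod's roleset and role are already known, for example from pod metadata.

```go
package main

import "sort"

// PodInfo is a hypothetical, simplified view of a cached pod.
type PodInfo struct {
	Name    string
	RoleSet string // e.g. "rs-0"; assumed to be known for every pod
	Role    string // e.g. "prefill" or "decode"
}

// RoleSetIndex groups pod names by roleset and then by role.
type RoleSetIndex struct {
	pods map[string]map[string][]string // roleset -> role -> pod names
}

// NewRoleSetIndex returns an empty index.
func NewRoleSetIndex() *RoleSetIndex {
	return &RoleSetIndex{pods: make(map[string]map[string][]string)}
}

// Add registers a pod under its roleset and role.
func (idx *RoleSetIndex) Add(p PodInfo) {
	roles, ok := idx.pods[p.RoleSet]
	if !ok {
		roles = make(map[string][]string)
		idx.pods[p.RoleSet] = roles
	}
	roles[p.Role] = append(roles[p.Role], p.Name)
}

// PodsFor returns a copy of the pod names with the given role in the given
// roleset. Unknown rolesets or roles yield an empty result.
func (idx *RoleSetIndex) PodsFor(roleSet, role string) []string {
	return append([]string(nil), idx.pods[roleSet][role]...)
}

// RoleSets returns the known roleset names in sorted order.
func (idx *RoleSetIndex) RoleSets() []string {
	names := make([]string, 0, len(idx.pods))
	for name := range idx.pods {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}
```

The sketch omits locking and pod removal. A real cache that is updated from informer events would need both.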
Use Case
Support P/D disaggregation routing
Proposed Solution
No response
I will move this issue to a future release. Currently, it can detect the role but doesn't support the hierarchy.
One community user is asking for this feature: they want requests routed to p0d0 or p1d1 (prefill and decode from the same roleset) rather than p0d1.
This is partially supported by https://github.com/vllm-project/aibrix/pull/1409. However, it only considers prefix cache hits and ignores the overall replica load. We need to fix that in a follow-up PR.
Hello, I'd like to know why prefill and decode pods need to be put into the same roleset here.
@jiangxiaobin96 In some scenarios, P and D are deployed on the same host, for example due to a lack of RDMA. In that case it's better to route a request within that group instead of choosing P or D at random from the entire pool.
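As an illustration only, the sketch below picks a prefill pod and a decode pod from the same roleset, so a request never gets a mixed pair such as p0d1. The names `RoleSetPods` and `PickPair` and the `Load` field are hypothetical. Choosing the least-loaded roleset and taking the first pod of each role are simplifying assumptions, not a description of the actual AIBrix router.

```go
package main

// RoleSetPods is a hypothetical view of one roleset: its prefill and decode
// pods plus an aggregate load figure.
type RoleSetPods struct {
	Name    string
	Prefill []string
	Decode  []string
	Load    int // hypothetical load metric, e.g. in-flight requests
}

// PickPair returns a prefill pod and a decode pod that belong to the same
// roleset. It chooses the least-loaded roleset that has at least one pod of
// each role. ok is false when no roleset qualifies.
func PickPair(sets []RoleSetPods) (prefill, decode string, ok bool) {
	best := -1
	for i, rs := range sets {
		// Skip rolesets that cannot serve both phases.
		if len(rs.Prefill) == 0 || len(rs.Decode) == 0 {
			continue
		}
		if best == -1 || rs.Load < sets[best].Load {
			best = i
		}
	}
	if best == -1 {
		return "", "", false
	}
	// Simplification: take the first pod of each role in the chosen roleset.
	return sets[best].Prefill[0], sets[best].Decode[0], true
}
```

With two complete rolesets (p0/d0 and p1/d1), this can only return p0 with d0 or p1 with d1, which matches the behavior requested earlier in the thread.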
This is supported now. We can close the issue.