Jiaxin Shan
/cc @Xunzhuo please also help evaluate it
@sadath-12 sounds good. We will ask for @Xunzhuo's help on this issue.
We need to fix these issues in v0.5.0
https://github.com/vllm-project/aibrix/pull/1698#discussion_r2462650519 good suggestion
https://github.com/vllm-project/aibrix/pull/1700#discussion_r2462669774
@zhengkezhou1 this is a great idea. That would be helpful for performance-related testing.
I will move this issue to a future release. Currently, it can detect the role but doesn't support hierarchy.
There's one community user asking for this feature. They want requests routed to `p0d0` or `p1d1` rather than `p0d1`.
https://github.com/vllm-project/aibrix/pull/1409 this is partially supported. However, it only considers prefix cache hits and ignores the overall replica load. We need to fix it in a follow-up PR.
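To make the gap concrete, here is a rough sketch of what "prefix cache hits plus replica load" could look like. This is not the code from the PR: `Replica`, `pick_replica`, and `load_weight` are hypothetical names, and the linear scoring is only an assumption for illustration.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    # Hypothetical view of a replica; not an aibrix type.
    name: str
    prefix_hit_ratio: float  # fraction of prompt tokens already cached, 0.0 to 1.0
    load: float              # normalized load, 0.0 (idle) to 1.0 (saturated)


def pick_replica(replicas, load_weight=0.5):
    """Pick the replica with the best trade-off between cache hits and load.

    load_weight=0.0 reproduces cache-only routing; higher values
    penalize busy replicas more strongly. The formula is an assumption.
    """
    if not replicas:
        raise ValueError("no replicas available")

    def score(r):
        return (1.0 - load_weight) * r.prefix_hit_ratio - load_weight * r.load

    return max(replicas, key=score)
```

With `load_weight=0.0` a nearly saturated replica with a high hit ratio still wins, which is the behavior described above; a non-zero weight lets a lightly loaded replica with fewer hits take the request instead.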
@jiangxiaobin96 in some scenarios, P and D are deployed on the same host due to the lack of RDMA, etc. It's better to route requests within that group instead of choosing...
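A minimal sketch of the group-aware selection being discussed, assuming each pod carries a role and some group identifier (for example the host it runs on). `Pod`, `group`, and `pick_pair` are hypothetical names used only for illustration, not aibrix APIs, and the cross-group fallback is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    # Hypothetical pod description; not an aibrix type.
    name: str
    role: str   # "prefill" or "decode"
    group: str  # assumed co-location group id, e.g. the host name


def pick_pair(pods):
    """Return a (prefill, decode) pair, preferring pods in the same group.

    Falls back to the first available prefill and decode pods when no
    same-group pair exists, and returns None if either role is missing.
    """
    prefills = [p for p in pods if p.role == "prefill"]
    decodes = [p for p in pods if p.role == "decode"]

    # Prefer a pair that shares a group, such as p0d0 or p1d1.
    for p in prefills:
        for d in decodes:
            if p.group == d.group:
                return p, d

    # Assumed fallback: accept a cross-group pair rather than failing.
    if prefills and decodes:
        return prefills[0], decodes[0]
    return None
```

A real router would also need to weigh load inside the group; this only shows the pairing preference.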