Jiaxin Shan comments

Results 742 comments of Jiaxin Shan

Applying the charts on gke has an error

In the documentation, I suggest using `create` instead of `apply`; `create` works around the issue.

Ensure aibrix_kvcache Torch version stays compatible with latest vLLM/SGLang releases

It seems the key problem is that the Torch versions required by SGLang and vLLM are not always aligned.

[RFC]: Support for Multi-Tenant Model Deployments and Tenant-Aware Routing in AIBrix

@ModiCodeCraftsman I’ve reviewed the doc and overall it looks good. Just a few suggestions to ensure full compatibility: - Tenant ID should be optional. The key builder and related logic...

Implement model architect aware scheduling policies

It should be done in the cold start manager or some other reusable component.

Support multi-node & autoscaling & routing together for models like Deepseek-R1

## Routing ![Image](https://github.com/user-attachments/assets/c4ff2a79-8e5f-4524-8ca1-4f7a141056ba) ![Image](https://github.com/user-attachments/assets/f45c64c2-bf3e-448f-b170-238bd953bf24) Requests always hit the head. --- Update: after running more tests, I noticed this is not true. I did see traffic reach other pods, but due...

Support multi-node & autoscaling & routing together for models like Deepseek-R1

## RayCluster Orchestration related 1. ray.io/overwrite-container-cmd -> RayCluster level 2. head & worker annotations have to be set separately; there's no propagation to different roles yet. RayClusterFleet spec.templates.metadata controls RayCluster...

Support multi-node & autoscaling & routing together for models like Deepseek-R1

### vLLM 0.7.3 problem ![Image](https://github.com/user-attachments/assets/fad97710-e2ff-45de-8cb8-7bde93d0fc85) It hangs for a long time. I checked https://github.com/vllm-project/vllm/issues/13136 and decided to rebuild the image ``` FROM vllm/vllm-openai:v0.7.3 RUN pip3 install -U ray[default,adag]==2.40.0 --progress-bar off # important...

Support multi-node & autoscaling & routing together for models like Deepseek-R1

## RDMA setup From the NCCL logs, we can see that cross-node communication is happening over RDMA, while intra-node transfers fall back to IPC (NVLink in this case). ('NCCL INFO...

Support multi-node & autoscaling & routing together for models like Deepseek-R1

@xieus it's specific to the Ray head.

Support multi-node & autoscaling & routing together for models like Deepseek-R1

## Autoscaling ![Image](https://github.com/user-attachments/assets/214e7978-32f5-4054-85b8-1cb9e47aa5c1)
```
NAME                                                          READY   STATUS    RESTARTS   AGE   IP             NODE           NOMINATED NODE   READINESS GATES
deepseek-r1-671b-56f9654bbb-mgdwd-head-lf5xg                  1/1     Running   0          27m   192.168.0.74   192.168.0.51
deepseek-r1-671b-56f9654bbb-mgdwd-worker-group-worker-pb4hh   1/1     Running   0          27m   192.168.0.81   192.168.0.52
```
...