Jiaxin Shan
In the documentation, I suggest using `create` instead of `apply`; `create` works around the issue.
It seems the key problem is that sglang and vLLM are not always aligned on torch compatibility.
@ModiCodeCraftsman I’ve reviewed the doc and overall it looks good. Just a few suggestions to ensure full compatibility: - Tenant ID should be optional. The key builder and related logic...
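To illustrate the "Tenant ID should be optional" suggestion, here is a minimal sketch of a key builder that only adds a tenant prefix when one is supplied. The function name `build_key`, its parameters, and the `:` separator are hypothetical and are not taken from the reviewed doc.

```python
from typing import Optional


def build_key(resource_id: str, tenant_id: Optional[str] = None, sep: str = ":") -> str:
    """Build a lookup key (hypothetical helper).

    The tenant prefix is added only when a non-empty tenant ID is given,
    so single-tenant callers can omit it entirely.
    """
    parts = [tenant_id, resource_id] if tenant_id else [resource_id]
    return sep.join(parts)
```

With this shape, `build_key("blk42")` and `build_key("blk42", tenant_id="t1")` both work, and existing callers that never pass a tenant keep their current keys.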
It should be done in cold start manager or some other reusable component.
## Routing   Requests always hit the head. --- Update: after running more tests, I noticed this is not true. I did see traffic reach other pods, but due...
## RayCluster Orchestration related 1. ray.io/overwrite-container-cmd -> RayCluster level 2. head & worker annotations have to be set separately; there's no propagation to different roles yet. RayClusterFleet spec.templates.metadata controls RayCluster...
### vLLM 0.7.3 problem  It hangs for a long time. I checked https://github.com/vllm-project/vllm/issues/13136 and decided to rebuild the image ``` FROM vllm/vllm-openai:v0.7.3 RUN pip3 install -U ray[default,adag]==2.40.0 --progress-bar off # important...
## RDMA setup From the NCCL logs, we can see that cross-node communication is happening over RDMA, while intra-node transfers fall back to IPC (NVLink in this case). ('NCCL INFO...
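One way to check the same thing on a larger log is to count which transport each NCCL channel line reports after `via`. The sketch below assumes channel lines end in `via <transport>` (for example `via P2P/IPC` or `via NET/IB/0/GDRDMA`). The sample lines are illustrative only and are not copied from the logs discussed here, and `count_transports` is a hypothetical helper name.

```python
import re
from collections import Counter
from typing import Iterable

# Captures the transport token that follows "via" in an NCCL channel line.
_VIA = re.compile(r"\bvia\s+(\S+)")


def count_transports(lines: Iterable[str]) -> Counter:
    """Count NCCL channel lines per transport family (first two path parts)."""
    counts: Counter = Counter()
    for line in lines:
        if "NCCL INFO" not in line:
            continue  # skip anything that is not an NCCL info line
        match = _VIA.search(line)
        if match is None:
            continue  # info line without a transport, e.g. init messages
        # "NET/IB/0/GDRDMA" -> "NET/IB", "P2P/IPC" -> "P2P/IPC"
        family = "/".join(match.group(1).split("/")[:2])
        counts[family] += 1
    return counts


# Illustrative lines only (assumed format, not real output from this cluster).
sample = [
    "node1:101:201 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/IPC",
    "node1:101:201 [0] NCCL INFO Channel 00/0 : 7[7] -> 8[0] [send] via NET/IB/0/GDRDMA",
    "node1:101:201 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/IPC",
]
print(count_transports(sample))
```

A non-zero `NET/IB` count alongside `P2P/IPC` would match the pattern described above: RDMA across nodes, IPC within a node.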
@xieus it's specific to ray head.
## Autoscaling

```
NAME                                                          READY   STATUS    RESTARTS   AGE   IP             NODE           NOMINATED NODE   READINESS GATES
deepseek-r1-671b-56f9654bbb-mgdwd-head-lf5xg                  1/1     Running   0          27m   192.168.0.74   192.168.0.51
deepseek-r1-671b-56f9654bbb-mgdwd-worker-group-worker-pb4hh   1/1     Running   0          27m   192.168.0.81   192.168.0.52
```
...