Support multi-node & autoscaling & routing together for models like Deepseek-R1
🚀 Feature Description and Motivation
Orchestration
- The full-weight DeepSeek-R1 model needs to be deployed using multi-node orchestration. If we adopt cross-node TP, let's make sure RDMA communication is unblocked in that case.
- Let's make sure the rolling upgrade experience works as expected.
- We also need graceful shutdown so that in-flight requests are handled correctly.
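To make the graceful-shutdown requirement concrete, here is a minimal sketch of in-flight request draining. It is not the project's implementation: `InflightTracker` and its method names are hypothetical, and it only assumes that the serving process can wrap each request and receives a termination signal before the pod is killed.

```python
import threading


class InflightTracker:
    """Hypothetical helper: counts in-flight requests and supports draining."""

    def __init__(self):
        self._cond = threading.Condition()
        self._inflight = 0
        self._draining = False

    def try_begin(self):
        """Register a new request; refuse it once draining has started."""
        with self._cond:
            if self._draining:
                return False
            self._inflight += 1
            return True

    def end(self):
        """Mark one request as finished and wake up a waiting drain()."""
        with self._cond:
            self._inflight -= 1
            if self._inflight == 0:
                self._cond.notify_all()

    def drain(self, timeout):
        """Stop accepting requests and wait until in-flight work completes.

        Returns True if everything finished within `timeout` seconds.
        """
        with self._cond:
            self._draining = True
            return self._cond.wait_for(lambda: self._inflight == 0, timeout=timeout)
```

A SIGTERM handler (or a preStop hook) would call `drain()` with a timeout shorter than the pod's termination grace period, then exit.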
Autoscaling
In such cases, traditional autoscaling may not work well.
- Resource metrics like SM_ACTIVE are still aggregated at the pod level, so there is no big difference for them.
- For application metrics, only the head pod, which has the API server deployed, emits metrics. The metric count has to be consistent with the number of units rather than the number of pods.
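The point about application metrics can be sketched as follows. This is an illustration under stated assumptions, not the autoscaler's actual algorithm: it assumes head pods carry the KubeRay label `ray.io/node-type: head`, that one head pod corresponds to one serving unit, and that the metric is additive across units. The function names, the simplified pod dictionaries, and the `target_per_unit` parameter are hypothetical.

```python
import math

# Assumption: KubeRay marks the head pod with this label.
HEAD_LABEL_KEY = "ray.io/node-type"
HEAD_LABEL_VALUE = "head"


def head_pods(pods):
    """Keep only head pods; each one stands for a whole multi-node unit."""
    return [p for p in pods if p.get("labels", {}).get(HEAD_LABEL_KEY) == HEAD_LABEL_VALUE]


def desired_units(pods, metric_by_pod, target_per_unit, min_units=1, max_units=10):
    """Compute a unit count from metrics reported by head pods only.

    Returns None when no head pod reported a sample, so the caller can
    keep the current scale instead of acting on missing data.
    """
    samples = [metric_by_pod[p["name"]] for p in head_pods(pods) if p["name"] in metric_by_pod]
    if not samples:
        return None
    wanted = math.ceil(sum(samples) / target_per_unit)
    return max(min_units, min(max_units, wanted))
```

Worker pods never appear in `metric_by_pod` lookups, so they are neither scraped nor counted as replicas.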
Routing
- The router should skip worker pods and only consider the head pod for request routing.
- Make sure it removes a pod once it enters the terminating stage.
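The two routing rules above can be sketched as a single filter. This is a hedged illustration, not the router's real code: it works on plain dictionaries shaped like Kubernetes pod manifests, assumes the KubeRay label `ray.io/node-type: head` identifies head pods, and treats a set `metadata.deletionTimestamp` as the terminating signal. The function name `routable_pods` is hypothetical.

```python
def routable_pods(pods):
    """Return pods that may receive requests: head pods that are not terminating."""
    selected = []
    for pod in pods:
        meta = pod.get("metadata", {})
        labels = meta.get("labels", {})
        # Rule 1: worker pods do not expose the API server, so skip them.
        if labels.get("ray.io/node-type") != "head":
            continue
        # Rule 2: Kubernetes sets deletionTimestamp once a pod is terminating.
        if meta.get("deletionTimestamp"):
            continue
        selected.append(pod)
    return selected
```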
Use Case
As a user, I want to host the full-weight DeepSeek-R1 model and autoscale the workload based on traffic.
Proposed Solution
No response
Routing
Requests always hit the head pod.
Update: after running more tests, I noticed this is not true. I did see requests reach other pods, but due to some issues those requests did not go through.
python3 benchmark_serving.py --backend vllm --model deepseek-ai/deepseek-r1 --trust-remote-code --served-model-name deepseek-r1-671b --base-url http://localhost:8888 --endpoint /v1/completions --num-prompts 100 --request-rate 2 --metric_percentiles '50,90,95,99' --goodput ttft:1000 tpot:100 --max-concurrency 200 --random-input-len 2048 --random-output-len 200 --dataset-name random --ignore-eos
RayCluster Orchestration related
- ray.io/overwrite-container-cmd -> RayCluster level
- Head and worker annotations have to be set separately; there is no propagation to the different roles yet. RayClusterFleet spec.templates.metadata controls RayCluster metadata.
- Probes can be overridden by users, or the injection can be disabled.
vLLM 0.7.3 problem
It hangs for a long time. I checked https://github.com/vllm-project/vllm/issues/13136 and decided to rebuild the image:
FROM vllm/vllm-openai:v0.7.3
RUN pip3 install -U ray[default,adag]==2.40.0 --progress-bar off # important for future healthcheck
RUN pip3 install -U nvidia-nccl-cu12 --progress-bar off
ENTRYPOINT [""]
Note: in 0.7.3, ray[adag] was used to replace ray[default]. This causes issues for KubeRay-based deployments because our injected probe uses the agent to check health status. I considered using v0.7.2, but noticed that 0.7.3 brings FlashAttention v3 for MLA optimization, so I am sticking with v0.7.3.
RDMA setup
From the NCCL logs, we can see that cross-node communication is happening over RDMA, while intra-node transfers fall back to IPC (NVLink in this case; see 'NCCL INFO NVLS multicast support is available').
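One way to check this claim without reading every line is to count the transports that NCCL reports per channel. The sketch below is a rough helper written for the log format shown in this thread (`... via P2P/IPC` and `... via NET/IB/<n>/GDRDMA`); the function name and the three bucket names are my own, and any other transport string is simply counted as "other".

```python
import re
from collections import Counter

# Matches per-connection lines such as:
#   NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/IPC
#   NCCL INFO Channel 00/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
CHANNEL_RE = re.compile(
    r"NCCL INFO Channel \d+/\d+ : (\d+)\[\d+\] -> (\d+)\[\d+\]"
    r"(?: \[(?:send|receive)\])? via (\S+)"
)


def transport_counts(log_text):
    """Count channel connections by transport family in an NCCL INFO log."""
    counts = Counter()
    for match in CHANNEL_RE.finditer(log_text):
        transport = match.group(3)
        if transport.startswith("NET/IB"):
            counts["net_ib"] += 1      # InfiniBand/RoCE verbs transport
        elif transport.startswith("P2P"):
            counts["p2p"] += 1         # intra-node peer-to-peer
        else:
            counts["other"] += 1       # anything else, including truncated lines
    return counts
```

If `net_ib` is zero on a multi-node run, cross-node traffic is probably not going over RDMA and the NCCL environment settings are worth re-checking.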
RDMA(RoCE) logs
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Bootstrap: Using eth0:192.168.0.90<0>
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO cudaDriverVersion 12020
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NCCL version 2.25.1+cuda12.2
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. Using internal network plugin.
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NCCL_IB_HCA set to mlx5_
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE [1]mlx5_2:1/RoCE [2]mlx5_3:1/RoCE [3]mlx5_4:1/RoCE [4]mlx5_5:1/RoCE [5]mlx5_6:1/RoCE [6]mlx5_7:1/RoCE [7]mlx5_8:1/RoCE [RO]; OOB eth0:192.168.0.90<0>
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so.
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Using network IB
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO ncclCommInitRank comm 0xc764960 rank 0 nranks 16 cudaDev 0 nvmlDev 0 busId e000 commId 0xd0f99dd1affac83 - Init START
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO RAS client listening socket at ::1<28028>
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Bootstrap timings total 0.090936 (create 0.000030, send 0.000074, recv 0.000036, ring 0.030250, delay 0.000001)
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Setting affinity for GPU 0 to ffff,ffffffff,00000000,0000ffff,ffffffff
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NVLS multicast support is available on dev 0
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO comm 0xc764960 rank 0 nRanks 16 nNodes 2 localRanks 8 localRank 0 MNNVL 0
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NVLS Head 0: 0 8
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NVLS Head 1: 1 9
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NVLS Head 2: 2 10
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NVLS Head 3: 3 11
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NVLS Head 4: 4 12
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NVLS Head 5: 5 13
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NVLS Head 6: 6 14
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NVLS Head 7: 7 15
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 00/16 : 0 7 6 5 4 3 2 1 9 10 11 12 13 14 15 8
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 01/16 : 0 8 15 14 13 12 11 10 9 1 2 3 4 5 6 7
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 02/16 : 0 7 6 5 4 3 11 12 13 14 15 8 9 10 2 1
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 03/16 : 0 1 2 10 9 8 15 14 13 12 11 3 4 5 6 7
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 04/16 : 0 7 6 5 13 14 15 8 9 10 11 12 4 3 2 1
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 05/16 : 0 1 2 3 4 12 11 10 9 8 15 14 13 5 6 7
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 06/16 : 0 7 15 8 9 10 11 12 13 14 6 5 4 3 2 1
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 07/16 : 0 1 2 3 4 5 6 14 13 12 11 10 9 8 15 7
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 08/16 : 0 7 6 5 4 3 2 1 9 10 11 12 13 14 15 8
dee[36m(RayWorkerWrapper pid=996)[0m INFO 03-02 10:21:47 utils.py:916] Found nccl from library libnccl.so.2
[36m(RayWorkerWrapper pid=996)[0m INFO 03-02 10:21:47 pynccl.py:69] vLLM is using nccl==2.25.1
[36m(RayWorkerWrapper pid=342, ip=192.168.0.83)[0m INFO 03-02 10:21:42 __init__.py:207] Automatically detected platform cuda.[32m [repeated 7x across cluster][0m
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO cudaDriverVersion 12020
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Bootstrap: Using eth0:192.168.0.83<0>
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO NCCL version 2.25.1+cuda12.2
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. Using internal network plugin.
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO NCCL_IB_HCA set to mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_7:1,mlx5_8:1
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE [1]mlx5_2:1/RoCE [2]mlx5_3:1/RoCE [3]mlx5_4:1/RoCE [4]mlx5_5:1/RoCE [5]mlx5_6:1/RoCE [6]mlx5_7:1/RoCE [7]mlx5_8:1/RoCE [RO]; OOB eth0:192.168.0.83<0>
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so.
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Using network IB
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO ncclCommInitRank comm 0xde1dae0 rank 9 nranks 16 cudaDev 1 nvmlDev 1 busId 44000 commId 0xd0f99dd1affac83 - Init START
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO RAS client listening socket at ::1<28028>
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Bootstrap timings total 0.006130 (create 0.000024, send 0.000165, recv 0.000208, ring 0.001345, delay 0.000000)
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Setting affinity for GPU 1 to ffff,ffffffff,00000000,0000ffff,ffffffff
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO NVLS multicast support is available on dev 1
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO comm 0xde1dae0 rank 9 nRanks 16 nNodes 2 localRanks 8 localRank 1 MNNVL 0
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Trees [0] 10/-1/-1->9->8 [1] 10/-1/-1->9->1 [2] -1/-1/-1->9->8 [3] 10/-1/-1->9->8 [4] 10/-1/-1->9->8 [5] 10/-1/-1->9->8 [6] 10/-1/-1->9->8 [7] 10/-1/-1->9->8 [8] 10/-1/-1->9->8 [9] 10/1/-1->9->-1 [10] -1/-1/-1->9->8 [11] 10/-1/-1->9->8 [12] 10/-1/-1->9->8 [13] 10/-1/-1->9->8 [14] 10/-1/-1->9->8 [15] 10/-1/-1->9->8
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO P2P Chunksize set to 131072
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:1377 [1] NCCL INFO [Proxy Service] Device 1 CPU core 40
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:1381 [1] NCCL INFO [Proxy Service UDS] Device 1 CPU core 41
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Channel 00/0 : 9[1] -> 10[2] via P2P/IPC
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Channel 02/0 : 9[1] -> 10[2] via P2P/IPC
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Channel 04/0 : 9[1] -> 10[2] via P2P/IPC
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Channel 06/0 : 9[1] -> 10[2] via P2P/IPC
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Channel 08/0 : 9[1] -> 10[2] via P2P/IPC
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Channel 10/0 : 9[1] -> 10[2] via P2P/IPC
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worke
[36m(RayWorkerWrapper pid=335, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-w
[36m(RayWorkerWrapper pid=338, ip=192.168.0.83)[0m d
[36m(RayWorkerWrapper pid=341, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:341:341 [6] NCCL INFO Channel 12/0 : 14[6] -> 15
[36m(RayWorkerWrapper pid=996)[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:996:996 [3] NCCL INFO NVLS Head 0: 0 8
[36m(RayWorkerWrapper pid=996)[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:996:996 [3] NCCL INFO NVLS Head 1: 1 9
[36m(RayWorkerWrapper pid=996)[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:996:996 [3] NCCL INFO NVLS Head 2: 2 10
[36m(RayWorkerWrapper pid=996)[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:996:996 [3] NCCL INFO NVLS Head 3: 3 11
[36m(RayWorkerWrapper pid=996)[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:996:996 [3] NCCL INFO NVLS Head 4: 4 12
[36m(RayWorkerWrapper pid=996)[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:996:996 [3] NCCL INFO NVLS Head 5: 5 13
[36m(RayWorkerWrapper pid=996)[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:996:996 [3] NCCL INFO NVLS Head 6: 6 14
[36m(RayWorkerWrapper pid=996)[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:996:996 [3] NCCL INFO NVLS Head 7: 7 15
[36m(RayWorkerWrapper pid=996)[0m deep
[36m(RayWorkerWrapper pid=1015)[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:1015:21258 [7] NCCL INFO [Proxy Progress] Device 7 CPU core 93
[36m(RayWorkerWrapper pid=1015)[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:1015:1015 [7] NCCL INFO Channel 07/0 : 15[7] -> 7[7] [receive] via NET/IB/15/GDRDMA
[36m(RayWorkerWrapper pid=1015)[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:1015:1015 [7] NCCL INFO Channel 15/0 : 15[7] -> 7[7] [receive] via NET/IB/1
[36m(RayWorkerWrapper pid=983)[0m deeps
[36m(RayWorkerWrapper pid=1005)[0m deepseek-r1-671b-88957849-q6slh
[36m(RayWorkerWrapper pid=987)[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:987:987 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] v
[36m(RayWorkerWrapper pid=337, ip=192.168.0.83)[0m deepseek-r1-67
[36m(RayWorkerWrapper pid=340, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:340:340 [7] NCCL INFO Channel 07/0 : 15[7] -> 7[7] [send] via NET/IB/15/GDRDMApseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 09/16 : 0 8 15 14 13 12 11 10 9 1 2 3 4 5 6 7
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 10/16 : 0 7 6 5 4 3 11 12 13 14 15 8 9 10 2 1
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 11/16 : 0 1 2 10 9 8 15 14 13 12 11 3 4 5 6 7
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 12/16 : 0 7 6 5 13 14 15 8 9 10 11 12 4 3 2 1
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 13/16 : 0 1 2 3 4 12 11 10 9 8 15 14 13 5 6 7
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 14/16 : 0 7 15 8 9 10 11 12 13 14 6 5 4 3 2 1
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 15/16 : 0 1 2 3 4 5 6 14 13 12 11 10 9 8 15 7
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Trees [0] 1/8/-1->0->-1 [1] -1/-1/-1->0->7 [2] 1/-1/-1->0->7 [3] 1/-1/-1->0->7 [4] 1/-1/-1->0->7 [5] 1/-1/-1->0->7 [6] 1/-1/-1->0->7 [7] 1/-1/-1->0->7 [8] 1/-1/-1->0->8 [9] -1/-1/-1->0->7 [10] 1/-1/-1->0->7 [11] 1/-1/-1->0->7 [12] 1/-1/-1->0->7 [13] 1/-1/-1->0->7 [14] 1/-1/-1->0->7 [15] 1/-1/-1->0->7
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO P2P Chunksize set to 131072
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Check P2P Type intraNodeP2pSupport 1 directMode 0
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:21242 [0] NCCL INFO [Proxy Service] Device 0 CPU core 31
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:21249 [0] NCCL INFO [Proxy Service UDS] Device 0 CPU core 32
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 00/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 02/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 04/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 06/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 08/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 10/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 12/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 14/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:21256 [0] NCCL INFO [Proxy Progress] Device 0 CPU core 129
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 00/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 08/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 01/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 09/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Connected all rings, use ring PXN 0 GDR 1
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 01/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 03/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 05/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 07/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 09/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 11/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 13/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 15/0 : 0[0] -> 7[7] via P2P/IPC
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 00/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 08/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Connected all trees
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO NVLS comm 0xc764960 headRank 0 nHeads 8 buffSize 1048576 nvlsPerRankSize 33554432 nvlsTotalSize 268435456
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 01/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 02/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 03/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 04/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 05/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 06/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 07/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 09/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 10/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 11/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 12/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 13/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 14/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 15/0 : 8[0] -> 0[0] [receive] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 02/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 03/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 04/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 05/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 06/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 07/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 10/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 11/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 12/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 13/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 14/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Channel 15/0 : 0[0] -> 8[0] [send] via NET/IB/8/GDRDMA
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Connected NVLS tree
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO 16 coll channels, 16 collnet channels, 16 nvls channels, 16 p2p channels, 2 p2p channels per peer
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO CC Off, workFifoBytes 1048576
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so libnccl-net.so. Using internal tuner plugin.
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO ncclCommInitRank comm 0xc764960 rank 0 nranks 16 cudaDev 0 nvmlDev 0 busId e000 commId 0xd0f99dd1affac83 - Init COMPLETE
deepseek-r1-671b-88957849-q6slh-head-fwl2w:734:734 [0] NCCL INFO Init timings - ncclCommInitRank: rank 0 nranks 16 total 3.08 (kernels 0.36, alloc 0.89, bootstrap 0.09, allgathers 0.01, topo 0.53, graphs 0.01, connections 1.18, rest 0.00)
[36m(RayWorkerWrapper pid=340, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:340:340 [7] NCCL INFO Channel 15/0 : 15[7] -> 7[7] [send] via NET/IB/15/GDRDMA
[36m(RayWorkerWrapper pid=340, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhsk
[36m(RayWorkerWrapper pid=338, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:338:338 [3] NCCL INFO Connected all rings, use ring PXN 0 GDR 1
[36m(RayWorkerWrapper pid=338, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:338:338 [3] NCCL INFO Connected all t
[36m(RayWorkerWrapper pid=342, ip=192.168.0.83)[0m 6] via P2P/IPC
[36m(RayWorkerWrapper pid=342, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:342:342 [5] NCCL IN
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Connected all trees
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO
[36m(RayWorkerWrapper pid=340, ip=192.168.0.83)[0m deepse
[36m(RayWorkerWrapper pid=996)[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:996:996 [3] NCCL INFO NVLS comm 0xbb0e900 headRank 3 nHeads 8 buffSize 1048576 nvlsPerRankSize 33554432 nvlsTotalSize 268435456
[36m(RayWorkerWrapper pid=981)[0m deepseek-r1-671b-88957
[36m(RayWorkerWrapper pid=993)[0m deepseek-r1-671b-88957849-q6slh-hea
[36m(RayWorkerWrapper pid=1015)[0m 5/GDRDMA
[36m(RayWorkerWrapper pid=1015)[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:1015:1015 [7] NCCL INFO Channel 01/0 : 15[7] -> 7[7] [re
[36m(RayWorkerWrapper pid=983)[0m deepseek-r1-671b-
[36m(RayWorkerWrapper pid=1005)[0m deepseek-r1-6
[36m(RayWorkerWrapper pid=987)[0m ia P2P/IPC
[36m(RayWorkerWrapper pid=987)[0m deepseek-r1-671b-88957849-q6slh-head-fwl2w:987:987 [5] NCCL INFO Channel 02/0 : 13[5] -> 5[5] [receive] via N
[36m(RayWorkerWrapper pid=335, ip=192.168.0.83)[0m de
[36m(RayWorkerWrapper pid=341, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-wo
[36m(RayWorkerWrapper pid=339, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-wor
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m NVLS comm 0xde1dae0 headRank 1 nHeads 8 buffSize 1048576 nvlsPerRankSize 33554432 nvlsTotalSize 268435456
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Connected NVLS tree
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO 16 coll channels, 16 collnet channels, 16 nvls channels, 16 p2p channels, 2 p2p channels per peer
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so libnccl-net.so. Using internal tuner plugin.
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO ncclCommInitRank comm 0xde1dae0 rank 9 nranks 16 cudaDev 1 nvmlDev 1 busId 44000 commId 0xd0f99dd1affac83 - Init COMPLETE
[36m(RayWorkerWrapper pid=336, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:336:336 [1] NCCL INFO Init timings - ncclCommInitRank: rank 9 nranks 16 total 2.94 (kernels 0.29, alloc 1.03, bootstrap 0.01, allgathers 0.01, topo 0.54, graphs 0.01, connections 1.06, rest 0.00)
[36m(RayWorkerWrapper pid=337, ip=192.168.0.83)[0m Channel 00/0 : 2[2] -> 10[2] [receive] via NET/IB/10/GDRDMA
[36m(RayWorkerWrapper pid=335, ip=192.168.0.83)[0m deepseek-r1-671b-88957849-q6slh-worker-group-worker-hhskt:335:335 [0] NCCL INFO ncclCommInitRank comm 0xcf3b380 r
[36m(RayWorkerWrapper pid=338, ip=192.168.0.83)[0m rees
[36m(RayWorkerWrapper pid=342, ip=192.168.0.83)[0m FO Connected all trees
WARNING 03-02 10:21:50 custom_all_reduce.py:84] Custom allreduce is disabled because this process group spans across nodes.
INFO 03-02 10:21:50 shm_broadcast.py:258] vLLM message queue communication handle: Handle(connect_ip='192.168.0.90', local_reader_ranks=[1, 2, 3, 4, 5, 6, 7], buffer_handle=(7, 4194304, 6, 'psm_1ee0df8a'), local_subscribe_port=60107, remote_subscribe_port=49929)
[36m(RayWorkerWrapper pid=996)[0m WARNING 03-02 10:21:50 custom_all_reduce.py:84] Custom allreduce is disabled because this process group spans across nodes.
[36m(RayWorkerWrapper pid=1015)[0m ceive] via NET/IB/15/GDRDMA
[36m(RayWorkerWrapper pid=987)[0m ET/IB/13/GDRDMA
[36m(RayWorkerWrapper pid=342, ip=192.168.0.83)[0m INFO 03-02 10:21:44 cuda.py:160] Using Triton MLA backend.[32m [repeated 14x across cluster][0m
[36m(RayWorkerWrapper pid=335, ip=192.168.0.83)[0m ank 8 nranks 16 cudaDev 0 nvmlDev 0 busId e000 commId 0xd0f99dd1affac83 - Init COMPLETE
Some warning messages:
deepseek-r1-671b-6dc6684dd9-6m8kj-head-vgzzr:734:734 [0] transport/nvls.cc:586 NCCL WARN Cuda failure 1 'invalid argument'
deepseek-r1-671b-6dc6684dd9-6m8kj-head-vgzzr:734:734 [0] transport/nvls.cc:709 NCCL WARN rank 0 failed to NVLS register sendbuff 0x7f252bc00000 sendbuffSize 2097152 recvbuff 0x7f282cc00000 recvbuffSize 2097152
deepseek-r1-671b-6dc6684dd9-6m8kj-head-vgzzr:734:734 [0] transport/nvls.cc:586 NCCL WARN Cuda failure 1 'invalid argument'
deepseek-r1-671b-6dc6684dd9-6m8kj-head-vgzzr:734:734 [0] transport/nvls.cc:709 NCCL WARN rank 0 failed to NVLS register sendbuff 0x7f282cc00000 sendbuffSize 2097152 recvbuff 0x7f282cc00000 recvbuffSize 2097152
deepseek-r1-671b-6dc6684dd9-6m8kj-head-vgzzr:734:734 [0] transport/nvls.cc:586 NCCL WARN Cuda failure 1 'invalid argument'
deepseek-r1-671b-6dc6684dd9-6m8kj-head-vgzzr:734:734 [0] transport/nvls.cc:709 NCCL WARN rank 0 failed to NVLS register sendbuff 0x7f282cc00000 sendbuffSize 2097152 recvbuff 0x7f252bc00000 recvbuffSize 2097152
deepseek-r1-671b-6dc6684dd9-6m8kj-head-vgzzr:734:734 [0] transport/nvls.cc:586 NCCL WARN Cuda failure 1 'invalid argument'
deepseek-r1-671b-6dc6684dd9-6m8kj-head-vgzzr:734:734 [0] transport/nvls.cc:709 NCCL WARN rank 0 failed to NVLS register sendbuff 0x7f282cc00000 sendbuffSize 2097152 recvbuff 0x7f24dbc00000 recvbuffSize 2097152
deepseek-r1-671b-6dc6684dd9-6m8kj-head-vgzzr:734:734 [0] transport/nvls.cc:586 NCCL WARN Cuda failure 1 'invalid argument'
deepseek-r1-671b-6dc6684dd9-6m8kj-head-vgzzr:734:734 [0] transport/nvls.cc:709 NCCL WARN rank 0 failed to NVLS register sendbuff 0x7f252bc00000 sendbuffSize 2097152 recvbuff 0x7f282cc00000 recvbuffSize 2097152
deepseek-r1-671b-6dc6684dd9-6m8kj-head-vgzzr:734:734 [0] transport/nvls.cc:586 NCCL WARN Cuda failure 1 'invalid argument'
deepseek-r1-671b-6dc6684dd9-6m8kj-head-vgzzr:734:734 [0] transport/nvls.cc:709 NCCL WARN rank 0 failed to NVLS register sendbuff 0x7f282cc00000 sendbuffSize 2097152 recvbuff 0x7f282cc00000 recvbuffSize 2097152
- For applications metrics, only head pod which has the apiserver deployed emit the metrics. it has to be consistent with the number of the units.
Thanks @Jeffwan. This is a great feature. One quick question: does the head pod concept refer to the Ray head node (the underlying implementation) or to a broader context?
@xieus it's specific to ray head.
Autoscaling
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deepseek-r1-671b-56f9654bbb-mgdwd-head-lf5xg 1/1 Running 0 27m 192.168.0.74 192.168.0.51 <none> <none>
deepseek-r1-671b-56f9654bbb-mgdwd-worker-group-worker-pb4hh 1/1 Running 0 27m 192.168.0.81 192.168.0.52 <none> <none>
Minor changes are needed to filter out the worker nodes:
E0303 01:10:33.242360 1 kpa.go:256] Failed to get stable and panic metrics for default/deepseek-r1-671b: no data available
E0303 01:10:33.249115 1 controller.go:329] "msg"="Reconciler error" "error"="failed to compute desired number of replicas based on listed metrics for RayClusterFleet/default/deepseek-r1-671b: can not calculate metrics for scale deepseek-r1-671b" "PodAutoscaler"={"name":"deepseek-r1-671b-autoscaling","namespace":"default"} "controller"="podautoscaler" "controllerGroup"="autoscaling.aibrix.ai" "controllerKind"="PodAutoscaler" "name"="deepseek-r1-671b-autoscaling" "namespace"="default" "reconcileID"="432ed9d8-f944-47f8-9975-047731c77ebf"
E0303 01:13:33.242425 1 controller.go:329] "msg"="Reconciler error" "error"="failed to update metrics for scale target reference: failed to fetch metrics from source http://192.168.0.84:8000/metrics: Get \"http://192.168.0.84:8000/metrics\": dial tcp 192.168.0.84:8000: connect: connection refused" "PodAutoscaler"={"name":"deepseek-r1-671b-autoscaling","namespace":"default"} "controller"="podautoscaler" "controllerGroup"="autoscaling.aibrix.ai" "controllerKind"="PodAutoscaler" "name"="deepseek-r1-671b-autoscaling" "namespace"="default" "reconcileID"="d308d1c6-432f-491b-b192-33619c952e3a"
Engineering support for the R1 issue is done. We can close this issue.