
In PD separation scenarios, model access requests are directed to non-master nodes.

wangchuanfang opened this issue 3 months ago · 17 comments

🐛 Describe the bug

In the P2D2 scenario, model access requests are routed to the P1 pod (a non-master node).

Steps to Reproduce

Step 1: Create a P2D2 deepseek-r1 model. Apply the following YAML file with kubectl apply:


apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-read
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
  - watch
  - list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pod-read-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: pod-read
subjects:
- kind: ServiceAccount
  name: default
  namespace: default   
---
apiVersion: orchestration.aibrix.ai/v1alpha1
kind: StormService
metadata:
  name: pool-xpyd
spec:
  replicas: 1
  updateStrategy:
    type: InPlaceUpdate
  stateful: true
  selector:
    matchLabels:
      app: pool-xpyd
  template:
    metadata:
      labels:
        app: pool-xpyd
    spec:
      roles:
        - name: routing
          replicas: 1
          stateful: true
          template:
            metadata:
              labels:
                app: pool-xpyd
                role: routing
                app.kubernetes.io/name: deepseek-r1-slo
                model.aibrix.ai/name: deepseek-r1
                model.aibrix.ai/port: "30000"
                model.aibrix.ai/engine: sglang                
            spec:
              containers:
                - name: mini-lb
                  # image: docker.1ms.run/aibrix/sglang-router:v0.1.6
                  image: docker.1ms.run/aibrix/sglang-router:v0.1.7-patch.1-20250731
                  # image: docker.1ms.run/aibrix/sglang-router:v0.1.9
                  # image: 172.16.106.102/sglang:v0.1.9-sgl-router-v0.3.3
                  command: [ "sh", "-c" ]
                  args:
                    - |
                      python3 -m sglang_router.launch_router \
                        --pd-disaggregation \
                        --policy round_robin \
                        --host 0.0.0.0 \
                        --service-discovery \
                        --service-discovery-port 30000 \
                        --prefill-selector storm-service-name=$STORM_SERVICE_NAME role-name=prefill stormservice.orchestration.aibrix.ai/role-replica-index=0 \
                        --decode-selector storm-service-name=$STORM_SERVICE_NAME role-name=decode stormservice.orchestration.aibrix.ai/role-replica-index=0 \
                        --service-discovery-namespace default
        - name: prefill
          replicas: 2
          stateful: true
          template:
            metadata:
              annotations:
                k8s.volcengine.com/pod-networks: |
                  [
                    {
                      "cniConf":{
                          "name":"rdma"
                      }
                    }
                  ]
              labels:
                app.kubernetes.io/name: deepseek-r1-slo
                model.aibrix.ai/name: deepseek-r1
                model.aibrix.ai/port: "30000"
                model.aibrix.ai/engine: sglang
                # model.aibrix.ai/deployment: deepseek-r1-slo
            spec:
              # nodeSelector:
              #   type: H800
              containers:
                - name: prefill
                  # image: 172.16.106.153/sglang:v0.4.9.post2-8-g10c00166-deepep.9eb2f84
                  image: 172.16.106.102/sglang:v0.5.1.post3-cu126
                  command: ["sh", "-c"]
                  args:
                    - |
                      python3 -m sglang.launch_server \
                        --model-path /data/deepseek-ai/DeepSeek-R1 \
                        --served-model-name deepseek-r1 \
                        --disaggregation-ib-device mlx5_4 \
                        --host 0.0.0.0 \
                        --port 30000 \
                        --disaggregation-mode prefill \
                        --disaggregation-transfer-backend=mooncake \
                        --trust-remote-code \
                        --dist-init-addr "${ROLESET_NAME}-${ROLE_NAME}-${ROLE_TEMPLATE_HASH}-0.${STORM_SERVICE_NAME}.default.svc.cluster.local:5000" \
                        --nnodes 2 \
                        --node-rank $ROLE_REPLICA_INDEX \
                        --tp-size 16 \
                        --page-size 1 \
                        --watchdog-timeout 1000000 \
                        --dist-timeout 250 \
                        --mem-fraction-static 0.84 \
                        --max-running-requests 512 \
                        --max-prefill-tokens 32768 \
                        --log-level debug
                  env:
                    - name: GLOO_SOCKET_IFNAME
                      value: eth0
                    - name: NCCL_SOCKET_IFNAME
                      value: eth0
                    - name: NCCL_IB_HCA
                      value: mlx5_0,mlx5_2,mlx5_3,mlx5_5
                    - name: NCCL_IB_DISABLE
                      value: "0"
                    - name: NCCL_IB_GID_INDEX
                      value: "7"
                    - name: NCCL_DEBUG
                      value: "INFO"
                    - name: MC_LOG_LEVEL
                      value: INFO
                  volumeMounts:
                    - name: model-vol
                      mountPath: /data/deepseek-ai
                    - mountPath: /dev/shm
                      name: shared-mem
                  resources:
                    requests:
                      nvidia.com/gpu: "8"
                      rdma/rdma_shared_devices: "6"
                    limits:
                      nvidia.com/gpu: "8"
                      rdma/rdma_shared_devices: "6"
                  securityContext:
                    capabilities:
                      add:
                        - IPC_LOCK
              volumes:
                - name: model-vol
                  hostPath:
                    path: /data/deepseek-ai/
                    type: Directory
                - emptyDir:
                    medium: Memory
                  name: shared-mem
        - name: decode
          replicas: 2
          stateful: true
          template:
            metadata:
              annotations:
                k8s.volcengine.com/pod-networks: |
                  [
                    {
                      "cniConf":{
                          "name":"rdma"
                      }
                    }
                  ]
              labels:
                app.kubernetes.io/name: deepseek-r1-slo
                model.aibrix.ai/name: deepseek-r1
                model.aibrix.ai/port: "30000"
                model.aibrix.ai/engine: sglang
                # model.aibrix.ai/deployment: deepseek-r1-slo
            spec:
              # nodeSelector:
              #   type: H20
              containers:
                - name: decode
                  # image: 172.16.106.153/sglang:v0.4.9.post2-8-g10c00166-deepep.9eb2f84
                  image: 172.16.106.102/sglang:v0.5.1.post3-cu126
                  command: ["sh", "-c"]
                  args:
                    - |
                      python3 -m sglang.launch_server \
                        --model-path /data/deepseek-ai/DeepSeek-R1 \
                        --served-model-name deepseek-r1 \
                        --disaggregation-ib-device mlx5_4 \
                        --host 0.0.0.0 \
                        --port 30000 \
                        --disaggregation-mode decode \
                        --disaggregation-transfer-backend=mooncake \
                        --trust-remote-code \
                        --dist-init-addr "${ROLESET_NAME}-${ROLE_NAME}-${ROLE_TEMPLATE_HASH}-0.${STORM_SERVICE_NAME}.default.svc.cluster.local:5000" \
                        --nnodes 2 \
                        --node-rank $ROLE_REPLICA_INDEX \
                        --tp-size 16 \
                        --page-size 1 \
                        --watchdog-timeout 1000000 \
                        --dist-timeout 600 \
                        --mem-fraction-static 0.84 \
                        --max-running-requests 2048 \
                        --context-length 4096 \
                        --log-level debug
                  env:
                    - name: GLOO_SOCKET_IFNAME
                      value: eth0
                    - name: NCCL_SOCKET_IFNAME
                      value: eth0
                    - name: NCCL_IB_HCA
                      value: mlx5_0,mlx5_2,mlx5_3,mlx5_5
                    - name: NCCL_IB_DISABLE
                      value: "0"
                    - name: NCCL_IB_GID_INDEX
                      value: "7"
                    - name: NCCL_DEBUG
                      value: "INFO"
                    - name: MC_LOG_LEVEL
                      value: INFO
                  volumeMounts:
                    - name: model-vol
                      mountPath: /data/deepseek-ai
                    - mountPath: /dev/shm
                      name: shared-mem
                  resources:
                    requests:
                      nvidia.com/gpu: "8"
                      rdma/rdma_shared_devices: "6"
                    limits:
                      nvidia.com/gpu: "8"
                      rdma/rdma_shared_devices: "6"
                  securityContext:
                    capabilities:
                      add:
                        - IPC_LOCK
              volumes:
                - name: model-vol
                  hostPath:
                    path: /data/deepseek-ai/
                    type: Directory
                - emptyDir:
                    medium: Memory
                  name: shared-mem
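
Once applied, the rollout can be sanity-checked before sending traffic. A quick sketch (assuming the StormService CRD registers a stormservice resource name; the app label comes from the selector in the spec above):

kubectl get stormservice pool-xpyd
kubectl get pods -l app=pool-xpyd -o wide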

Step 2: Send the following model access request:

curl -v http://10.6.2.201:80//v1/chat/completions \
-H "Content-Type: application/json" \
-H "routing-strategy: pd" \
-d '{
    "model": "deepseek-r1",
        "messages": [{"role": "user", "content": "hello"}]
}'

Step 3: View the gateway plugin logs:

root@boole-mgr-01:~# kubectl -n aibrix-system logs deployments/aibrix-gateway-plugins -f --tail 100

I0928 02:56:40.110778       1 cache_profile.go:62] === ModelGPUProfile DEBUG: Successfully unmarshalled profile for key: aibrix:profile_deepseek-r1_deepseek-r1-slo, deployment: deepseek-r1-slo ===
I0928 02:56:50.110903       1 cache_profile.go:62] === ModelGPUProfile DEBUG: Successfully unmarshalled profile for key: aibrix:profile_deepseek-r1_deepseek-r1-slo, deployment: deepseek-r1-slo ===
I0928 02:56:57.552506       1 gateway.go:94] "processing request" requestID="0bb090bc-aad7-404f-8fe0-e15d44263177"
I0928 02:56:57.554717       1 gateway.go:184] === SLO DEBUG: About to call router.Route === requestID: 0bb090bc-aad7-404f-8fe0-e15d44263177, routerType: routingalgorithms.pdRouter, podCount: 5
I0928 02:56:57.554948       1 pd_disaggregation.go:201] "start_prefill_request" request_id="0bb090bc-aad7-404f-8fe0-e15d44263177" llm_engine="sglang" prefill_url="http://10.233.85.144:30000/v1/chat/completions"
I0928 02:56:57.554998       1 pd_disaggregation.go:105] "P/D" prefill_pod="pool-xpyd-roleset-rfvcj-prefill-8564869d8f-0" decode_pod="pool-xpyd-roleset-rfvcj-decode-76f5769d5-1"
I0928 02:56:57.555029       1 gateway.go:187] === SLO DEBUG SELECT TARGET POD: Router result === requestID: 0bb090bc-aad7-404f-8fe0-e15d44263177, result: 10.233.84.23:30000, error: <nil>
I0928 02:56:57.555070       1 gateway_req_body.go:78] "request 11111 start Debug SLO" requestID="0bb090bc-aad7-404f-8fe0-e15d44263177" requestPath="/v1/chat/completions" model="deepseek-r1" stream=false routingAlgorithm="pd" targetPodIP="10.233.84.23:30000" routingDuration="2.316523ms"
I0928 02:56:57.555129       1 gateway_req_body.go:92] "request 22222 start" requestID="0bb090bc-aad7-404f-8fe0-e15d44263177" requestPath="/v1/chat/completions" model="deepseek-r1" stream=false routingAlgorithm="pd" targetPodIP="10.233.84.23:30000" routingDuration="2.316523ms"
E0928 02:56:57.555255       1 cache_trace.go:75] error on track request load consumption: output predictor not set
E0928 02:56:57.571327       1 gateway.go:263] "request end" requestID="0bb090bc-aad7-404f-8fe0-e15d44263177" errorCode=404 errorMessage="{\"detail\":\"Not Found\"}"

I0928 02:57:00.111235       1 cache_profile.go:62] === ModelGPUProfile DEBUG: Successfully unmarshalled profile for key: aibrix:profile_deepseek-r1_deepseek-r1-slo, deployment: deepseek-r1-slo ===

targetPodIP="10.233.84.23:30000" is the address of the D1 pod. The request was expected to be routed to the D0 pod at 10.233.70.72:30000 (see the pod listing in Step 4).

Step 4: View the model pods' Kubernetes information:

root@boole-mgr-01:~# kubectl get pod -owide
NAME                                           READY   STATUS    RESTARTS   AGE     IP               NODE           NOMINATED NODE   READINESS GATES
pool-xpyd-roleset-rfvcj-decode-76f5769d5-0     1/1     Running   0          2d23h   10.233.70.72     boole-hpc-03   <none>           <none>
pool-xpyd-roleset-rfvcj-decode-76f5769d5-1     1/1     Running   0          2d23h   10.233.84.23     boole-hpc-01   <none>           <none>
pool-xpyd-roleset-rfvcj-prefill-8564869d8f-0   1/1     Running   0          2d23h   10.233.85.144    boole-hpc-04   <none>           <none>
pool-xpyd-roleset-rfvcj-prefill-8564869d8f-1   1/1     Running   0          2d23h   10.233.117.124   boole-hpc-02   <none>           <none>
pool-xpyd-roleset-rfvcj-routing-56bdbc9ff4-0   1/1     Running   0          2d22h   10.233.104.176   boole-mgr-01   <none>           <none>

Expected behavior

In the P2D2 scenario, model access requests should be routed to the master P0 pod.

Environment

NA

wangchuanfang · Sep 28 '25

root@boole-mgr-01:~/aibrix-0.4.1/aibrix# curl -v http://10.6.2.201:80//v1/chat/completions \
-H "Content-Type: application/json" \
-H "routing-strategy: pd" \
-d '{
    "model": "deepseek-r1",
        "messages": [{"role": "user", "content": "hello"}]
}'
*   Trying 10.6.2.201:80...
* Connected to 10.6.2.201 (10.6.2.201) port 80 (#0)
> POST //v1/chat/completions HTTP/1.1
> Host: 10.6.2.201
> User-Agent: curl/7.81.0
> Accept: */*
> Content-Type: application/json
> routing-strategy: pd
> Content-Length: 90
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 404 Not Found
< x-went-into-req-headers: true
< request-id: 0bb090bc-aad7-404f-8fe0-e15d44263177
< target-pod: 10.233.84.23:30000
< content-type: 
< content-length: 61
< date: Sun, 28 Sep 2025 02:57:08 GMT
< 
* Connection #0 to host 10.6.2.201 left intact
{"error":{"code":404,"message":"{\"detail\":\"Not Found\"}"}}root@boole-mgr-01:~/aibrix-0.4.1/aibrix# 

wangchuanfang · Sep 28 '25

@varungup90 I'm hitting the same problem: the prefill-0 pod is normal, but prefill-1 returns 404.

ying2025 · Sep 28 '25

Can you describe D0 and D1 and share the label key/values? I want to check the value of the pod-group-index label.
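
For reference, a sketch of how to pull those labels out directly (assuming the key carries the stormservice.orchestration.aibrix.ai/ prefix, like the role-replica-index label in the router selectors above):

# show role-replica-index and pod-group-index as extra columns for all pods
kubectl get pods -L stormservice.orchestration.aibrix.ai/role-replica-index \
  -L stormservice.orchestration.aibrix.ai/pod-group-index
# or dump every label on a single pod
kubectl get pod pool-xpyd-roleset-rfvcj-decode-76f5769d5-0 --show-labels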

varungup90 · Sep 28 '25

@varungup90 D0: [screenshot]

D1: [screenshot]

P0: [screenshot]

P1: [screenshot]

ying2025 · Sep 28 '25

It is missing the pod-group-index label. Can you use the latest release or the main branch for the controller?

varungup90 · Sep 28 '25

OK, I'll try updating it. I only pulled the image a few days ago.

ying2025 · Sep 28 '25

@wangchuanfang @ying2025 The above scenario has TP=16 and nnodes=2, so this is not a 2P2D scenario but a 1P1D scenario. For the router to identify the master prefill and decode nodes, the pod group size needs to be added to the spec, as in the sketch below.
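
A minimal sketch of that change on the prefill role (the decode role is analogous; podGroupSize is the field used in the later examples in this thread, and replicas then counts logical groups rather than pods):

        - name: prefill
          replicas: 1        # one logical prefill group
          podGroupSize: 2    # made up of 2 pods; nnodes=2 / tp-size=16 spans both
          stateful: true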

varungup90 · Sep 29 '25

@ying2025 @wangchuanfang I encountered a similar problem, but the error is:

I1014 07:39:58.371930       1 pd_disaggregation.go:201] "start_prefill_request" request_id="bfd5f054-90ea-4a14-aa88-0233a585ae4e" llm_engine="sglang" prefill_url="http://100.100.81.23:30000/v1/chat/completions"
I1014 07:39:58.371949       1 pd_disaggregation.go:105] "P/D" prefill_pod="sglang-1p1d-roleset-gp65g-prefill-76898d9c9d-0" decode_pod="sglang-1p1d-roleset-gp65g-decode-75c4868698-1"
I1014 07:39:58.371972       1 gateway_req_body.go:91] "request start" requestID="bfd5f054-90ea-4a14-aa88-0233a585ae4e" requestPath="/v1/chat/completions" model="DeepSeek-R1" stream=true routingAlgorithm="pd" targetPodIP="100.100.236.148:30000" routingDuration="822.639µs"
E1014 07:39:58.539580       1 gateway.go:245] "request end" requestID="bfd5f054-90ea-4a14-aa88-0233a585ae4e" errorCode=404 errorMessage="{\"detail\":\"Not Found\"}. httproutes.gateway.networking.k8s.io \"DeepSeek-R1-router\" not found"
I1014 07:40:18.486023       1 gateway.go:94] "processing request" requestID="41bfdea1-c9ea-4c63-a35e-a944c4531986"
I1014 07:40:18.486541       1 pd_disaggregation.go:201] "start_prefill_request" request_id="41bfdea1-c9ea-4c63-a35e-a944c4531986" llm_engine="sglang" prefill_url="http://100.100.48.25:30000/v1/chat/completions"
I1014 07:40:18.486563       1 pd_disaggregation.go:105] "P/D" prefill_pod="sglang-1p1d-roleset-gp65g-prefill-76898d9c9d-1" decode_pod="sglang-1p1d-roleset-gp65g-decode-75c4868698-0"
I1014 07:40:18.486585       1 gateway_req_body.go:91] "request start" requestID="41bfdea1-c9ea-4c63-a35e-a944c4531986" requestPath="/v1/chat/completions" model="DeepSeek-R1" stream=true routingAlgorithm="pd" targetPodIP="100.100.253.149:30000" routingDuration="478.293µs"
E1014 07:40:18.487710       1 pd_disaggregation.go:209] "prefill request for sglang failed" err="http prefill request failed with status 404: {\"detail\":\"Not Found\"}" request_id="41bfdea1-c9ea-4c63-a35e-a944c4531986"
E1014 07:40:58.372947       1 pd_disaggregation.go:209] "prefill request for sglang failed" err="failed to execute http prefill request: Post \"http://100.100.81.23:30000/v1/chat/completions\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" request_id="bfd5f054-90ea-4a14-aa88-0233a585ae4e"

My pods are:

root@pod1-gpu-001:/llm/src/aibrix# kubectl get pod -o wide
NAME                                             READY   STATUS    RESTARTS      AGE   IP                NODE           NOMINATED NODE   READINESS GATES
sglang-1p1d-roleset-gp65g-decode-75c4868698-0    1/1     Running   1 (10m ago)   22m   100.100.253.149   pod1-gpu-028   <none>           <none>
sglang-1p1d-roleset-gp65g-decode-75c4868698-1    1/1     Running   1 (10m ago)   22m   100.100.236.148   pod1-gpu-027   <none>           <none>
sglang-1p1d-roleset-gp65g-prefill-76898d9c9d-0   1/1     Running   0             22m   100.100.81.23     pod1-gpu-030   <none>           <none>
sglang-1p1d-roleset-gp65g-prefill-76898d9c9d-1   1/1     Running   0             22m   100.100.48.25     pod1-gpu-031   <none>           <none>
sglang-1p1d-roleset-gp65g-routing-7df78f55fc-0   1/1     Running   0             22m   100.100.240.21    pod1-gpu-029   <none>           <none>

The YAML is:

apiVersion: orchestration.aibrix.ai/v1alpha1
kind: StormService
metadata:
  name: sglang-1p1d
spec:
  replicas: 1
  updateStrategy:
    type: InPlaceUpdate
  stateful: true
  selector:
    matchLabels:
      app: sglang-1p1d
  template:
    metadata:
      labels:
        app: sglang-1p1d
    spec:
      roles:
        - name: routing
          replicas: 1
          stateful: true
          template:
            spec:
              affinity:
                nodeAffinity:
                  requiredDuringSchedulingIgnoredDuringExecution:
                    nodeSelectorTerms:
                      - matchExpressions:
                          - key: kubernetes.io/hostname
                            operator: In
                            values:
                              - pod1-gpu-027
                              - pod1-gpu-028
                              - pod1-gpu-029
                              - pod1-gpu-030
                              - pod1-gpu-031
                              - pod1-gpu-032
              containers:
                - name: mini-lb
                  image: 10.24.10.61:20405/sglang-router:v0.1.9-curl
                  command: [ "sh", "-c" ]
                  args:
                    - |
                      python3 -m sglang_router.launch_router \
                        --pd-disaggregation \
                        --policy random \
                        --service-discovery \
                        --service-discovery-port 30000 \
                        --prefill-selector storm-service-name=$STORM_SERVICE_NAME role-name=prefill stormservice.orchestration.aibrix.ai/role-replica-index=0 \
                        --decode-selector storm-service-name=$STORM_SERVICE_NAME role-name=decode stormservice.orchestration.aibrix.ai/role-replica-index=0 \
                        --service-discovery-namespace default
        - name: prefill
          replicas: 2
          podGroupSize: 2
          stateful: true
          template:
            metadata:
              labels:
                model.aibrix.ai/name: DeepSeek-R1
                model.aibrix.ai/port: "30000"
                model.aibrix.ai/engine: sglang
            spec:
              affinity:
                nodeAffinity:
                  requiredDuringSchedulingIgnoredDuringExecution:
                    nodeSelectorTerms:
                      - matchExpressions:
                          - key: kubernetes.io/hostname
                            operator: In
                            values:
                              - pod1-gpu-027
                              - pod1-gpu-028
                              - pod1-gpu-029
                              - pod1-gpu-030
                              - pod1-gpu-031
                              - pod1-gpu-032
              containers:
                - name: prefill
                  image: 10.24.10.61:20405/sglang:v0.5.3-cu129-my
                  command: ["sh", "-c"]
                  args:
                    - |
                      python3 -m sglang.launch_server \
                        --model-path /llm/deepseek/DeepSeek-R1-0528-full \
                        --served-model-name DeepSeek-R1 \
                        --host 0.0.0.0 \
                        --port 30000 \
                        --disaggregation-mode prefill \
                        --disaggregation-transfer-backend=mooncake \
                        --trust-remote-code \
                        --dist-init-addr "${ROLESET_NAME}-${ROLE_NAME}-${ROLE_TEMPLATE_HASH}-0.${STORM_SERVICE_NAME}.default.svc.cluster.local:5000" \
                        --nnodes 2 \
                        --node-rank $ROLE_REPLICA_INDEX \
                        --tp-size 16 \
                        --mem-fraction-static 0.8 \
                        --log-level debug
                  env:
                    - name: GLOO_SOCKET_IFNAME
                      value: eth0
                    - name: NCCL_SOCKET_IFNAME
                      value: eth0
                    - name: NCCL_IB_DISABLE
                      value: "0"
                    - name: NCCL_IB_GID_INDEX
                      value: "0"
                    - name: NCCL_DEBUG
                      value: "WARN"
                  volumeMounts:
                    - name: model-vol
                      mountPath: /llm
                    - mountPath: /dev/shm
                      name: shared-mem
                  resources:
                    limits:
                      nvidia.com/gpu: 8
                  securityContext:
                    allowPrivilegeEscalation: true
                    readOnlyRootFilesystem: false
                    runAsNonRoot: false
                    privileged: true
                    capabilities:
                      add:
                        - IPC_LOCK
              volumes:
                - name: model-vol
                  hostPath:
                    path: /llm
                    type: Directory
                - emptyDir:
                    medium: Memory
                  name: shared-mem
        - name: decode
          replicas: 2
          podGroupSize: 2
          stateful: true
          template:
            metadata:
              labels:
                model.aibrix.ai/name: DeepSeek-R1
                model.aibrix.ai/port: "30000"
                model.aibrix.ai/engine: sglang
            spec:
              affinity:
                nodeAffinity:
                  requiredDuringSchedulingIgnoredDuringExecution:
                    nodeSelectorTerms:
                      - matchExpressions:
                          - key: kubernetes.io/hostname
                            operator: In
                            values:
                              - pod1-gpu-027
                              - pod1-gpu-028
                              - pod1-gpu-029
                              - pod1-gpu-030
                              - pod1-gpu-031
                              - pod1-gpu-032
              containers:
                - name: decode
                  image: 10.24.10.61:20405/sglang:v0.5.3-cu129-my
                  command: ["sh", "-c"]
                  args:
                    - |
                      python3 -m sglang.launch_server \
                        --model-path /llm/deepseek/DeepSeek-R1-0528-full \
                        --served-model-name DeepSeek-R1 \
                        --host 0.0.0.0 \
                        --port 30000 \
                        --disaggregation-mode decode \
                        --disaggregation-transfer-backend=mooncake \
                        --trust-remote-code \
                        --dist-init-addr "${ROLESET_NAME}-${ROLE_NAME}-${ROLE_TEMPLATE_HASH}-0.${STORM_SERVICE_NAME}.default.svc.cluster.local:5000" \
                        --nnodes 2 \
                        --node-rank $ROLE_REPLICA_INDEX \
                        --tp-size 16 \
                        --mem-fraction-static 0.8 \
                        --log-level debug
                  env:
                    - name: GLOO_SOCKET_IFNAME
                      value: eth0
                    - name: NCCL_SOCKET_IFNAME
                      value: eth0
                    - name: NCCL_IB_DISABLE
                      value: "0"
                    - name: NCCL_IB_GID_INDEX
                      value: "0"
                    - name: NCCL_DEBUG
                      value: "WARN"
                  volumeMounts:
                    - name: model-vol
                      mountPath: /llm
                    - mountPath: /dev/shm
                      name: shared-mem
                  resources:
                    limits:
                      nvidia.com/gpu: 8
                  securityContext:
                    allowPrivilegeEscalation: true
                    readOnlyRootFilesystem: false
                    runAsNonRoot: false
                    privileged: true
                    capabilities:
                      add:
                        - IPC_LOCK
              volumes:
                - name: model-vol
                  hostPath:
                    path: /llm
                    type: Directory
                - emptyDir:
                    medium: Memory
                  name: shared-mem

How can this be fixed?

XiaobinZhao · Oct 14 '25

@varungup90 @Jeffwan can you help?

XiaobinZhao · Oct 15 '25

@XiaobinZhao I will take a look tomorrow; it should be some configuration issue.

Jeffwan · Oct 15 '25

@Jeffwan any news?

XiaobinZhao · Oct 17 '25

@XiaobinZhao I was kind of busy today and didn't get a chance to get to it. I am pretty free tomorrow and will reproduce it. Thanks for the patience.

Jeffwan · Oct 17 '25

@XiaobinZhao Can you try using --dist-init-addr "${PODSET_NAME}-0.${STORM_SERVICE_NAME}.default.svc.cluster.local:5000" for service discovery? The latest router filters for pods with the stormservice.orchestration.aibrix.ai/pod-group-index=0 label. Could you share the router and controller-manager versions you deployed?

apiVersion: orchestration.aibrix.ai/v1alpha1
kind: StormService
metadata:
  name: tp-1p1d
spec:
  replicas: 1
  updateStrategy:
    type: InPlaceUpdate
  stateful: true
  selector:
    matchLabels:
      app: tp-1p1d
  template:
    metadata:
      labels:
        app: tp-1p1d
    spec:
      roles:
        - name: prefill
          replicas: 1
          podGroupSize: 2
          stateful: true
          template:
            metadata:
              annotations:
                k8s.volcengine.com/pod-networks: |
                  [
                    {
                      "cniConf":{
                          "name":"rdma"
                      }
                    }
                  ]
              labels:
                model.aibrix.ai/name: qwen3-8B
                model.aibrix.ai/port: "30000"
                model.aibrix.ai/engine: sglang
            spec:
              nodeSelector:
                kubernetes.io/hostname: 192.168.0.6
              containers:
                - name: prefill
                  image: kvcache-container-image-hb2-cn-beijing.cr.volces.com/aibrix/sglang:v0.4.9.post3-cu126-nixl-v0.4.1
                  command: ["sh", "-c"]
                  args:
                    - |
                      python3 -m sglang.launch_server \
                        --model-path /models/Qwen3-8B \
                        --served-model-name qwen3-8B \
                        --host 0.0.0.0 \
                        --port 30000 \
                        --disaggregation-mode prefill \
                        --disaggregation-transfer-backend=nixl \
                        --trust-remote-code \
                        --dist-init-addr "${PODSET_NAME}-0.${STORM_SERVICE_NAME}.default.svc.cluster.local:5000" \
                        --nnodes 2 \
                        --node-rank $POD_GROUP_INDEX \
                        --tp-size 2 \
                        --mem-fraction-static 0.8 \
                        --log-level debug
                  env:
                    - name: GLOO_SOCKET_IFNAME
                      value: eth0
                    - name: NCCL_SOCKET_IFNAME
                      value: eth0
                    - name: NCCL_IB_DISABLE
                      value: "0"
                    - name: NCCL_IB_GID_INDEX
                      value: "7"
                    - name: NCCL_DEBUG
                      value: "INFO"
                    - name: UCX_TLS
                      value: ^gga
                  volumeMounts:
                    - name: model-vol
                      mountPath: /models
                    - mountPath: /dev/shm
                      name: shared-mem
                  resources:
                    limits:
                      nvidia.com/gpu: 1
                      vke.volcengine.com/rdma: "1"
                  securityContext:
                    capabilities:
                      add:
                        - IPC_LOCK
              volumes:
                - name: model-vol
                  hostPath:
                    path: /root/models
                    type: Directory
                - emptyDir:
                    medium: Memory
                  name: shared-mem
        - name: decode
          replicas: 1
          podGroupSize: 2
          stateful: true
          template:
            metadata:
              annotations:
                k8s.volcengine.com/pod-networks: |
                  [
                    {
                      "cniConf":{
                          "name":"rdma"
                      }
                    }
                  ]
              labels:
                model.aibrix.ai/name: qwen3-8B
                model.aibrix.ai/port: "30000"
                model.aibrix.ai/engine: sglang
            spec:
              nodeSelector:
                kubernetes.io/hostname: 192.168.0.6
              containers:
                - name: decode
                  image: kvcache-container-image-hb2-cn-beijing.cr.volces.com/aibrix/sglang:v0.4.9.post3-cu126-nixl-v0.4.1
                  command: ["sh", "-c"]
                  args:
                    - |
                      python3 -m sglang.launch_server \
                        --model-path /models/Qwen3-8B \
                        --served-model-name qwen3-8B \
                        --host 0.0.0.0 \
                        --port 30000 \
                        --disaggregation-mode decode \
                        --disaggregation-transfer-backend=nixl \
                        --trust-remote-code \
                        --dist-init-addr "${PODSET_NAME}-0.${STORM_SERVICE_NAME}.default.svc.cluster.local:5000" \
                        --nnodes 2 \
                        --node-rank $POD_GROUP_INDEX \
                        --tp-size 2 \
                        --mem-fraction-static 0.8 \
                        --log-level debug
                  env:
                    - name: GLOO_SOCKET_IFNAME
                      value: eth0
                    - name: NCCL_SOCKET_IFNAME
                      value: eth0
                    - name: NCCL_IB_DISABLE
                      value: "0"
                    - name: NCCL_IB_GID_INDEX
                      value: "7"
                    - name: NCCL_DEBUG
                      value: "INFO"
                    - name: UCX_TLS
                      value: ^gga
                  volumeMounts:
                    - name: model-vol
                      mountPath: /models
                    - mountPath: /dev/shm
                      name: shared-mem
                  resources:
                    limits:
                      nvidia.com/gpu: 1
                      vke.volcengine.com/rdma: "1"
                  securityContext:
                    capabilities:
                      add:
                        - IPC_LOCK
              volumes:
                - name: model-vol
                  hostPath:
                    path: /root/models
                    type: Directory
                - emptyDir:
                    medium: Memory
                  name: shared-mem

Jeffwan · Oct 20 '25

@Jeffwan After adjusting according to the YAML you provided, startup failed because the PODSET_NAME environment variable does not exist, so I added it via env:

env:
  - name: GLOO_SOCKET_IFNAME
    value: eth0
  - name: NCCL_SOCKET_IFNAME
    value: eth0
  - name: NCCL_IB_DISABLE
    value: "0"
  - name: NCCL_IB_GID_INDEX
    value: "0"
  - name: NCCL_DEBUG
    value: "WARN"
  - name: PODSET_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name

but then it failed with torch.distributed.DistNetworkError: Failed to recv, got 0 bytes; it seems prefill-0 cannot connect to prefill-1. I noticed your YAML sets replicas: 1, so I tried only one node for P and one node for D; startup succeeded, but curl to v1/chat/completions failed as before.
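
For reference, this is how the variables injected by the controller can be checked from inside a pod (a sketch; the pod name comes from the earlier listing and will differ after a redeploy):

kubectl exec sglang-1p1d-roleset-gp65g-prefill-76898d9c9d-0 -- \
  printenv PODSET_NAME POD_GROUP_INDEX STORM_SERVICE_NAME ROLE_REPLICA_INDEX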

The AIBrix version is 0.4.1; the component images are:

kvcache:20241120
controller-manager:v0.4.1
gateway-plugins:v0.4.1
runtime:v0.4.1
kuberay-operator:v1.2.1-patch-20250726
metadata-service:v0.4.1

and the sglang-router is v0.1.9

So, how do I fix this? I also want to set prefill (tp=16) and decode (tp=16).

XiaobinZhao · Oct 20 '25

@XiaobinZhao You need to use the nightly AIBrix images: replace v0.4.1 with the nightly image. v0.4.1 doesn't have all the features.
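
A sketch of the image swap (the deployment and container names here are assumptions; check them first with kubectl -n aibrix-system get deploy and take the exact nightly tag from the AIBrix registry):

kubectl -n aibrix-system set image deployment/aibrix-controller-manager \
  manager=aibrix/controller-manager:nightly
kubectl -n aibrix-system set image deployment/aibrix-gateway-plugins \
  gateway-plugin=aibrix/gateway-plugins:nightly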

BTW, I updated the above YAML (using $POD_GROUP_INDEX in place of $ROLE_REPLICA_INDEX); please copy the updated one. I also sent you an email to join our WeChat group; please check your GitHub email.

Jeffwan · Oct 20 '25

@XiaobinZhao Is the issue fixed now?

varungup90 · Nov 17 '25

Yes, replacing v0.4.1 with the nightly image resolved my problem.

XiaobinZhao · Nov 18 '25