In PD separation scenarios, model access requests are directed to non-master nodes.
🐛 Describe the bug
In the P2D2 scenario, model access requests are sent to the P1 pod (a non-master node).
Steps to Reproduce
Step 1: Create a P2D2 deepseek-r1 model by applying the following YAML file with kubectl (a verification sketch follows the manifest):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: pod-read
rules:
- apiGroups:
- ""
resources:
- pods
verbs:
- get
- watch
- list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: pod-read-binding
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: pod-read
subjects:
- kind: ServiceAccount
name: default
namespace: default
---
apiVersion: orchestration.aibrix.ai/v1alpha1
kind: StormService
metadata:
name: pool-xpyd
spec:
replicas: 1
updateStrategy:
type: InPlaceUpdate
stateful: true
selector:
matchLabels:
app: pool-xpyd
template:
metadata:
labels:
app: pool-xpyd
spec:
roles:
- name: routing
replicas: 1
stateful: true
template:
metadata:
labels:
app: pool-xpyd
role: routing
app.kubernetes.io/name: deepseek-r1-slo
model.aibrix.ai/name: deepseek-r1
model.aibrix.ai/port: "30000"
model.aibrix.ai/engine: sglang
spec:
containers:
- name: mini-lb
# image: docker.1ms.run/aibrix/sglang-router:v0.1.6
image: docker.1ms.run/aibrix/sglang-router:v0.1.7-patch.1-20250731
# image: docker.1ms.run/aibrix/sglang-router:v0.1.9
# image: 172.16.106.102/sglang:v0.1.9-sgl-router-v0.3.3
command: [ "sh", "-c" ]
args:
- |
python3 -m sglang_router.launch_router \
--pd-disaggregation \
--policy round_robin \
--host 0.0.0.0 \
--service-discovery \
--service-discovery-port 30000 \
--prefill-selector storm-service-name=$STORM_SERVICE_NAME role-name=prefill stormservice.orchestration.aibrix.ai/role-replica-index=0 \
--decode-selector storm-service-name=$STORM_SERVICE_NAME role-name=decode stormservice.orchestration.aibrix.ai/role-replica-index=0 \
--service-discovery-namespace default
- name: prefill
replicas: 2
stateful: true
template:
metadata:
annotations:
k8s.volcengine.com/pod-networks: |
[
{
"cniConf":{
"name":"rdma"
}
}
]
labels:
app.kubernetes.io/name: deepseek-r1-slo
model.aibrix.ai/name: deepseek-r1
model.aibrix.ai/port: "30000"
model.aibrix.ai/engine: sglang
# model.aibrix.ai/deployment: deepseek-r1-slo
spec:
# nodeSelector:
# type: H800
containers:
- name: prefill
# image: 172.16.106.153/sglang:v0.4.9.post2-8-g10c00166-deepep.9eb2f84
image: 172.16.106.102/sglang:v0.5.1.post3-cu126
command: ["sh", "-c"]
args:
- |
python3 -m sglang.launch_server \
--model-path /data/deepseek-ai/DeepSeek-R1 \
--served-model-name deepseek-r1 \
--disaggregation-ib-device mlx5_4 \
--host 0.0.0.0 \
--port 30000 \
--disaggregation-mode prefill \
--disaggregation-transfer-backend=mooncake \
--trust-remote-code \
--dist-init-addr "${ROLESET_NAME}-${ROLE_NAME}-${ROLE_TEMPLATE_HASH}-0.${STORM_SERVICE_NAME}.default.svc.cluster.local:5000" \
--nnodes 2 \
--node-rank $ROLE_REPLICA_INDEX \
--tp-size 16 \
--page-size 1 \
--watchdog-timeout 1000000 \
--dist-timeout 250 \
--mem-fraction-static 0.84 \
--max-running-requests 512 \
--max-prefill-tokens 32768 \
--log-level debug
env:
- name: GLOO_SOCKET_IFNAME
value: eth0
- name: NCCL_SOCKET_IFNAME
value: eth0
- name: NCCL_IB_HCA
value: mlx5_0,mlx5_2,mlx5_3,mlx5_5
- name: NCCL_IB_DISABLE
value: "0"
- name: NCCL_IB_GID_INDEX
value: "7"
- name: NCCL_DEBUG
value: "INFO"
- name: MC_LOG_LEVEL
value: INFO
volumeMounts:
- name: model-vol
mountPath: /data/deepseek-ai
- mountPath: /dev/shm
name: shared-mem
resources:
requests:
nvidia.com/gpu: "8"
rdma/rdma_shared_devices: "6"
limits:
nvidia.com/gpu: "8"
rdma/rdma_shared_devices: "6"
securityContext:
capabilities:
add:
- IPC_LOCK
volumes:
- name: model-vol
hostPath:
path: /data/deepseek-ai/
type: Directory
- emptyDir:
medium: Memory
name: shared-mem
- name: decode
replicas: 2
stateful: true
template:
metadata:
annotations:
k8s.volcengine.com/pod-networks: |
[
{
"cniConf":{
"name":"rdma"
}
}
]
labels:
app.kubernetes.io/name: deepseek-r1-slo
model.aibrix.ai/name: deepseek-r1
model.aibrix.ai/port: "30000"
model.aibrix.ai/engine: sglang
# model.aibrix.ai/deployment: deepseek-r1-slo
spec:
# nodeSelector:
# type: H20
containers:
- name: decode
# image: 172.16.106.153/sglang:v0.4.9.post2-8-g10c00166-deepep.9eb2f84
image: 172.16.106.102/sglang:v0.5.1.post3-cu126
command: ["sh", "-c"]
args:
- |
python3 -m sglang.launch_server \
--model-path /data/deepseek-ai/DeepSeek-R1 \
--served-model-name deepseek-r1 \
--disaggregation-ib-device mlx5_4 \
--host 0.0.0.0 \
--port 30000 \
--disaggregation-mode decode \
--disaggregation-transfer-backend=mooncake \
--trust-remote-code \
--dist-init-addr "${ROLESET_NAME}-${ROLE_NAME}-${ROLE_TEMPLATE_HASH}-0.${STORM_SERVICE_NAME}.default.svc.cluster.local:5000" \
--nnodes 2 \
--node-rank $ROLE_REPLICA_INDEX \
--tp-size 16 \
--page-size 1 \
--watchdog-timeout 1000000 \
--dist-timeout 600 \
--mem-fraction-static 0.84 \
--max-running-requests 2048 \
--context-length 4096 \
--log-level debug
env:
- name: GLOO_SOCKET_IFNAME
value: eth0
- name: NCCL_SOCKET_IFNAME
value: eth0
- name: NCCL_IB_HCA
value: mlx5_0,mlx5_2,mlx5_3,mlx5_5
- name: NCCL_IB_DISABLE
value: "0"
- name: NCCL_IB_GID_INDEX
value: "7"
- name: NCCL_DEBUG
value: "INFO"
- name: MC_LOG_LEVEL
value: INFO
volumeMounts:
- name: model-vol
mountPath: /data/deepseek-ai
- mountPath: /dev/shm
name: shared-mem
resources:
requests:
nvidia.com/gpu: "8"
rdma/rdma_shared_devices: "6"
limits:
nvidia.com/gpu: "8"
rdma/rdma_shared_devices: "6"
securityContext:
capabilities:
add:
- IPC_LOCK
volumes:
- name: model-vol
hostPath:
path: /data/deepseek-ai/
type: Directory
- emptyDir:
medium: Memory
name: shared-mem
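Once the manifest is applied, the resulting pods can be verified using the model label set in the role templates above (a usage sketch; the label value comes from the spec, and output will vary by cluster):
# list the routing/prefill/decode pods created for the deepseek-r1 model
kubectl get pods -l model.aibrix.ai/name=deepseek-r1 -o wide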
Step 2: Send the following model access request:
curl -v http://10.6.2.201:80//v1/chat/completions \
-H "Content-Type: application/json" \
-H "routing-strategy: pd" \
-d '{
"model": "deepseek-r1",
"messages": [{"role": "user", "content": "hello"}]
}'
Step 3: View the gateway plugin log output:
root@boole-mgr-01:~# kubectl -n aibrix-system logs deployments/aibrix-gateway-plugins -f --tail 100
I0928 02:56:40.110778 1 cache_profile.go:62] === ModelGPUProfile DEBUG: Successfully unmarshalled profile for key: aibrix:profile_deepseek-r1_deepseek-r1-slo, deployment: deepseek-r1-slo ===
I0928 02:56:50.110903 1 cache_profile.go:62] === ModelGPUProfile DEBUG: Successfully unmarshalled profile for key: aibrix:profile_deepseek-r1_deepseek-r1-slo, deployment: deepseek-r1-slo ===
I0928 02:56:57.552506 1 gateway.go:94] "processing request" requestID="0bb090bc-aad7-404f-8fe0-e15d44263177"
I0928 02:56:57.554717 1 gateway.go:184] === SLO DEBUG: About to call router.Route === requestID: 0bb090bc-aad7-404f-8fe0-e15d44263177, routerType: routingalgorithms.pdRouter, podCount: 5
I0928 02:56:57.554948 1 pd_disaggregation.go:201] "start_prefill_request" request_id="0bb090bc-aad7-404f-8fe0-e15d44263177" llm_engine="sglang" prefill_url="http://10.233.85.144:30000/v1/chat/completions"
I0928 02:56:57.554998 1 pd_disaggregation.go:105] "P/D" prefill_pod="pool-xpyd-roleset-rfvcj-prefill-8564869d8f-0" decode_pod="pool-xpyd-roleset-rfvcj-decode-76f5769d5-1"
I0928 02:56:57.555029 1 gateway.go:187] === SLO DEBUG SELECT TARGET POD: Router result === requestID: 0bb090bc-aad7-404f-8fe0-e15d44263177, result: 10.233.84.23:30000, error: <nil>
I0928 02:56:57.555070 1 gateway_req_body.go:78] "request 11111 start Debug SLO" requestID="0bb090bc-aad7-404f-8fe0-e15d44263177" requestPath="/v1/chat/completions" model="deepseek-r1" stream=false routingAlgorithm="pd" targetPodIP="10.233.84.23:30000" routingDuration="2.316523ms"
I0928 02:56:57.555129 1 gateway_req_body.go:92] "request 22222 start" requestID="0bb090bc-aad7-404f-8fe0-e15d44263177" requestPath="/v1/chat/completions" model="deepseek-r1" stream=false routingAlgorithm="pd" targetPodIP="10.233.84.23:30000" routingDuration="2.316523ms"
E0928 02:56:57.555255 1 cache_trace.go:75] error on track request load consumption: output predictor not set
E0928 02:56:57.571327 1 gateway.go:263] "request end" requestID="0bb090bc-aad7-404f-8fe0-e15d44263177" errorCode=404 errorMessage="{\"detail\":\"Not Found\"}"
I0928 02:57:00.111235 1 cache_profile.go:62] === ModelGPUProfile DEBUG: Successfully unmarshalled profile for key: aibrix:profile_deepseek-r1_deepseek-r1-slo, deployment: deepseek-r1-slo ===
targetPodIP="10.233.84.23:30000" is the address of the D1 pod. The request was expected to be routed to the D0 pod at 10.233.70.72:30000.
Step 4: View the Kubernetes pod information for the model nodes:
root@boole-mgr-01:~# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pool-xpyd-roleset-rfvcj-decode-76f5769d5-0 1/1 Running 0 2d23h 10.233.70.72 boole-hpc-03 <none> <none>
pool-xpyd-roleset-rfvcj-decode-76f5769d5-1 1/1 Running 0 2d23h 10.233.84.23 boole-hpc-01 <none> <none>
pool-xpyd-roleset-rfvcj-prefill-8564869d8f-0 1/1 Running 0 2d23h 10.233.85.144 boole-hpc-04 <none> <none>
pool-xpyd-roleset-rfvcj-prefill-8564869d8f-1 1/1 Running 0 2d23h 10.233.117.124 boole-hpc-02 <none> <none>
pool-xpyd-roleset-rfvcj-routing-56bdbc9ff4-0 1/1 Running 0 2d22h 10.233.104.176 boole-mgr-01 <none> <none>
Expected behavior
In the P2D2 scenario, model access requests should be sent to the P0 (master) pod.
Environment
NA
root@boole-mgr-01:~/aibrix-0.4.1/aibrix# curl -v http://10.6.2.201:80//v1/chat/completions \
-H "Content-Type: application/json" \
-H "routing-strategy: pd" \
-d '{
"model": "deepseek-r1",
"messages": [{"role": "user", "content": "hello"}]
}'
* Trying 10.6.2.201:80...
* Connected to 10.6.2.201 (10.6.2.201) port 80 (#0)
> POST //v1/chat/completions HTTP/1.1
> Host: 10.6.2.201
> User-Agent: curl/7.81.0
> Accept: */*
> Content-Type: application/json
> routing-strategy: pd
> Content-Length: 90
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 404 Not Found
< x-went-into-req-headers: true
< request-id: 0bb090bc-aad7-404f-8fe0-e15d44263177
< target-pod: 10.233.84.23:30000
< content-type:
< content-length: 61
< date: Sun, 28 Sep 2025 02:57:08 GMT
<
* Connection #0 to host 10.6.2.201 left intact
{"error":{"code":404,"message":"{\"detail\":\"Not Found\"}"}}root@boole-mgr-01:~/aibrix-0.4.1/aibrix#
@varungup90 I am hitting the same problem: the prefill-0 pod is normal, but prefill-1 returns 404.
Can you describe D0 and D1 and share the label keys/values? I want to check the value of the pod-group-index label.
@varungup90 D0:
P0:
It is missing the pod-group-index label. Can you use the latest release or the main branch for the controller?
ok, I'll try updating it. I just pulled the image a few days ago.
@wangchuanfang @ying2025 The above scenario has TP=16 and nnodes=2, so it is not a 2P2D scenario but a 1P1D scenario (each prefill/decode group spans two pods). For the router to identify the master prefill and decode nodes, the pod group size needs to be added to the spec.
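For illustration, a minimal sketch of the relevant role fields (field names taken from the later examples in this thread; role templates omitted and values illustrative, not a drop-in fix):
roles:
  - name: prefill
    replicas: 1        # number of prefill groups (1P)
    podGroupSize: 2    # pods per group; matches --nnodes 2 for TP=16 across two 8-GPU nodes
  - name: decode
    replicas: 1        # number of decode groups (1D)
    podGroupSize: 2    # pods per group
The original spec above used replicas: 2 with no podGroupSize, which appears to be why the two pods were treated as independent replicas rather than one multi-node group.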
@ying2025 @wangchuanfang I encountered a similar problem, but the error is:
I1014 07:39:58.371930 1 pd_disaggregation.go:201] "start_prefill_request" request_id="bfd5f054-90ea-4a14-aa88-0233a585ae4e" llm_engine="sglang" prefill_url="http://100.100.81.23:30000/v1/chat/completions"
I1014 07:39:58.371949 1 pd_disaggregation.go:105] "P/D" prefill_pod="sglang-1p1d-roleset-gp65g-prefill-76898d9c9d-0" decode_pod="sglang-1p1d-roleset-gp65g-decode-75c4868698-1"
I1014 07:39:58.371972 1 gateway_req_body.go:91] "request start" requestID="bfd5f054-90ea-4a14-aa88-0233a585ae4e" requestPath="/v1/chat/completions" model="DeepSeek-R1" stream=true routingAlgorithm="pd" targetPodIP="100.100.236.148:30000" routingDuration="822.639µs"
E1014 07:39:58.539580 1 gateway.go:245] "request end" requestID="bfd5f054-90ea-4a14-aa88-0233a585ae4e" errorCode=404 errorMessage="{\"detail\":\"Not Found\"}. httproutes.gateway.networking.k8s.io \"DeepSeek-R1-router\" not found"
I1014 07:40:18.486023 1 gateway.go:94] "processing request" requestID="41bfdea1-c9ea-4c63-a35e-a944c4531986"
I1014 07:40:18.486541 1 pd_disaggregation.go:201] "start_prefill_request" request_id="41bfdea1-c9ea-4c63-a35e-a944c4531986" llm_engine="sglang" prefill_url="http://100.100.48.25:30000/v1/chat/completions"
I1014 07:40:18.486563 1 pd_disaggregation.go:105] "P/D" prefill_pod="sglang-1p1d-roleset-gp65g-prefill-76898d9c9d-1" decode_pod="sglang-1p1d-roleset-gp65g-decode-75c4868698-0"
I1014 07:40:18.486585 1 gateway_req_body.go:91] "request start" requestID="41bfdea1-c9ea-4c63-a35e-a944c4531986" requestPath="/v1/chat/completions" model="DeepSeek-R1" stream=true routingAlgorithm="pd" targetPodIP="100.100.253.149:30000" routingDuration="478.293µs"
E1014 07:40:18.487710 1 pd_disaggregation.go:209] "prefill request for sglang failed" err="http prefill request failed with status 404: {\"detail\":\"Not Found\"}" request_id="41bfdea1-c9ea-4c63-a35e-a944c4531986"
E1014 07:40:58.372947 1 pd_disaggregation.go:209] "prefill request for sglang failed" err="failed to execute http prefill request: Post \"http://100.100.81.23:30000/v1/chat/completions\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" request_id="bfd5f054-90ea-4a14-aa88-0233a585ae4e"
My pods are:
root@pod1-gpu-001:/llm/src/aibrix# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
sglang-1p1d-roleset-gp65g-decode-75c4868698-0 1/1 Running 1 (10m ago) 22m 100.100.253.149 pod1-gpu-028 <none> <none>
sglang-1p1d-roleset-gp65g-decode-75c4868698-1 1/1 Running 1 (10m ago) 22m 100.100.236.148 pod1-gpu-027 <none> <none>
sglang-1p1d-roleset-gp65g-prefill-76898d9c9d-0 1/1 Running 0 22m 100.100.81.23 pod1-gpu-030 <none> <none>
sglang-1p1d-roleset-gp65g-prefill-76898d9c9d-1 1/1 Running 0 22m 100.100.48.25 pod1-gpu-031 <none> <none>
sglang-1p1d-roleset-gp65g-routing-7df78f55fc-0 1/1 Running 0 22m 100.100.240.21 pod1-gpu-029 <none> <none>
The YAML is:
apiVersion: orchestration.aibrix.ai/v1alpha1
kind: StormService
metadata:
name: sglang-1p1d
spec:
replicas: 1
updateStrategy:
type: InPlaceUpdate
stateful: true
selector:
matchLabels:
app: sglang-1p1d
template:
metadata:
labels:
app: sglang-1p1d
spec:
roles:
- name: routing
replicas: 1
stateful: true
template:
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- pod1-gpu-027
- pod1-gpu-028
- pod1-gpu-029
- pod1-gpu-030
- pod1-gpu-031
- pod1-gpu-032
containers:
- name: mini-lb
image: 10.24.10.61:20405/sglang-router:v0.1.9-curl
command: [ "sh", "-c" ]
args:
- |
python3 -m sglang_router.launch_router \
--pd-disaggregation \
--policy random \
--service-discovery \
--service-discovery-port 30000 \
--prefill-selector storm-service-name=$STORM_SERVICE_NAME role-name=prefill stormservice.orchestration.aibrix.ai/role-replica-index=0 \
--decode-selector storm-service-name=$STORM_SERVICE_NAME role-name=decode stormservice.orchestration.aibrix.ai/role-replica-index=0 \
--service-discovery-namespace default
- name: prefill
replicas: 2
podGroupSize: 2
stateful: true
template:
metadata:
labels:
model.aibrix.ai/name: DeepSeek-R1
model.aibrix.ai/port: "30000"
model.aibrix.ai/engine: sglang
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- pod1-gpu-027
- pod1-gpu-028
- pod1-gpu-029
- pod1-gpu-030
- pod1-gpu-031
- pod1-gpu-032
containers:
- name: prefill
image: 10.24.10.61:20405/sglang:v0.5.3-cu129-my
command: ["sh", "-c"]
args:
- |
python3 -m sglang.launch_server \
--model-path /llm/deepseek/DeepSeek-R1-0528-full \
--served-model-name DeepSeek-R1 \
--host 0.0.0.0 \
--port 30000 \
--disaggregation-mode prefill \
--disaggregation-transfer-backend=mooncake \
--trust-remote-code \
--dist-init-addr "${ROLESET_NAME}-${ROLE_NAME}-${ROLE_TEMPLATE_HASH}-0.${STORM_SERVICE_NAME}.default.svc.cluster.local:5000" \
--nnodes 2 \
--node-rank $ROLE_REPLICA_INDEX \
--tp-size 16 \
--mem-fraction-static 0.8 \
--log-level debug
env:
- name: GLOO_SOCKET_IFNAME
value: eth0
- name: NCCL_SOCKET_IFNAME
value: eth0
- name: NCCL_IB_DISABLE
value: "0"
- name: NCCL_IB_GID_INDEX
value: "0"
- name: NCCL_DEBUG
value: "WARN"
volumeMounts:
- name: model-vol
mountPath: /llm
- mountPath: /dev/shm
name: shared-mem
resources:
limits:
nvidia.com/gpu: 8
securityContext:
allowPrivilegeEscalation: true
readOnlyRootFilesystem: false
runAsNonRoot: false
privileged: true
capabilities:
add:
- IPC_LOCK
volumes:
- name: model-vol
hostPath:
path: /llm
type: Directory
- emptyDir:
medium: Memory
name: shared-mem
- name: decode
replicas: 2
podGroupSize: 2
stateful: true
template:
metadata:
labels:
model.aibrix.ai/name: DeepSeek-R1
model.aibrix.ai/port: "30000"
model.aibrix.ai/engine: sglang
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- pod1-gpu-027
- pod1-gpu-028
- pod1-gpu-029
- pod1-gpu-030
- pod1-gpu-031
- pod1-gpu-032
containers:
- name: decode
image: 10.24.10.61:20405/sglang:v0.5.3-cu129-my
command: ["sh", "-c"]
args:
- |
python3 -m sglang.launch_server \
--model-path /llm/deepseek/DeepSeek-R1-0528-full \
--served-model-name DeepSeek-R1 \
--host 0.0.0.0 \
--port 30000 \
--disaggregation-mode decode \
--disaggregation-transfer-backend=mooncake \
--trust-remote-code \
--dist-init-addr "${ROLESET_NAME}-${ROLE_NAME}-${ROLE_TEMPLATE_HASH}-0.${STORM_SERVICE_NAME}.default.svc.cluster.local:5000" \
--nnodes 2 \
--node-rank $ROLE_REPLICA_INDEX \
--tp-size 16 \
--mem-fraction-static 0.8 \
--log-level debug
env:
- name: GLOO_SOCKET_IFNAME
value: eth0
- name: NCCL_SOCKET_IFNAME
value: eth0
- name: NCCL_IB_DISABLE
value: "0"
- name: NCCL_IB_GID_INDEX
value: "0"
- name: NCCL_DEBUG
value: "WARN"
volumeMounts:
- name: model-vol
mountPath: /llm
- mountPath: /dev/shm
name: shared-mem
resources:
limits:
nvidia.com/gpu: 8
securityContext:
allowPrivilegeEscalation: true
readOnlyRootFilesystem: false
runAsNonRoot: false
privileged: true
capabilities:
add:
- IPC_LOCK
volumes:
- name: model-vol
hostPath:
path: /llm
type: Directory
- emptyDir:
medium: Memory
name: shared-mem
How can this be fixed?
@varungup90 @Jeffwan can you help?
@XiaobinZhao I will take a look tomorrow; it should be some configuration issue.
@Jeffwan any news?
@XiaobinZhao I was kind of busy today and didn't get a chance to get to it. I am pretty free tomorrow and will reproduce it. Thanks for your patience.
@XiaobinZhao Can you try using --dist-init-addr "${PODSET_NAME}-0.${STORM_SERVICE_NAME}.default.svc.cluster.local:5000" for service discovery? The latest router filters for pods with the stormservice.orchestration.aibrix.ai/pod-group-index=0 label. Could I know the router and controller-manager versions you deployed? An example YAML is below; a label-check sketch follows it.
apiVersion: orchestration.aibrix.ai/v1alpha1
kind: StormService
metadata:
name: tp-1p1d
spec:
replicas: 1
updateStrategy:
type: InPlaceUpdate
stateful: true
selector:
matchLabels:
app: tp-1p1d
template:
metadata:
labels:
app: tp-1p1d
spec:
roles:
- name: prefill
replicas: 1
podGroupSize: 2
stateful: true
template:
metadata:
annotations:
k8s.volcengine.com/pod-networks: |
[
{
"cniConf":{
"name":"rdma"
}
}
]
labels:
model.aibrix.ai/name: qwen3-8B
model.aibrix.ai/port: "30000"
model.aibrix.ai/engine: sglang
spec:
nodeSelector:
kubernetes.io/hostname: 192.168.0.6
containers:
- name: prefill
image: kvcache-container-image-hb2-cn-beijing.cr.volces.com/aibrix/sglang:v0.4.9.post3-cu126-nixl-v0.4.1
command: ["sh", "-c"]
args:
- |
python3 -m sglang.launch_server \
--model-path /models/Qwen3-8B \
--served-model-name qwen3-8B \
--host 0.0.0.0 \
--port 30000 \
--disaggregation-mode prefill \
--disaggregation-transfer-backend=nixl \
--trust-remote-code \
--dist-init-addr "${PODSET_NAME}-0.${STORM_SERVICE_NAME}.default.svc.cluster.local:5000" \
--nnodes 2 \
--node-rank $POD_GROUP_INDEX \
--tp-size 2 \
--mem-fraction-static 0.8 \
--log-level debug
env:
- name: GLOO_SOCKET_IFNAME
value: eth0
- name: NCCL_SOCKET_IFNAME
value: eth0
- name: NCCL_IB_DISABLE
value: "0"
- name: NCCL_IB_GID_INDEX
value: "7"
- name: NCCL_DEBUG
value: "INFO"
- name: UCX_TLS
value: ^gga
volumeMounts:
- name: model-vol
mountPath: /models
- mountPath: /dev/shm
name: shared-mem
resources:
limits:
nvidia.com/gpu: 1
vke.volcengine.com/rdma: "1"
securityContext:
capabilities:
add:
- IPC_LOCK
volumes:
- name: model-vol
hostPath:
path: /root/models
type: Directory
- emptyDir:
medium: Memory
name: shared-mem
- name: decode
replicas: 1
podGroupSize: 2
stateful: true
template:
metadata:
annotations:
k8s.volcengine.com/pod-networks: |
[
{
"cniConf":{
"name":"rdma"
}
}
]
labels:
model.aibrix.ai/name: qwen3-8B
model.aibrix.ai/port: "30000"
model.aibrix.ai/engine: sglang
spec:
nodeSelector:
kubernetes.io/hostname: 192.168.0.6
containers:
- name: decode
image: kvcache-container-image-hb2-cn-beijing.cr.volces.com/aibrix/sglang:v0.4.9.post3-cu126-nixl-v0.4.1
command: ["sh", "-c"]
args:
- |
python3 -m sglang.launch_server \
--model-path /models/Qwen3-8B \
--served-model-name qwen3-8B \
--host 0.0.0.0 \
--port 30000 \
--disaggregation-mode decode \
--disaggregation-transfer-backend=nixl \
--trust-remote-code \
--dist-init-addr "${PODSET_NAME}-0.${STORM_SERVICE_NAME}.default.svc.cluster.local:5000" \
--nnodes 2 \
--node-rank $POD_GROUP_INDEX\
--tp-size 2 \
--mem-fraction-static 0.8 \
--log-level debug
env:
- name: GLOO_SOCKET_IFNAME
value: eth0
- name: NCCL_SOCKET_IFNAME
value: eth0
- name: NCCL_IB_DISABLE
value: "0"
- name: NCCL_IB_GID_INDEX
value: "7"
- name: NCCL_DEBUG
value: "INFO"
- name: UCX_TLS
value: ^gga
volumeMounts:
- name: model-vol
mountPath: /models
- mountPath: /dev/shm
name: shared-mem
resources:
limits:
nvidia.com/gpu: 1
vke.volcengine.com/rdma: "1"
securityContext:
capabilities:
add:
- IPC_LOCK
volumes:
- name: model-vol
hostPath:
path: /root/models
type: Directory
- emptyDir:
medium: Memory
name: shared-mem
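To check whether the controller actually stamped that label on the P/D pods, one option is kubectl's label-column output (a sketch; role-name is a label key used by the router selectors earlier in this thread):
# show the pod-group-index label value for the prefill and decode pods
kubectl get pods -l role-name=prefill -L stormservice.orchestration.aibrix.ai/pod-group-index
kubectl get pods -l role-name=decode -L stormservice.orchestration.aibrix.ai/pod-group-index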
@Jeffwan After adjusting according to the YAML you provided, startup failed because the PODSET_NAME environment variable does not exist, so I added it to env:
env:
- name: GLOO_SOCKET_IFNAME
value: eth0
- name: NCCL_SOCKET_IFNAME
value: eth0
- name: NCCL_IB_DISABLE
value: "0"
- name: NCCL_IB_GID_INDEX
value: "0"
- name: NCCL_DEBUG
value: "WARN"
- name: PODSET_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
But then I got torch.distributed.DistNetworkError: Failed to recv, got 0 bytes; it seems that prefill-0 cannot connect to prefill-1.
I noticed that your YAML sets replicas: 1, so I tried only one node for P and one node for D. Startup succeeded, but curl to v1/chat/completions failed as before.
The AIBrix component versions are 0.4.1:
kvcache:20241120
controller-manager:v0.4.1
gateway-plugins:v0.4.1
runtime:v0.4.1
kuberay-operator:v1.2.1-patch-20250726
metadata-service:v0.4.1
and the sglang-router is v0.1.9.
So, how can I fix this? I also want to run prefill (tp=16) and decode (tp=16).
@XiaobinZhao You need to use the nightly AIBrix image: replace v0.4.1 with the nightly image. v0.4.1 doesn't have all the features. (A sketch of one way to swap the tag follows below.)
BTW, I updated the above YAML (using $POD_GROUP_INDEX in place of $ROLE_REPLICA_INDEX); please copy the updated one. I also sent you an email to join our WeChat group; please check your GitHub email.
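For reference, a sketch of one way to swap the tag in place, assuming the stock aibrix-system deployment and container names and the public aibrix image (these names are assumptions, not confirmed by this thread; adjust to your registry mirror):
# update the controller to the nightly build; deployment/container names are assumed
kubectl -n aibrix-system set image deployment/aibrix-controller-manager \
  manager=aibrix/controller-manager:nightly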
@XiaobinZhao Is the issue fixed now?
Yes, replacing v0.4.1 with the nightly image resolved my problem.