Empty LB_IP when trying the Quickstart on an AMD ROCm Cluster
I am trying to run Quickstart — AIBrix on a Cluster of AMD MI300X ROCm platform.
Steps (following the instructions of the Quickstart)
- Install AIBrix and check the pods:
$ kubectl get pods -n aibrix-system
NAME READY STATUS RESTARTS AGE
aibrix-controller-manager-6489d5b587-k2szh 1/1 Running 0 30h
aibrix-gateway-plugins-58bdc89d9c-l4vv9 1/1 Running 0 30h
aibrix-gpu-optimizer-75df97858d-5hmbk 1/1 Running 0 30h
aibrix-kuberay-operator-55f5ddcbf4-b8d8z 1/1 Running 0 30h
aibrix-metadata-service-66f45c85bc-hsg6x 1/1 Running 0 30h
aibrix-redis-master-7bff9b56f5-hdt7h 1/1 Running 0 30h
- Deploy the base model. Referring to the model.yaml of the Quickstart, create a model.yaml for ROCm. Only two parts are modified for ROCm:
- use rocm/vllm:rocm6.3.1_instinct_vllm0.7.3_20250325, which is the vLLM ROCm docker image
- use amd.com/gpu in the container resources
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
model.aibrix.ai/name: deepseek-r1-distill-llama-8b # Note: The label value `model.aibrix.ai/name` here must match with the service name.
model.aibrix.ai/port: "8000"
name: deepseek-r1-distill-llama-8b
namespace: default
spec:
replicas: 1
selector:
matchLabels:
model.aibrix.ai/name: deepseek-r1-distill-llama-8b
template:
metadata:
labels:
model.aibrix.ai/name: deepseek-r1-distill-llama-8b
spec:
containers:
- command:
- python3
- -m
- vllm.entrypoints.openai.api_server
- --host
- "0.0.0.0"
- --port
- "8000"
- --uvicorn-log-level
- warning
- --model
- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
- --served-model-name
# Note: The `--served-model-name` argument value must also match the Service name and the Deployment label `model.aibrix.ai/name`
- deepseek-r1-distill-llama-8b
- --max-model-len
- "12288" # 24k length, this is to avoid "The model's max seq len (131072) is larger than the maximum number of tokens that can be stored in KV cache" issue.
#image: vllm/vllm-openai:v0.7.1
image: rocm/vllm:rocm6.3.1_instinct_vllm0.7.3_20250325
imagePullPolicy: IfNotPresent
name: vllm-openai
securityContext:
seccompProfile:
type: Unconfined
runAsGroup: 44
capabilities:
add:
- SYS_PTRACE
ports:
- containerPort: 8000
protocol: TCP
resources:
limits:
amd.com/gpu: "1"
requests:
amd.com/gpu: "1"
livenessProbe:
httpGet:
path: /health
port: 8000
scheme: HTTP
failureThreshold: 3
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 1
readinessProbe:
httpGet:
path: /health
port: 8000
scheme: HTTP
failureThreshold: 5
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 1
startupProbe:
httpGet:
path: /health
port: 8000
scheme: HTTP
failureThreshold: 30
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 1
---
apiVersion: v1
kind: Service
metadata:
labels:
model.aibrix.ai/name: deepseek-r1-distill-llama-8b
prometheus-discovery: "true"
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
name: deepseek-r1-distill-llama-8b # Note: The Service name must match the label value `model.aibrix.ai/name` in the Deployment
namespace: default
spec:
ports:
- name: serve
port: 8000
protocol: TCP
targetPort: 8000
- name: http
port: 8080
protocol: TCP
targetPort: 8080
selector:
model.aibrix.ai/name: deepseek-r1-distill-llama-8b
type: ClusterIP
Make sure the vLLM service is working fine after running kubectl apply -f model.yaml:
(base) amd@tw043:~$ kubectl get pods
NAME READY STATUS RESTARTS AGE
deepseek-r1-distill-llama-8b-78874f6d48-t8gbp 1/1 Running 0 21h
(base) amd@tw043:~$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
deepseek-r1-distill-llama-8b ClusterIP 10.43.222.109 <none> 8000/TCP,8080/TCP 23h
kubernetes ClusterIP 10.43.0.1 <none> 443/TCP 51d
(base) amd@tw043:~$ curl http://10.43.222.109:8000/v1/models
{"object":"list","data":[{"id":"deepseek-r1-distill-llama-8b","object":"model","created":1745305150,"owned_by":"vllm","root":"deepseek-ai/DeepSeek-R1-Distill-Llama-8B","parent":null,"max_model_len":12288,"permission":[{"id":"modelperm-2559299ffa2c4d29b2b6ce2b6a8ba6ee","object":"model_permission","created":1745305150,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}(base)
- Invoke the model endpoint using the gateway API
# Option 1: Kubernetes cluster with LoadBalancer support
LB_IP=$(kubectl get svc/envoy-aibrix-system-aibrix-eg-903790dc -n envoy-gateway-system -o=jsonpath='{.status.loadBalancer.ingress[0].ip}')
ENDPOINT="${LB_IP}:80"
But the LB_IP is empty, as checked by:
(base) amd@tw043:~$ LB_IP=$(kubectl get svc/envoy-aibrix-system-aibrix-eg-903790dc -n envoy-gateway-system -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
(base) amd@tw043:~$
(base) amd@tw043:~$ ENDPOINT="${LB_IP}:80"
(base) amd@tw043:~$
(base) amd@tw043:~$ echo $ENDPOINT
:80
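A small sketch to tell slow LB provisioning apart from a missing LB provider: poll the same jsonpath for a couple of minutes; it stays empty forever if the cluster has no load-balancer controller.
# Poll for up to 2 minutes; LB_IP stays empty if no LB provider exists on the cluster
for i in $(seq 1 24); do
  LB_IP=$(kubectl get svc/envoy-aibrix-system-aibrix-eg-903790dc -n envoy-gateway-system \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  [ -n "$LB_IP" ] && break
  sleep 5
done
echo "LB_IP=${LB_IP:-<empty>}"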
Then check the services in envoy-gateway-system. The ports are:
(base) amd@tw043:~$ kubectl get svc -n envoy-gateway-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
envoy-aibrix-system-aibrix-eg-903790dc LoadBalancer 10.43.160.151 <pending> 80:32025/TCP 29h
envoy-gateway ClusterIP 10.43.73.183 <none> 18000/TCP,18001/TCP,18002/TCP,19001/TCP 29h
- What's your kubernetes offering? Are you on a public cloud or an on-prem cluster?
- Could you run kubectl describe svc envoy-aibrix-system-aibrix-eg-903790dc -n envoy-gateway-system to check the service's pending information? It seems your service-controller cannot create the service successfully.
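A sketch of that check plus the Service's events, which usually explain a stuck <pending> state:
kubectl describe svc envoy-aibrix-system-aibrix-eg-903790dc -n envoy-gateway-system
# Events scoped to this Service often name the missing cloud/LB controller
kubectl get events -n envoy-gateway-system \
  --field-selector involvedObject.name=envoy-aibrix-system-aibrix-eg-903790dc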
Hi Jeffwan,
Thanks for your response.
- It is a small cluster on a private cloud solution operated by our vendor.
- See the details:
(base) amd@tw043:~$ kubectl get pods
NAME READY STATUS RESTARTS AGE
deepseek-r1-distill-llama-8b-78874f6d48-qq295 1/1 Running 57 (17m ago) 4h31m
(base) amd@tw043:~$ kubectl get pods -n aibrix-system
NAME READY STATUS RESTARTS AGE
aibrix-controller-manager-6489d5b587-gjndt 1/1 Running 0 39h
aibrix-gateway-plugins-58bdc89d9c-l4vv9 1/1 Running 0 3d2h
aibrix-gpu-optimizer-75df97858d-5hmbk 1/1 Running 0 3d2h
aibrix-kuberay-operator-55f5ddcbf4-b8d8z 1/1 Running 0 3d2h
aibrix-metadata-service-66f45c85bc-hsg6x 1/1 Running 0 3d2h
aibrix-redis-master-7bff9b56f5-hdt7h 1/1 Running 0 3d2h
(base) amd@tw043:~$ kubectl get pods
NAME READY STATUS RESTARTS AGE
deepseek-r1-distill-llama-8b-78874f6d48-qq295 1/1 Running 57 (18m ago) 4h31m
(base) amd@tw043:~$
(base) amd@tw043:~$ kubectl get svc -n envoy-gateway-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
envoy-aibrix-system-aibrix-eg-903790dc LoadBalancer 10.43.160.151 <pending> 80:32025/TCP 3d2h
envoy-gateway ClusterIP 10.43.73.183 <none> 18000/TCP,18001/TCP,18002/TCP,19001/TCP 3d2h
(base) amd@tw043:~$
(base) amd@tw043:~$
(base) amd@tw043:~$ kubectl describe svc envoy-aibrix-system-aibrix-eg-903790dc -n envoy-gateway-system
Name: envoy-aibrix-system-aibrix-eg-903790dc
Namespace: envoy-gateway-system
Labels: app.kubernetes.io/component=proxy
app.kubernetes.io/managed-by=envoy-gateway
app.kubernetes.io/name=envoy
gateway.envoyproxy.io/owning-gateway-name=aibrix-eg
gateway.envoyproxy.io/owning-gateway-namespace=aibrix-system
Annotations: <none>
Selector: app.kubernetes.io/component=proxy,app.kubernetes.io/managed-by=envoy-gateway,app.kubernetes.io/name=envoy,gateway.envoyproxy.io/owning-gateway-name=aibrix-eg,gateway.envoyproxy.io/owning-gateway-namespace=aibrix-system
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.43.160.151
IPs: 10.43.160.151
Port: http-80 80/TCP
TargetPort: 10080/TCP
NodePort: http-80 32025/TCP
Endpoints: 10.42.2.12:10080
Session Affinity: None
External Traffic Policy: Local
Internal Traffic Policy: Cluster
HealthCheck NodePort: 31308
Events: <none>
- Anything more you need me to check in the env?
@AlexHe99 Can you manually change the LoadBalancer type to NodePort type and use host IP + nodePort to continue the testing? Since this is an on-prem cluster, I think the kubernetes distribution doesn't have a service-controller adapter for this vendor, resulting in failure to create the load balancer IP.
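A one-line sketch of that change with kubectl patch, instead of editing the Service YAML by hand:
# Switch the gateway Service from LoadBalancer to NodePort
kubectl -n envoy-gateway-system patch svc envoy-aibrix-system-aibrix-eg-903790dc \
  -p '{"spec":{"type":"NodePort"}}'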
I changed the svc type to NodePort and restarted the deployment and service, but it still failed. The steps with logs are below.
Changed Service:
apiVersion: v1
kind: Service
metadata:
labels:
model.aibrix.ai/name: deepseek-r1-distill-llama-8b
prometheus-discovery: "true"
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
name: deepseek-r1-distill-llama-8b # Note: The Service name must match the label value `model.aibrix.ai/name` in the Deployment
namespace: default
spec:
ports:
- name: serve
port: 8000
protocol: TCP
targetPort: 8000
- name: http
port: 8080
protocol: TCP
targetPort: 8080
selector:
model.aibrix.ai/name: deepseek-r1-distill-llama-8b
type: NodePort
First, some checks:
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ kubectl get svc deepseek-r1-distill-llama-8b
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
deepseek-r1-distill-llama-8b NodePort 10.43.222.109 <none> 8000:32752/TCP,8080:30261/TCP 8d
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ kubectl get pods -l model.aibrix.ai/name=deepseek-r1-distill-llama-8b
NAME READY STATUS RESTARTS AGE
deepseek-r1-distill-llama-8b-78874f6d48-85x6q 1/1 Running 0 77m
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ kubectl describe svc deepseek-r1-distill-llama-8b
Name: deepseek-r1-distill-llama-8b
Namespace: default
Labels: model.aibrix.ai/name=deepseek-r1-distill-llama-8b
prometheus-discovery=true
Annotations: prometheus.io/port: 8080
prometheus.io/scrape: true
Selector: model.aibrix.ai/name=deepseek-r1-distill-llama-8b
Type: NodePort
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.43.222.109
IPs: 10.43.222.109
Port: serve 8000/TCP
TargetPort: 8000/TCP
NodePort: serve 32752/TCP
Endpoints: 10.42.11.48:8000
Port: http 8080/TCP
TargetPort: 8080/TCP
NodePort: http 30261/TCP
Endpoints: 10.42.11.48:8080
Session Affinity: None
External Traffic Policy: Cluster
Internal Traffic Policy: Cluster
Events: <none>
Check the pod and the listening ports:
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ kubectl get pods -l model.aibrix.ai/name=deepseek-r1-distill-llama-8b
NAME READY STATUS RESTARTS AGE
deepseek-r1-distill-llama-8b-78874f6d48-85x6q 1/1 Running 0 80m
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ kubectl exec -it deepseek-r1-distill-llama-8b-78874f6d48-85x6q -- netstat -tuln
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 10.42.11.48:33377 0.0.0.0:* LISTEN
tcp 0 0 10.42.11.48:40921 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:8000 0.0.0.0:* LISTEN
tcp 0 0 10.42.11.48:44293 0.0.0.0:* LISTEN
tcp 0 0 10.42.11.48:42363 0.0.0.0:* LISTEN
tcp6 0 0 :::43441 :::* LISTEN
Last, access the svc via the NodePort and the internal IP:
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ kubectl get pod -o wide | grep deepseek-r1-distill-llama-8b
deepseek-r1-distill-llama-8b-78874f6d48-85x6q 1/1 Running 0 84m 10.42.11.48 tw043 <none> <none>
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ curl http://10.21.9.43:32752
{"detail":"Not Found"}(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ curl http://10.21.9.43:30261
curl: (7) Failed to connect to 10.21.9.43 port 30261 after 0 ms: Connection refused
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ curl http://deepseek-r1-distill-llama-8b.default.svc.cluster.local:8000
curl: (6) Could not resolve host: deepseek-r1-distill-llama-8b.default.svc.cluster.local
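Side note: *.svc.cluster.local names only resolve inside the cluster, so the last curl failing from the host shell is expected. A sketch of the equivalent in-cluster check (the curlimages/curl image and the curl-test pod name are assumptions; any image that ships curl works):
# Throwaway pod inside the cluster, where the cluster DNS name resolves
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl http://deepseek-r1-distill-llama-8b.default.svc.cluster.local:8000/v1/models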
Any more suggestions?
@Jeffwan @varungup90 could you guys help on this issue? Thanks!
@AlexHe99 Please change the service type for envoy-aibrix-system-aibrix-eg-903790dc from LoadBalancer to NodePort (@Jeffwan was referring to this service). You can revert the previous change made to the model's service.
Another hacky, test-only alternative is to do port forwarding as follows:
kubectl -n envoy-gateway-system port-forward service/envoy-aibrix-system-aibrix-eg-903790dc 8888:80 &
For inference, you can do as follows:
curl -v http://localhost:8888/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-r1-distill-llama-8b",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"temperature": 0.7
}'
Please feel free to reach out on the Slack channel. :)
@varungup90 Thanks for the guidance. I changed the svc to NodePort and tested it, but it still has an issue. Please see the log.
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ kubectl get svc envoy-aibrix-system-aibrix-eg-903790dc -n envoy-gateway-system -o yaml
apiVersion: v1
kind: Service
metadata:
creationTimestamp: "2025-04-21T03:30:52Z"
labels:
app.kubernetes.io/component: proxy
app.kubernetes.io/managed-by: envoy-gateway
app.kubernetes.io/name: envoy
gateway.envoyproxy.io/owning-gateway-name: aibrix-eg
gateway.envoyproxy.io/owning-gateway-namespace: aibrix-system
name: envoy-aibrix-system-aibrix-eg-903790dc
namespace: envoy-gateway-system
resourceVersion: "25034037"
uid: c05814fe-e779-4ad5-ace8-c0a889aed04b
spec:
clusterIP: 10.43.160.151
clusterIPs:
- 10.43.160.151
externalTrafficPolicy: Local
internalTrafficPolicy: Cluster
ipFamilies:
- IPv4
ipFamilyPolicy: SingleStack
ports:
- name: http-80
nodePort: 32025
port: 80
protocol: TCP
targetPort: 10080
selector:
app.kubernetes.io/component: proxy
app.kubernetes.io/managed-by: envoy-gateway
app.kubernetes.io/name: envoy
gateway.envoyproxy.io/owning-gateway-name: aibrix-eg
gateway.envoyproxy.io/owning-gateway-namespace: aibrix-system
sessionAffinity: None
type: NodePort
status:
loadBalancer: {}
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ kubectl get svc -n envoy-gateway-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
envoy-aibrix-system-aibrix-eg-903790dc NodePort 10.43.160.151 <none> 80:32025/TCP 16d
envoy-gateway ClusterIP 10.43.73.183 <none> 18000/TCP,18001/TCP,18002/TCP,19001/TCP 16d
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ ^C
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ kubectl exec -it <pod-name> -n envoy-gateway-system -- netstat -tuln | grep 80
-bash: pod-name: No such file or directory
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
mia1-blade-02 Ready control-plane,etcd,master 66d v1.31.6+k3s1 10.21.9.102 <none> Ubuntu 22.04.4 LTS 5.15.0-116-generic containerd://2.0.2-k3s2
tw004 Ready <none> 66d v1.31.6+k3s1 10.21.9.4 <none> Ubuntu 22.04.4 LTS 5.15.0-116-generic containerd://2.0.2-k3s2
tw010 Ready <none> 66d v1.31.6+k3s1 10.21.9.10 <none> Ubuntu 22.04.4 LTS 5.15.0-116-generic containerd://2.0.2-k3s2
tw013 NotReady <none> 66d v1.31.6+k3s1 10.21.9.13 <none> Ubuntu 22.04.4 LTS 5.15.0-116-generic containerd://2.0.2-k3s2
tw014 Ready <none> 66d v1.31.6+k3s1 10.21.9.14 <none> Ubuntu 22.04.4 LTS 5.15.0-116-generic containerd://2.0.2-k3s2
tw015 Ready <none> 66d v1.31.6+k3s1 10.21.9.15 <none> Ubuntu 22.04.4 LTS 5.15.0-116-generic containerd://2.0.2-k3s2
tw020 Ready <none> 66d v1.31.6+k3s1 10.21.9.20 <none> Ubuntu 22.04.4 LTS 5.15.0-116-generic containerd://2.0.2-k3s2
tw022 Ready <none> 66d v1.31.6+k3s1 10.21.9.22 <none> Ubuntu 22.04.4 LTS 5.15.0-116-generic containerd://2.0.2-k3s2
tw023 Ready <none> 66d v1.31.6+k3s1 10.21.9.23 <none> Ubuntu 22.04.4 LTS 5.15.0-116-generic containerd://2.0.2-k3s2
tw024 Ready <none> 66d v1.31.6+k3s1 10.21.9.24 <none> Ubuntu 22.04.4 LTS 5.15.0-116-generic containerd://2.0.2-k3s2
tw033 Ready <none> 66d v1.31.6+k3s1 10.21.9.33 <none> Ubuntu 22.04.4 LTS 5.15.0-116-generic containerd://2.0.2-k3s2
tw039 Ready <none> 66d v1.31.6+k3s1 10.21.9.39 <none> Ubuntu 22.04.4 LTS 5.15.0-116-generic containerd://2.0.2-k3s2
tw043 Ready <none> 66d v1.31.6+k3s1 10.21.9.43 <none> Ubuntu 22.04.4 LTS 5.15.0-116-generic containerd://2.0.2-k3s2
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ curl http://10.21.9.4:32025
curl: (28) Failed to connect to 10.21.9.4 port 32025 after 130313 ms: Connection timed out
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ curl http://10.21.9.4:32025
curl: (28) Failed to connect to 10.21.9.4 port 32025 after 130221 ms: Connection timed out
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ curl http://10.21.9.10:32025
{"error":{"code":500,"message":"invalid character 'u' looking for beginning of value"}}(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ curl http://10.21.9.43:32025
{"error":{"code":500,"message":"invalid character 'u' looking for beginning of value"}}(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$
Then I did some checking:
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ sudo ufw status
sudo iptables -L -n -v | grep 32025
Status: inactive
0 0 DROP tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* envoy-gateway-system/envoy-aibrix-system-aibrix-eg-903790dc:http-80 has no local endpoints */ ADDRTYPE match dst-type LOCAL tcp dpt:32025
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ kubectl get pods -n kube-system | grep kube-proxy
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ ping 10.21.9.4
PING 10.21.9.4 (10.21.9.4) 56(84) bytes of data.
64 bytes from 10.21.9.4: icmp_seq=1 ttl=64 time=0.227 ms
64 bytes from 10.21.9.4: icmp_seq=2 ttl=64 time=0.169 ms
^C
--- 10.21.9.4 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1028ms
rtt min/avg/max/mdev = 0.169/0.198/0.227/0.029 ms
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ ping 10.21.9.10
PING 10.21.9.10 (10.21.9.10) 56(84) bytes of data.
64 bytes from 10.21.9.10: icmp_seq=1 ttl=64 time=0.270 ms
64 bytes from 10.21.9.10: icmp_seq=2 ttl=64 time=0.240 ms
^C
--- 10.21.9.10 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1008ms
rtt min/avg/max/mdev = 0.240/0.255/0.270/0.015 ms
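Two things stand out in the logs above. The iptables DROP rule is the likely culprit: with externalTrafficPolicy: Local (visible in the Service spec), a node hosting no envoy pod drops NodePort traffic, which would explain the timeout on 10.21.9.4. And the 500 "invalid character 'u'" responses from 10.21.9.10/.43 suggest the request did reach the gateway; a bare GET / just isn't a valid completion request, so a POST to /v1/chat/completions is the real test. A sketch of relaxing the traffic policy (note that Cluster no longer preserves the client source IP):
# Allow any node to forward NodePort traffic to the envoy pod
kubectl -n envoy-gateway-system patch svc envoy-aibrix-system-aibrix-eg-903790dc \
  -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'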
@varungup90
Regarding "Another hacky or just for test purpose alternative is to do port forwarding as follows" — it looks like it is working:
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ curl -v http://localhost:8888/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-r1-distill-llama-8b",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"temperature": 0.7
}'
* Trying 127.0.0.1:8888...
* Connected to localhost (127.0.0.1) port 8888 (#0)
> POST /v1/chat/completions HTTP/1.1
> Host: localhost:8888
> User-Agent: curl/7.81.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 147
>
Handling connection for 8888
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< date: Thu, 08 May 2025 03:54:42 GMT
< server: uvicorn
< content-type: application/json
< x-went-into-req-headers: true
< transfer-encoding: chunked
<
{"id":"chatcmpl-1a4a3d60-8145-41a3-ae66-7879a5bd6e41","object":"chat.completion","created":1746676483,"model":"deepseek-r1-distill-llama-8b","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"Okay, so I'm trying to figure out how to respond to the user's message. They said, \"Say this is a test! Please reason step by step, and put your final answer within \\boxed{}.\" Hmm, that's a bit confusing. Let me break it down.\n\nFirst, they want me to respond with a test. But what exactly are they testing? Maybe they want me to show that I understand the instructions properly. The second part says to reason step by step and put the final answer in a box. So perhaps I need to demonstrate a thought process, and then present a concise answer in a box.\n\nWait, but the initial statement is a command: \"Say this is a test!\" So maybe they want me to acknowledge that it's a test and proceed accordingly. I should probably start by confirming that it's a test and then explain how I arrived at the answer, even if the answer is just acknowledging the test.\n\nLet me think about how to structure this. I'll start by restating the test, then explain each step as if I'm reasoning through it. But since the test seems to be a command, maybe the answer is simply that it's a test. However, I need to provide a detailed thought process as per their instructions.\n\nSo, step one: acknowledge the test. Step two: explain that I need to reason through it. Since the content is a bit vague, I'll assume that the test is to confirm that I can follow instructions, even if the actual answer is straightforward.\n\nPutting it all together, I'll write out my thought process, show that I understand the request, and then present the final answer in a box as they asked. I need to make sure each step is clear and follows logically from the previous one. Let me make sure I don't skip any steps and that my reasoning is sound.\n\nAlright, I think I have a plan. I'll start by stating that it's a test, then explain how I arri* Connection #0 to host localhost left intact
ved at the conclusion, and finally put the answer in a box. That should cover everything they asked for.\n</think>\n\nThe user's message is a test, instructing to provide a detailed thought process and a final answer in a box. \n\n1. **Acknowledgment of the Test**: The message is a test, confirming the ability to follow instructions.\n2. **Understanding the Request**: Recognizing the need to reason step-by-step and present the answer in a box.\n3. **Conclusion**: The final answer is that it is indeed a test.\n\n\\boxed{\\text{It is a test.}}","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":11,"total_tokens":534,"completion_tokens":523,"prompt_tokens_details":null},"prompt_logprobs":null}(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$
I encountered the same problem as you, but I did the port mapping and it still didn't work. Error 500.
@AlexHe99 Just wanted to check how testing is going, feel free to raise any issues encountered.
@lgy1027 From the other issue, /v1/chat/completions works as expected, and now you are trying the rate-limit feature.
@varungup90 Thanks for the guidance. I changed the svc to NodePort and tested it, but it still has an issue. Please see the log above.
By co-debugging with @Jeffwan in person at KubeCon_HK_2025, we confirmed that the LB is not available on my cluster, and the NodePort with the internal IP is working now.
Thanks for the strong and kind support from @Jeffwan @varungup90 @xieus @lgy1027.
Below are the verification steps with logs.
Find out which node the gateway is on:
(base) amd@tw043:~/alehe/aibrix/samples-rocm$ kubectl get pods -n envoy-gateway-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
envoy-aibrix-system-aibrix-eg-903790dc-69d6c84d68-z6pkl 2/2 Running 0 33m 10.42.2.103 tw010 <none> <none>
envoy-gateway-7c7659ffc9-fnj95 1/1 Running 0 34m 10.42.9.14 tw033 <none> <none>
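The nodePort itself (30947 below) can be read straight off the gateway Service; a sketch:
# Print the NodePort assigned to the gateway's http listener
kubectl get svc envoy-aibrix-system-aibrix-eg-903790dc -n envoy-gateway-system \
  -o jsonpath='{.spec.ports[0].nodePort}'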
Now I have the node IP and port, 10.21.9.10:30947, and test the LLM service by IP:PORT:
(base) amd@tw043:~/alehe/aibrix/samples-rocm$ curl http://10.21.9.10:30947/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "deepseek-r1-distill-llama-8b",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "help me write a random generator in python"}
]
}'
{"id":"chatcmpl-e799c093-cc3b-4122-9d8b-3998fa98f475","object":"chat.completion","created":1749622905,"model":"deepseek-r1-distill-llama-8b","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"Okay, the user wants help writing a random generator in Python. I should figure out what kind of generator they're looking for. Maybe they want something simple, like numbers, or perhaps more complex like names or entire lists.\n\nI'll start by asking them to clarify their needs. That way, I can provide a more tailored solution. I should make sure to keep my response friendly and helpful.\n</think>\n\nSure! Could you please specify what kind of random generator you'd like to create? For example:\n\n- A random number generator (e.g., between 1 and 100)?\n- A random name generator?\n- A random list of items?\n- Something else?\n\nLet me know, and I'll help you create it!","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":19,"total_tokens":166,"completion_tokens":147,"prompt_tokens_details":null},"prompt_logprobs":null}