Empty LB_IP when trying the Quickstart on an AMD ROCm Cluster
I am trying to run Quickstart — AIBrix on a Cluster of AMD MI300X ROCm platform.
Steps (following the instructions of the Quickstart)
- Install AIBrix and check the pods:
$ kubectl get pods -n aibrix-system
NAME READY STATUS RESTARTS AGE
aibrix-controller-manager-6489d5b587-k2szh 1/1 Running 0 30h
aibrix-gateway-plugins-58bdc89d9c-l4vv9 1/1 Running 0 30h
aibrix-gpu-optimizer-75df97858d-5hmbk 1/1 Running 0 30h
aibrix-kuberay-operator-55f5ddcbf4-b8d8z 1/1 Running 0 30h
aibrix-metadata-service-66f45c85bc-hsg6x 1/1 Running 0 30h
aibrix-redis-master-7bff9b56f5-hdt7h 1/1 Running 0 30h
- Deploy the base model. Referring to the model.yaml of the Quickstart, create a model.yaml for ROCm. Only two parts are modified for ROCm:
- use rocm/vllm:rocm6.3.1_instinct_vllm0.7.3_20250325, which is the vLLM ROCm docker image
- use amd.com/gpu in the container resources
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
model.aibrix.ai/name: deepseek-r1-distill-llama-8b # Note: The label value `model.aibrix.ai/name` here must match with the service name.
model.aibrix.ai/port: "8000"
name: deepseek-r1-distill-llama-8b
namespace: default
spec:
replicas: 1
selector:
matchLabels:
model.aibrix.ai/name: deepseek-r1-distill-llama-8b
template:
metadata:
labels:
model.aibrix.ai/name: deepseek-r1-distill-llama-8b
spec:
containers:
- command:
- python3
- -m
- vllm.entrypoints.openai.api_server
- --host
- "0.0.0.0"
- --port
- "8000"
- --uvicorn-log-level
- warning
- --model
- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
- --served-model-name
# Note: The `--served-model-name` argument value must also match the Service name and the Deployment label `model.aibrix.ai/name`
- deepseek-r1-distill-llama-8b
- --max-model-len
- "12288" # 24k length, this is to avoid "The model's max seq len (131072) is larger than the maximum number of tokens that can be stored in KV cache" issue.
#image: vllm/vllm-openai:v0.7.1
image: rocm/vllm:rocm6.3.1_instinct_vllm0.7.3_20250325
imagePullPolicy: IfNotPresent
name: vllm-openai
securityContext:
seccompProfile:
type: Unconfined
runAsGroup: 44
capabilities:
add:
- SYS_PTRACE
ports:
- containerPort: 8000
protocol: TCP
resources:
limits:
amd.com/gpu: "1"
requests:
amd.com/gpu: "1"
livenessProbe:
httpGet:
path: /health
port: 8000
scheme: HTTP
failureThreshold: 3
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 1
readinessProbe:
httpGet:
path: /health
port: 8000
scheme: HTTP
failureThreshold: 5
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 1
startupProbe:
httpGet:
path: /health
port: 8000
scheme: HTTP
failureThreshold: 30
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 1
---
apiVersion: v1
kind: Service
metadata:
labels:
model.aibrix.ai/name: deepseek-r1-distill-llama-8b
prometheus-discovery: "true"
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
name: deepseek-r1-distill-llama-8b # Note: The Service name must match the label value `model.aibrix.ai/name` in the Deployment
namespace: default
spec:
ports:
- name: serve
port: 8000
protocol: TCP
targetPort: 8000
- name: http
port: 8080
protocol: TCP
targetPort: 8080
selector:
model.aibrix.ai/name: deepseek-r1-distill-llama-8b
type: ClusterIP
Make sure the vLLM service is working fine after running kubectl apply -f model.yaml:
(base) amd@tw043:~$ kubectl get pods
NAME READY STATUS RESTARTS AGE
deepseek-r1-distill-llama-8b-78874f6d48-t8gbp 1/1 Running 0 21h
(base) amd@tw043:~$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
deepseek-r1-distill-llama-8b ClusterIP 10.43.222.109 <none> 8000/TCP,8080/TCP 23h
kubernetes ClusterIP 10.43.0.1 <none> 443/TCP 51d
(base) amd@tw043:~$ curl http://10.43.222.109:8000/v1/models
{"object":"list","data":[{"id":"deepseek-r1-distill-llama-8b","object":"model","created":1745305150,"owned_by":"vllm","root":"deepseek-ai/DeepSeek-R1-Distill-Llama-8B","parent":null,"max_model_len":12288,"permission":[{"id":"modelperm-2559299ffa2c4d29b2b6ce2b6a8ba6ee","object":"model_permission","created":1745305150,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}(base)
- Invoke the model endpoint using the gateway API
# Option 1: Kubernetes cluster with LoadBalancer support
LB_IP=$(kubectl get svc/envoy-aibrix-system-aibrix-eg-903790dc -n envoy-gateway-system -o=jsonpath='{.status.loadBalancer.ingress[0].ip}')
ENDPOINT="${LB_IP}:80"
But the LB_IP is empty, as checked by:
(base) amd@tw043:~$ LB_IP=$(kubectl get svc/envoy-aibrix-system-aibrix-eg-903790dc -n envoy-gateway-system -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
(base) amd@tw043:~$
(base) amd@tw043:~$ ENDPOINT="${LB_IP}:80"
(base) amd@tw043:~$
(base) amd@tw043:~$ echo $ENDPOINT
:80
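A small sketch to tell slow LB provisioning apart from a missing LB provider: poll the same jsonpath for a couple of minutes; it stays empty forever if the cluster has no load-balancer controller.
# Poll for up to 2 minutes; LB_IP stays empty if no LB provider exists on the cluster
for i in $(seq 1 24); do
  LB_IP=$(kubectl get svc/envoy-aibrix-system-aibrix-eg-903790dc -n envoy-gateway-system \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  [ -n "$LB_IP" ] && break
  sleep 5
done
echo "LB_IP=${LB_IP:-<empty>}"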
Then check the services in envoy-gateway-system. The ports are:
(base) amd@tw043:~$ kubectl get svc -n envoy-gateway-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
envoy-aibrix-system-aibrix-eg-903790dc LoadBalancer 10.43.160.151 <pending> 80:32025/TCP 29h
envoy-gateway ClusterIP 10.43.73.183 <none> 18000/TCP,18001/TCP,18002/TCP,19001/TCP 29h
- What's your kubernetes offering? Are you on a public cloud or an on-prem cluster?
- Could you run kubectl describe svc envoy-aibrix-system-aibrix-eg-903790dc -n envoy-gateway-system to check the service's pending information? It seems your service-controller cannot create the service successfully.
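A sketch of that check plus the Service's events, which usually explain a stuck <pending> state:
kubectl describe svc envoy-aibrix-system-aibrix-eg-903790dc -n envoy-gateway-system
# Events scoped to this Service often name the missing cloud/LB controller
kubectl get events -n envoy-gateway-system \
  --field-selector involvedObject.name=envoy-aibrix-system-aibrix-eg-903790dc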
Hi Jeffwan,
Thanks for your response.
- It is a small cluster on a private cloud solution operated by our vendor.
- See the details:
(base) amd@tw043:~$ kubectl get pods
NAME READY STATUS RESTARTS AGE
deepseek-r1-distill-llama-8b-78874f6d48-qq295 1/1 Running 57 (17m ago) 4h31m
(base) amd@tw043:~$ kubectl get pods -n aibrix-system
NAME READY STATUS RESTARTS AGE
aibrix-controller-manager-6489d5b587-gjndt 1/1 Running 0 39h
aibrix-gateway-plugins-58bdc89d9c-l4vv9 1/1 Running 0 3d2h
aibrix-gpu-optimizer-75df97858d-5hmbk 1/1 Running 0 3d2h
aibrix-kuberay-operator-55f5ddcbf4-b8d8z 1/1 Running 0 3d2h
aibrix-metadata-service-66f45c85bc-hsg6x 1/1 Running 0 3d2h
aibrix-redis-master-7bff9b56f5-hdt7h 1/1 Running 0 3d2h
(base) amd@tw043:~$ kubectl get pods
NAME READY STATUS RESTARTS AGE
deepseek-r1-distill-llama-8b-78874f6d48-qq295 1/1 Running 57 (18m ago) 4h31m
(base) amd@tw043:~$
(base) amd@tw043:~$ kubectl get svc -n envoy-gateway-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
envoy-aibrix-system-aibrix-eg-903790dc LoadBalancer 10.43.160.151 <pending> 80:32025/TCP 3d2h
envoy-gateway ClusterIP 10.43.73.183 <none> 18000/TCP,18001/TCP,18002/TCP,19001/TCP 3d2h
(base) amd@tw043:~$
(base) amd@tw043:~$
(base) amd@tw043:~$ kubectl describe svc envoy-aibrix-system-aibrix-eg-903790dc -n envoy-gateway-system
Name: envoy-aibrix-system-aibrix-eg-903790dc
Namespace: envoy-gateway-system
Labels: app.kubernetes.io/component=proxy
app.kubernetes.io/managed-by=envoy-gateway
app.kubernetes.io/name=envoy
gateway.envoyproxy.io/owning-gateway-name=aibrix-eg
gateway.envoyproxy.io/owning-gateway-namespace=aibrix-system
Annotations: <none>
Selector: app.kubernetes.io/component=proxy,app.kubernetes.io/managed-by=envoy-gateway,app.kubernetes.io/name=envoy,gateway.envoyproxy.io/owning-gateway-name=aibrix-eg,gateway.envoyproxy.io/owning-gateway-namespace=aibrix-system
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.43.160.151
IPs: 10.43.160.151
Port: http-80 80/TCP
TargetPort: 10080/TCP
NodePort: http-80 32025/TCP
Endpoints: 10.42.2.12:10080
Session Affinity: None
External Traffic Policy: Local
Internal Traffic Policy: Cluster
HealthCheck NodePort: 31308
Events: <none>
- Anything more you need me to check in the env?
@AlexHe99 Can you manually change the LoadBalancer type to NodePort type and use host IP + nodePort to continue the testing? Since this is an on-prem cluster, I think the kubernetes distribution doesn't have a service-controller adapter for this vendor, resulting in failure to create the load balancer IP.
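A one-line sketch of that change with kubectl patch, instead of editing the Service YAML by hand:
# Switch the gateway Service from LoadBalancer to NodePort
kubectl -n envoy-gateway-system patch svc envoy-aibrix-system-aibrix-eg-903790dc \
  -p '{"spec":{"type":"NodePort"}}'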
I changed the svc type to NodePort and restarted the deployment and service, but it still failed. The steps with logs are below.
Changed Service:
apiVersion: v1
kind: Service
metadata:
labels:
model.aibrix.ai/name: deepseek-r1-distill-llama-8b
prometheus-discovery: "true"
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
name: deepseek-r1-distill-llama-8b # Note: The Service name must match the label value `model.aibrix.ai/name` in the Deployment
namespace: default
spec:
ports:
- name: serve
port: 8000
protocol: TCP
targetPort: 8000
- name: http
port: 8080
protocol: TCP
targetPort: 8080
selector:
model.aibrix.ai/name: deepseek-r1-distill-llama-8b
type: NodePort
First, some checks:
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ kubectl get svc deepseek-r1-distill-llama-8b
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
deepseek-r1-distill-llama-8b NodePort 10.43.222.109 <none> 8000:32752/TCP,8080:30261/TCP 8d
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ kubectl get pods -l model.aibrix.ai/name=deepseek-r1-distill-llama-8b
NAME READY STATUS RESTARTS AGE
deepseek-r1-distill-llama-8b-78874f6d48-85x6q 1/1 Running 0 77m
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ kubectl describe svc deepseek-r1-distill-llama-8b
Name: deepseek-r1-distill-llama-8b
Namespace: default
Labels: model.aibrix.ai/name=deepseek-r1-distill-llama-8b
prometheus-discovery=true
Annotations: prometheus.io/port: 8080
prometheus.io/scrape: true
Selector: model.aibrix.ai/name=deepseek-r1-distill-llama-8b
Type: NodePort
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.43.222.109
IPs: 10.43.222.109
Port: serve 8000/TCP
TargetPort: 8000/TCP
NodePort: serve 32752/TCP
Endpoints: 10.42.11.48:8000
Port: http 8080/TCP
TargetPort: 8080/TCP
NodePort: http 30261/TCP
Endpoints: 10.42.11.48:8080
Session Affinity: None
External Traffic Policy: Cluster
Internal Traffic Policy: Cluster
Events: <none>
Check the pod and the listening ports:
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ kubectl get pods -l model.aibrix.ai/name=deepseek-r1-distill-llama-8b
NAME READY STATUS RESTARTS AGE
deepseek-r1-distill-llama-8b-78874f6d48-85x6q 1/1 Running 0 80m
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ kubectl exec -it deepseek-r1-distill-llama-8b-78874f6d48-85x6q -- netstat -tuln
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 10.42.11.48:33377 0.0.0.0:* LISTEN
tcp 0 0 10.42.11.48:40921 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:8000 0.0.0.0:* LISTEN
tcp 0 0 10.42.11.48:44293 0.0.0.0:* LISTEN
tcp 0 0 10.42.11.48:42363 0.0.0.0:* LISTEN
tcp6 0 0 :::43441 :::* LISTEN
Last, access the svc via the NodePort and the internal IP:
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ kubectl get pod -o wide | grep deepseek-r1-distill-llama-8b
deepseek-r1-distill-llama-8b-78874f6d48-85x6q 1/1 Running 0 84m 10.42.11.48 tw043 <none> <none>
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ curl http://10.21.9.43:32752
{"detail":"Not Found"}(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ curl http://10.21.9.43:30261
curl: (7) Failed to connect to 10.21.9.43 port 30261 after 0 ms: Connection refused
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ curl http://deepseek-r1-distill-llama-8b.default.svc.cluster.local:8000
curl: (6) Could not resolve host: deepseek-r1-distill-llama-8b.default.svc.cluster.local
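Side note: *.svc.cluster.local names only resolve inside the cluster, so the last curl failing from the host shell is expected. A sketch of the equivalent in-cluster check (the curlimages/curl image and the curl-test pod name are assumptions; any image that ships curl works):
# Throwaway pod inside the cluster, where the cluster DNS name resolves
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl http://deepseek-r1-distill-llama-8b.default.svc.cluster.local:8000/v1/models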
Any more suggestions?
@Jeffwan @varungup90 could you guys help on this issue? Thanks!
@AlexHe99 Please change the service type for envoy-aibrix-system-aibrix-eg-903790dc from LoadBalancer to NodePort (@Jeffwan was referring to this service). You can revert the previous change made to the model's service.
Another hacky, test-only alternative is to do port forwarding as follows:
kubectl -n envoy-gateway-system port-forward service/envoy-aibrix-system-aibrix-eg-903790dc 8888:80 &
For inference, you can do as follows:
curl -v http://localhost:8888/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-r1-distill-llama-8b",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"temperature": 0.7
}'
Please feel free to reach out on the Slack channel. :)
@varungup90 Thanks for the guidance. I changed the svc to NodePort and tested it, but it still has an issue. Please see the log.
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ kubectl get svc envoy-aibrix-system-aibrix-eg-903790dc -n envoy-gateway-system -o yaml
apiVersion: v1
kind: Service
metadata:
creationTimestamp: "2025-04-21T03:30:52Z"
labels:
app.kubernetes.io/component: proxy
app.kubernetes.io/managed-by: envoy-gateway
app.kubernetes.io/name: envoy
gateway.envoyproxy.io/owning-gateway-name: aibrix-eg
gateway.envoyproxy.io/owning-gateway-namespace: aibrix-system
name: envoy-aibrix-system-aibrix-eg-903790dc
namespace: envoy-gateway-system
resourceVersion: "25034037"
uid: c05814fe-e779-4ad5-ace8-c0a889aed04b
spec:
clusterIP: 10.43.160.151
clusterIPs:
- 10.43.160.151
externalTrafficPolicy: Local
internalTrafficPolicy: Cluster
ipFamilies:
- IPv4
ipFamilyPolicy: SingleStack
ports:
- name: http-80
nodePort: 32025
port: 80
protocol: TCP
targetPort: 10080
selector:
app.kubernetes.io/component: proxy
app.kubernetes.io/managed-by: envoy-gateway
app.kubernetes.io/name: envoy
gateway.envoyproxy.io/owning-gateway-name: aibrix-eg
gateway.envoyproxy.io/owning-gateway-namespace: aibrix-system
sessionAffinity: None
type: NodePort
status:
loadBalancer: {}
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ kubectl get svc -n envoy-gateway-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
envoy-aibrix-system-aibrix-eg-903790dc NodePort 10.43.160.151 <none> 80:32025/TCP 16d
envoy-gateway ClusterIP 10.43.73.183 <none> 18000/TCP,18001/TCP,18002/TCP,19001/TCP 16d
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ ^C
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ kubectl exec -it <pod-name> -n envoy-gateway-system -- netstat -tuln | grep 80
-bash: pod-name: No such file or directory
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
mia1-blade-02 Ready control-plane,etcd,master 66d v1.31.6+k3s1 10.21.9.102 <none> Ubuntu 22.04.4 LTS 5.15.0-116-generic containerd://2.0.2-k3s2
tw004 Ready <none> 66d v1.31.6+k3s1 10.21.9.4 <none> Ubuntu 22.04.4 LTS 5.15.0-116-generic containerd://2.0.2-k3s2
tw010 Ready <none> 66d v1.31.6+k3s1 10.21.9.10 <none> Ubuntu 22.04.4 LTS 5.15.0-116-generic containerd://2.0.2-k3s2
tw013 NotReady <none> 66d v1.31.6+k3s1 10.21.9.13 <none> Ubuntu 22.04.4 LTS 5.15.0-116-generic containerd://2.0.2-k3s2
tw014 Ready <none> 66d v1.31.6+k3s1 10.21.9.14 <none> Ubuntu 22.04.4 LTS 5.15.0-116-generic containerd://2.0.2-k3s2
tw015 Ready <none> 66d v1.31.6+k3s1 10.21.9.15 <none> Ubuntu 22.04.4 LTS 5.15.0-116-generic containerd://2.0.2-k3s2
tw020 Ready <none> 66d v1.31.6+k3s1 10.21.9.20 <none> Ubuntu 22.04.4 LTS 5.15.0-116-generic containerd://2.0.2-k3s2
tw022 Ready <none> 66d v1.31.6+k3s1 10.21.9.22 <none> Ubuntu 22.04.4 LTS 5.15.0-116-generic containerd://2.0.2-k3s2
tw023 Ready <none> 66d v1.31.6+k3s1 10.21.9.23 <none> Ubuntu 22.04.4 LTS 5.15.0-116-generic containerd://2.0.2-k3s2
tw024 Ready <none> 66d v1.31.6+k3s1 10.21.9.24 <none> Ubuntu 22.04.4 LTS 5.15.0-116-generic containerd://2.0.2-k3s2
tw033 Ready <none> 66d v1.31.6+k3s1 10.21.9.33 <none> Ubuntu 22.04.4 LTS 5.15.0-116-generic containerd://2.0.2-k3s2
tw039 Ready <none> 66d v1.31.6+k3s1 10.21.9.39 <none> Ubuntu 22.04.4 LTS 5.15.0-116-generic containerd://2.0.2-k3s2
tw043 Ready <none> 66d v1.31.6+k3s1 10.21.9.43 <none> Ubuntu 22.04.4 LTS 5.15.0-116-generic containerd://2.0.2-k3s2
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ curl http://10.21.9.4:32025
curl: (28) Failed to connect to 10.21.9.4 port 32025 after 130313 ms: Connection timed out
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ curl http://10.21.9.4:32025
curl: (28) Failed to connect to 10.21.9.4 port 32025 after 130221 ms: Connection timed out
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ curl http://10.21.9.10:32025
{"error":{"code":500,"message":"invalid character 'u' looking for beginning of value"}}(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ curl http://10.21.9.43:32025
{"error":{"code":500,"message":"invalid character 'u' looking for beginning of value"}}(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$
Then I did some checking:
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ sudo ufw status
sudo iptables -L -n -v | grep 32025
Status: inactive
0 0 DROP tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* envoy-gateway-system/envoy-aibrix-system-aibrix-eg-903790dc:http-80 has no local endpoints */ ADDRTYPE match dst-type LOCAL tcp dpt:32025
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ kubectl get pods -n kube-system | grep kube-proxy
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ ping 10.21.9.4
PING 10.21.9.4 (10.21.9.4) 56(84) bytes of data.
64 bytes from 10.21.9.4: icmp_seq=1 ttl=64 time=0.227 ms
64 bytes from 10.21.9.4: icmp_seq=2 ttl=64 time=0.169 ms
^C
--- 10.21.9.4 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1028ms
rtt min/avg/max/mdev = 0.169/0.198/0.227/0.029 ms
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ ping 10.21.9.10
PING 10.21.9.10 (10.21.9.10) 56(84) bytes of data.
64 bytes from 10.21.9.10: icmp_seq=1 ttl=64 time=0.270 ms
64 bytes from 10.21.9.10: icmp_seq=2 ttl=64 time=0.240 ms
^C
--- 10.21.9.10 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1008ms
rtt min/avg/max/mdev = 0.240/0.255/0.270/0.015 ms
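Two things stand out in the logs above. The iptables DROP rule is the likely culprit: with externalTrafficPolicy: Local (visible in the Service spec), a node hosting no envoy pod drops NodePort traffic, which would explain the timeout on 10.21.9.4. And the 500 "invalid character 'u'" responses from 10.21.9.10/.43 suggest the request did reach the gateway; a bare GET / just isn't a valid completion request, so a POST to /v1/chat/completions is the real test. A sketch of relaxing the traffic policy (note that Cluster no longer preserves the client source IP):
# Allow any node to forward NodePort traffic to the envoy pod
kubectl -n envoy-gateway-system patch svc envoy-aibrix-system-aibrix-eg-903790dc \
  -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'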
@varungup90
Regarding "Another hacky or just for test purpose alternative is to do port forwarding as follows" — it looks like it is working:
(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$ curl -v http://localhost:8888/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-r1-distill-llama-8b",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"temperature": 0.7
}'
* Trying 127.0.0.1:8888...
* Connected to localhost (127.0.0.1) port 8888 (#0)
> POST /v1/chat/completions HTTP/1.1
> Host: localhost:8888
> User-Agent: curl/7.81.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 147
>
Handling connection for 8888
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< date: Thu, 08 May 2025 03:54:42 GMT
< server: uvicorn
< content-type: application/json
< x-went-into-req-headers: true
< transfer-encoding: chunked
<
{"id":"chatcmpl-1a4a3d60-8145-41a3-ae66-7879a5bd6e41","object":"chat.completion","created":1746676483,"model":"deepseek-r1-distill-llama-8b","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"Okay, so I'm trying to figure out how to respond to the user's message. They said, \"Say this is a test! Please reason step by step, and put your final answer within \\boxed{}.\" Hmm, that's a bit confusing. Let me break it down.\n\nFirst, they want me to respond with a test. But what exactly are they testing? Maybe they want me to show that I understand the instructions properly. The second part says to reason step by step and put the final answer in a box. So perhaps I need to demonstrate a thought process, and then present a concise answer in a box.\n\nWait, but the initial statement is a command: \"Say this is a test!\" So maybe they want me to acknowledge that it's a test and proceed accordingly. I should probably start by confirming that it's a test and then explain how I arrived at the answer, even if the answer is just acknowledging the test.\n\nLet me think about how to structure this. I'll start by restating the test, then explain each step as if I'm reasoning through it. But since the test seems to be a command, maybe the answer is simply that it's a test. However, I need to provide a detailed thought process as per their instructions.\n\nSo, step one: acknowledge the test. Step two: explain that I need to reason through it. Since the content is a bit vague, I'll assume that the test is to confirm that I can follow instructions, even if the actual answer is straightforward.\n\nPutting it all together, I'll write out my thought process, show that I understand the request, and then present the final answer in a box as they asked. I need to make sure each step is clear and follows logically from the previous one. Let me make sure I don't skip any steps and that my reasoning is sound.\n\nAlright, I think I have a plan. I'll start by stating that it's a test, then explain how I arri* Connection #0 to host localhost left intact
ved at the conclusion, and finally put the answer in a box. That should cover everything they asked for.\n</think>\n\nThe user's message is a test, instructing to provide a detailed thought process and a final answer in a box. \n\n1. **Acknowledgment of the Test**: The message is a test, confirming the ability to follow instructions.\n2. **Understanding the Request**: Recognizing the need to reason step-by-step and present the answer in a box.\n3. **Conclusion**: The final answer is that it is indeed a test.\n\n\\boxed{\\text{It is a test.}}","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":11,"total_tokens":534,"completion_tokens":523,"prompt_tokens_details":null},"prompt_logprobs":null}(base) amd@tw043:~/alehe/aibrix/samples-rocm/quickstart$
I encountered the same problem as you, but I did the port mapping and it still didn't work. Error 500.
@AlexHe99 Just wanted to check how testing is going, feel free to raise any issues encountered.
@lgy1027 From the other issue, /v1/chat/completions works as expected, and now you are trying the rate-limit feature.
@varungup90 Thanks for the guidance. I changed the svc to NodePort and tested it, but it still has an issue. Please see the log above.
By co-debugging with @Jeffwan in person at KubeCon_HK_2025, we confirmed that the LB is not available on my cluster, and the NodePort with the internal IP is working now.
Thanks for the strong and kind support from @Jeffwan @varungup90 @xieus @lgy1027.
Below are the verification steps with logs.
Find out which node the gateway is on:
(base) amd@tw043:~/alehe/aibrix/samples-rocm$ kubectl get pods -n envoy-gateway-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
envoy-aibrix-system-aibrix-eg-903790dc-69d6c84d68-z6pkl 2/2 Running 0 33m 10.42.2.103 tw010 <none> <none>
envoy-gateway-7c7659ffc9-fnj95 1/1 Running 0 34m 10.42.9.14 tw033 <none> <none>
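The nodePort itself (30947 below) can be read straight off the gateway Service; a sketch:
# Print the NodePort assigned to the gateway's http listener
kubectl get svc envoy-aibrix-system-aibrix-eg-903790dc -n envoy-gateway-system \
  -o jsonpath='{.spec.ports[0].nodePort}'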
Now I have the node IP and port, 10.21.9.10:30947, and test the LLM service by IP:PORT:
(base) amd@tw043:~/alehe/aibrix/samples-rocm$ curl http://10.21.9.10:30947/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "deepseek-r1-distill-llama-8b",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "help me write a random generator in python"}
]
}'
{"id":"chatcmpl-e799c093-cc3b-4122-9d8b-3998fa98f475","object":"chat.completion","created":1749622905,"model":"deepseek-r1-distill-llama-8b","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"Okay, the user wants help writing a random generator in Python. I should figure out what kind of generator they're looking for. Maybe they want something simple, like numbers, or perhaps more complex like names or entire lists.\n\nI'll start by asking them to clarify their needs. That way, I can provide a more tailored solution. I should make sure to keep my response friendly and helpful.\n</think>\n\nSure! Could you please specify what kind of random generator you'd like to create? For example:\n\n- A random number generator (e.g., between 1 and 100)?\n- A random name generator?\n- A random list of items?\n- Something else?\n\nLet me know, and I'll help you create it!","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":19,"total_tokens":166,"completion_tokens":147,"prompt_tokens_details":null},"prompt_logprobs":null}