
Error while running RayServe with vLLM blueprint template

Open · Gall-oDrone opened this issue on Jul 16, 2024 · 1 comment

Description

Generally speaking, I get an error when trying to run any GenAI blueprint template. In this particular case, the RayServe GPU node group is not being deployed as the example shows. Debugging the node group surfaces the following error: Failed to schedule pod, incompatible with nodepool "x86-cpu-karpenter", daemonset overhead={"cpu":"210m","memory":"240Mi","pods":"6"}
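For reference, the blueprint's GPU worker group pins its pods to the Karpenter GPU node pool via node selectors and requests a full GPU node's worth of resources. A rough sketch of the relevant portion of the worker pod spec, reconstructed from the kubectl describe output further below (field values are taken from that output; the surrounding RayService structure is assumed):

    # Hypothetical excerpt of the worker group pod template in ray-service-vllm.yaml,
    # reconstructed from the `kubectl describe pod` output in this issue.
    nodeSelector:
      NodeGroupType: g5-gpu-karpenter   # must match a label on the Karpenter NodePool
      type: karpenter
    tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
    containers:
      - name: ray-worker
        resources:
          requests:
            cpu: "10"
            memory: 60G
            nvidia.com/gpu: "1"
          limits:
            cpu: "10"
            memory: 60G
            nvidia.com/gpu: "1"

Karpenter has to find a g5 instance that fits these requests plus the daemonset overhead quoted in the error, which is where scheduling fails.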

Versions

  • Module version [Required]: AIML/Jark-Stack

  • Terraform version: Terraform v1.8.4 on linux_amd64

  • Provider version(s): Terraform v1.8.4 on linux_amd64

Reproduction Code [Required]

Steps to reproduce the behavior:

  1. git clone https://github.com/awslabs/data-on-eks.git
  2. cd data-on-eks/ai-ml/jark-stack/terraform && chmod +x install.sh
  3. ./install.sh
  4. aws eks --region us-west-2 update-kubeconfig --name jark-stack
  5. export HUGGING_FACE_HUB_TOKEN=$(echo -n "Your-Hugging-Face-Hub-Token-Value" | base64)
  6. cd data-on-eks/gen-ai/inference/vllm-rayserve-gpu
  7. envsubst < ray-service-vllm.yaml | kubectl apply -f -
  8. kubectl get pod -n rayserve-vllm
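Steps 5 and 7 work together: envsubst substitutes the base64-encoded HUGGING_FACE_HUB_TOKEN into ray-service-vllm.yaml before it is applied. A minimal sketch of the kind of Secret the template presumably renders (the secret name and key hf-token match the pod description below; the exact template content is an assumption):

    apiVersion: v1
    kind: Secret
    metadata:
      name: hf-token
      namespace: rayserve-vllm
    data:
      # value is already base64-encoded by step 5, so it is inserted as-is
      hf-token: $HUGGING_FACE_HUB_TOKEN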

Expected behavior

  1. kubectl get pod -n rayserve-vllm should show the following:
NAME                                           READY   STATUS    RESTARTS   AGE
vllm-raycluster-nvtxg-head-g2cg8               1/1     Running   0          47m
vllm-raycluster-nvtxg-worker-gpu-group-msl5p   1/1     Running   0          47m

Actual behavior

  1. kubectl get pod -n rayserve-vllm shows the following:
NAME                                           READY   STATUS    RESTARTS   AGE
vllm-raycluster-r66w9-head-zkt8k               2/2     Running   0          30m
vllm-raycluster-r66w9-worker-gpu-group-v72wh   0/1     Pending   0          30m

Terminal Output Screenshot(s)

kubectl describe pod vllm-raycluster-r66w9-worker-gpu-group-v72wh -n rayserve-vllm
Name:             vllm-raycluster-r66w9-worker-gpu-group-v72wh
Namespace:        rayserve-vllm
Priority:         0
Service Account:  default
Node:             <none>
Labels:           app.kubernetes.io/created-by=kuberay-operator
                  app.kubernetes.io/name=kuberay
                  ray.io/cluster=vllm-raycluster-r66w9
                  ray.io/group=gpu-group
                  ray.io/identifier=vllm-raycluster-r66w9-worker
                  ray.io/is-ray-node=yes
                  ray.io/node-type=worker
                  ray.io/serve=true
Annotations:      ray.io/ft-enabled: false
Status:           Pending
IP:               
IPs:              <none>
Controlled By:    RayCluster/vllm-raycluster-r66w9
Init Containers:
  wait-gcs-ready:
    Image:      public.ecr.aws/data-on-eks/ray2.24.0-py310-vllm-gpu:v1
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/bash
      -lc
      --
    Args:
      
                SECONDS=0
                while true; do
                  if (( SECONDS <= 120 )); then
                    if ray health-check --address vllm.rayserve-vllm.svc.cluster.local:6379 > /dev/null 2>&1; then
                      echo "GCS is ready."
                      break
                    fi
                    echo "$SECONDS seconds elapsed: Waiting for GCS to be ready."
                  else
                    if ray health-check --address vllm.rayserve-vllm.svc.cluster.local:6379; then
                      echo "GCS is ready. Any error messages above can be safely ignored."
                      break
                    fi
                    echo "$SECONDS seconds elapsed: Still waiting for GCS to be ready. For troubleshooting, refer to the FAQ at https://github.com/ray-project/kuberay/blob/master/docs/guidance/FAQ.md."
                  fi
                  sleep 5    
                done
              
    Limits:
      cpu:     200m
      memory:  256Mi
    Requests:
      cpu:     200m
      memory:  256Mi
    Environment:
      VLLM_PORT:               8004
      LD_LIBRARY_PATH:         /home/ray/anaconda3/lib:
      HUGGING_FACE_HUB_TOKEN:  <set to the key 'hf-token' in secret 'hf-token'>  Optional: false
      FQ_RAY_IP:               vllm.rayserve-vllm.svc.cluster.local
      RAY_IP:                  vllm
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jhjsr (ro)
Containers:
  ray-worker:
    Image:      public.ecr.aws/data-on-eks/ray2.24.0-py310-vllm-gpu:v1
    Port:       8080/TCP
    Host Port:  0/TCP
    Command:
      /bin/bash
      -lc
      --
    Args:
      ulimit -n 65536; ray start  --address=vllm.rayserve-vllm.svc.cluster.local:6379  --metrics-export-port=8080  --block  --dashboard-agent-listen-port=52365  --num-cpus=10  --memory=60000000000  --num-gpus=1 
    Limits:
      cpu:             10
      memory:          60G
      nvidia.com/gpu:  1
    Requests:
      cpu:             10
      memory:          60G
      nvidia.com/gpu:  1
    Liveness:          exec [bash -c wget -T 2 -q -O- http://localhost:52365/api/local_raylet_healthz | grep success] delay=30s timeout=1s period=5s #success=1 #failure=120
    Readiness:         exec [bash -c wget -T 2 -q -O- http://localhost:52365/api/local_raylet_healthz | grep success && wget -T 2 -q -O- http://localhost:8000/-/healthz | grep success] delay=10s timeout=1s period=5s #success=1 #failure=1
    Environment:
      VLLM_PORT:                                8004
      LD_LIBRARY_PATH:                          /home/ray/anaconda3/lib:
      HUGGING_FACE_HUB_TOKEN:                   <set to the key 'hf-token' in secret 'hf-token'>  Optional: false
      FQ_RAY_IP:                                vllm.rayserve-vllm.svc.cluster.local
      RAY_IP:                                   vllm
      RAY_CLUSTER_NAME:                          (v1:metadata.labels['ray.io/cluster'])
      RAY_CLOUD_INSTANCE_ID:                    vllm-raycluster-r66w9-worker-gpu-group-v72wh (v1:metadata.name)
      RAY_NODE_TYPE_NAME:                        (v1:metadata.labels['ray.io/group'])
      KUBERAY_GEN_RAY_START_CMD:                ray start  --address=vllm.rayserve-vllm.svc.cluster.local:6379  --metrics-export-port=8080  --block  --dashboard-agent-listen-port=52365  --num-cpus=10  --memory=60000000000  --num-gpus=1 
      RAY_PORT:                                 6379
      RAY_timeout_ms_task_wait_for_death_info:  0
      RAY_gcs_server_request_timeout_seconds:   5
      RAY_SERVE_KV_TIMEOUT_S:                   5
      RAY_ADDRESS:                              vllm.rayserve-vllm.svc.cluster.local:6379
      RAY_USAGE_STATS_KUBERAY_IN_USE:           1
      REDIS_PASSWORD:                           
      RAY_DASHBOARD_ENABLE_K8S_DISK_USAGE:      1
    Mounts:
      /dev/shm from shared-mem (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jhjsr (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  shared-mem:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  60G
  kube-api-access-jhjsr:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Guaranteed
Node-Selectors:              NodeGroupType=g5-gpu-karpenter
                             type=karpenter
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
                             nvidia.com/gpu:NoSchedule op=Exists
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  31m                 default-scheduler  0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
  Normal   Nominated         31m                 karpenter          Pod should schedule on: nodeclaim/g5-gpu-karpenter-fc4pv
  Normal   Nominated         28m                 karpenter          Pod should schedule on: nodeclaim/g5-gpu-karpenter-jnzd8
  Normal   Nominated         25m                 karpenter          Pod should schedule on: nodeclaim/g5-gpu-karpenter-w9g6f
  Normal   Nominated         22m                 karpenter          Pod should schedule on: nodeclaim/g5-gpu-karpenter-ldz6d
  Normal   Nominated         19m                 karpenter          Pod should schedule on: nodeclaim/g5-gpu-karpenter-f8ssz
  Normal   Nominated         15m                 karpenter          Pod should schedule on: nodeclaim/g5-gpu-karpenter-8n9vv
  Normal   Nominated         12m                 karpenter          Pod should schedule on: nodeclaim/g5-gpu-karpenter-h2wrf
  Normal   Nominated         9m37s               karpenter          Pod should schedule on: nodeclaim/g5-gpu-karpenter-sksrm
  Normal   Nominated         6m27s               karpenter          Pod should schedule on: nodeclaim/g5-gpu-karpenter-lh626
  Warning  FailedScheduling  100s (x6 over 26m)  default-scheduler  0/4 nodes are available: 4 node(s) didn't match Pod's node affinity/selector. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  67s (x7 over 31m)   karpenter          Failed to schedule pod, incompatible with nodepool "x86-cpu-karpenter", daemonset overhead={"cpu":"210m","memory":"240Mi","pods":"6"}, incompatible requirements, key NodeGroupType, NodeGroupType In [g5-gpu-karpenter] not in NodeGroupType In [x86-cpu-karpenter]; incompatible with nodepool "g5-gpu-karpenter", daemonset overhead={"cpu":"210m","memory":"240Mi","pods":"6"}, no instance type satisfied resources {"cpu":"10210m","memory":"58839510Ki","nvidia.com/gpu":"1","pods":"7"} and requirements NodeGroupType In [g5-gpu-karpenter], NodePool In [g5-gpu-karpenter], karpenter.k8s.aws/instance-family In [g5], karpenter.k8s.aws/instance-size In [2xlarge 4xlarge 8xlarge], karpenter.sh/capacity-type In [on-demand spot], karpenter.sh/nodepool In [g5-gpu-karpenter], kubernetes.io/arch In [amd64], type In [karpenter] (no instance type which had enough resources and the required offering met the scheduling requirements)
  Normal   Nominated         7s (x2 over 3m17s)  karpenter          (combined from similar events): Pod should schedule on: nodeclaim/g5-gpu-karpenter-k86hc
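The last FailedScheduling event above is the key one: with the daemonset overhead added, the worker pod needs roughly 10.2 vCPU, ~56 GiB of memory, and one GPU, and Karpenter reports that no allowed g5 size (2xlarge/4xlarge/8xlarge) with an available offering satisfies that. A troubleshooting sketch for comparing the pod's requests against the NodePool (assuming the Karpenter nodepools.karpenter.sh CRD installed by this blueprint; the Karpenter namespace may differ in your install):

    # Inspect the GPU NodePool's requirements and limits
    kubectl get nodepool g5-gpu-karpenter -o yaml

    # See which instance types Karpenter considered and why they were rejected
    kubectl logs -n karpenter deploy/karpenter | grep -i g5-gpu-karpenter

    # Confirm the pending pod's node selector and resource requests
    kubectl get pod vllm-raycluster-r66w9-worker-gpu-group-v72wh -n rayserve-vllm \
      -o jsonpath='{.spec.nodeSelector}{"\n"}{.spec.containers[0].resources}{"\n"}'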
(Seven screenshots attached, captured 2024-07-16 between 14:43 and 15:07.)

Additional context

Gall-oDrone · Jul 16 '24 21:07