data-on-eks
Error while running RayServe with vLLM blueprint template
Description
Generally speaking, I get an error while trying to run any GenAI blueprint template. For this particular use case, the RayServe GPU node group is not being deployed as the example shows. Debugging the node group surfaces the following error (the commands I used to inspect the Karpenter side are listed right after it):
Failed to schedule pod, incompatible with nodepool "x86-cpu-karpenter", daemonset overhead={"cpu":"210m","memory":"240Mi","pods":"6"}
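These are the checks I ran; the Karpenter namespace and deployment name below are assumptions on my side, so adjust them to wherever Karpenter runs in your cluster. The node pool and NodeClaim names come from the scheduling events further down.

  # Compare the labels on the existing nodes with the pod's node selector
  kubectl get nodes -L NodeGroupType,type

  # Inspect the node pools the blueprint created and the claims Karpenter nominated
  kubectl get nodepools
  kubectl describe nodepool g5-gpu-karpenter
  kubectl get nodeclaims

  # Karpenter controller logs (namespace/deployment name assumed)
  kubectl logs -n karpenter deployment/karpenter --tail=100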
Versions
- Module version [Required]: AIML/Jark-Stack
- Terraform version: Terraform v1.8.4 on linux_amd64
- Provider version(s): Terraform v1.8.4 on linux_amd64
Reproduction Code [Required]
Steps to reproduce the behavior:
- git clone https://github.com/awslabs/data-on-eks.git
- cd data-on-eks/ai-ml/jark-stack/terraform && chmod +x install.sh
- ./install.sh
- aws eks --region us-west-2 update-kubeconfig --name jark-stack
- export HUGGING_FACE_HUB_TOKEN=$(echo -n "Your-Hugging-Face-Hub-Token-Value" | base64)
- cd data-on-eks/gen-ai/inference/vllm-rayserve-gpu
- envsubst < ray-service-vllm.yaml | kubectl apply -f -
- kubectl get pod -n rayserve-vllm
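After the apply I also ran a couple of sanity checks; the secret name comes from the pod description below, and I only list the RayService object rather than naming it, since its name is whatever the template sets:

  # Confirm the RayService object and the Hugging Face token secret exist
  kubectl get rayservice -n rayserve-vllm
  kubectl get secret hf-token -n rayserve-vllm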
Expected behavior
- kubectl get pod -n rayserve-vllm should show the following:
NAME READY STATUS RESTARTS AGE
vllm-raycluster-nvtxg-head-g2cg8 1/1 Running 0 47m
vllm-raycluster-nvtxg-worker-gpu-group-msl5p 1/1 Running 0 47m
Actual behavior
- kubectl get pod -n rayserve-vllm shows the following:
NAME READY STATUS RESTARTS AGE
vllm-raycluster-r66w9-head-zkt8k 2/2 Running 0 30m
vllm-raycluster-r66w9-worker-gpu-group-v72wh 0/1 Pending 0 30m
Terminal Output Screenshot(s)
kubectl describe pod vllm-raycluster-r66w9-worker-gpu-group-v72wh -n rayserve-vllm
Name: vllm-raycluster-r66w9-worker-gpu-group-v72wh
Namespace: rayserve-vllm
Priority: 0
Service Account: default
Node: <none>
Labels: app.kubernetes.io/created-by=kuberay-operator
app.kubernetes.io/name=kuberay
ray.io/cluster=vllm-raycluster-r66w9
ray.io/group=gpu-group
ray.io/identifier=vllm-raycluster-r66w9-worker
ray.io/is-ray-node=yes
ray.io/node-type=worker
ray.io/serve=true
Annotations: ray.io/ft-enabled: false
Status: Pending
IP:
IPs: <none>
Controlled By: RayCluster/vllm-raycluster-r66w9
Init Containers:
wait-gcs-ready:
Image: public.ecr.aws/data-on-eks/ray2.24.0-py310-vllm-gpu:v1
Port: <none>
Host Port: <none>
Command:
/bin/bash
-lc
--
Args:
SECONDS=0
while true; do
if (( SECONDS <= 120 )); then
if ray health-check --address vllm.rayserve-vllm.svc.cluster.local:6379 > /dev/null 2>&1; then
echo "GCS is ready."
break
fi
echo "$SECONDS seconds elapsed: Waiting for GCS to be ready."
else
if ray health-check --address vllm.rayserve-vllm.svc.cluster.local:6379; then
echo "GCS is ready. Any error messages above can be safely ignored."
break
fi
echo "$SECONDS seconds elapsed: Still waiting for GCS to be ready. For troubleshooting, refer to the FAQ at https://github.com/ray-project/kuberay/blob/master/docs/guidance/FAQ.md."
fi
sleep 5
done
Limits:
cpu: 200m
memory: 256Mi
Requests:
cpu: 200m
memory: 256Mi
Environment:
VLLM_PORT: 8004
LD_LIBRARY_PATH: /home/ray/anaconda3/lib:
HUGGING_FACE_HUB_TOKEN: <set to the key 'hf-token' in secret 'hf-token'> Optional: false
FQ_RAY_IP: vllm.rayserve-vllm.svc.cluster.local
RAY_IP: vllm
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jhjsr (ro)
Containers:
ray-worker:
Image: public.ecr.aws/data-on-eks/ray2.24.0-py310-vllm-gpu:v1
Port: 8080/TCP
Host Port: 0/TCP
Command:
/bin/bash
-lc
--
Args:
ulimit -n 65536; ray start --address=vllm.rayserve-vllm.svc.cluster.local:6379 --metrics-export-port=8080 --block --dashboard-agent-listen-port=52365 --num-cpus=10 --memory=60000000000 --num-gpus=1
Limits:
cpu: 10
memory: 60G
nvidia.com/gpu: 1
Requests:
cpu: 10
memory: 60G
nvidia.com/gpu: 1
Liveness: exec [bash -c wget -T 2 -q -O- http://localhost:52365/api/local_raylet_healthz | grep success] delay=30s timeout=1s period=5s #success=1 #failure=120
Readiness: exec [bash -c wget -T 2 -q -O- http://localhost:52365/api/local_raylet_healthz | grep success && wget -T 2 -q -O- http://localhost:8000/-/healthz | grep success] delay=10s timeout=1s period=5s #success=1 #failure=1
Environment:
VLLM_PORT: 8004
LD_LIBRARY_PATH: /home/ray/anaconda3/lib:
HUGGING_FACE_HUB_TOKEN: <set to the key 'hf-token' in secret 'hf-token'> Optional: false
FQ_RAY_IP: vllm.rayserve-vllm.svc.cluster.local
RAY_IP: vllm
RAY_CLUSTER_NAME: (v1:metadata.labels['ray.io/cluster'])
RAY_CLOUD_INSTANCE_ID: vllm-raycluster-r66w9-worker-gpu-group-v72wh (v1:metadata.name)
RAY_NODE_TYPE_NAME: (v1:metadata.labels['ray.io/group'])
KUBERAY_GEN_RAY_START_CMD: ray start --address=vllm.rayserve-vllm.svc.cluster.local:6379 --metrics-export-port=8080 --block --dashboard-agent-listen-port=52365 --num-cpus=10 --memory=60000000000 --num-gpus=1
RAY_PORT: 6379
RAY_timeout_ms_task_wait_for_death_info: 0
RAY_gcs_server_request_timeout_seconds: 5
RAY_SERVE_KV_TIMEOUT_S: 5
RAY_ADDRESS: vllm.rayserve-vllm.svc.cluster.local:6379
RAY_USAGE_STATS_KUBERAY_IN_USE: 1
REDIS_PASSWORD:
RAY_DASHBOARD_ENABLE_K8S_DISK_USAGE: 1
Mounts:
/dev/shm from shared-mem (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jhjsr (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
shared-mem:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: 60G
kube-api-access-jhjsr:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Guaranteed
Node-Selectors: NodeGroupType=g5-gpu-karpenter
type=karpenter
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
nvidia.com/gpu:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 31m default-scheduler 0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
Normal Nominated 31m karpenter Pod should schedule on: nodeclaim/g5-gpu-karpenter-fc4pv
Normal Nominated 28m karpenter Pod should schedule on: nodeclaim/g5-gpu-karpenter-jnzd8
Normal Nominated 25m karpenter Pod should schedule on: nodeclaim/g5-gpu-karpenter-w9g6f
Normal Nominated 22m karpenter Pod should schedule on: nodeclaim/g5-gpu-karpenter-ldz6d
Normal Nominated 19m karpenter Pod should schedule on: nodeclaim/g5-gpu-karpenter-f8ssz
Normal Nominated 15m karpenter Pod should schedule on: nodeclaim/g5-gpu-karpenter-8n9vv
Normal Nominated 12m karpenter Pod should schedule on: nodeclaim/g5-gpu-karpenter-h2wrf
Normal Nominated 9m37s karpenter Pod should schedule on: nodeclaim/g5-gpu-karpenter-sksrm
Normal Nominated 6m27s karpenter Pod should schedule on: nodeclaim/g5-gpu-karpenter-lh626
Warning FailedScheduling 100s (x6 over 26m) default-scheduler 0/4 nodes are available: 4 node(s) didn't match Pod's node affinity/selector. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
Warning FailedScheduling 67s (x7 over 31m) karpenter Failed to schedule pod, incompatible with nodepool "x86-cpu-karpenter", daemonset overhead={"cpu":"210m","memory":"240Mi","pods":"6"}, incompatible requirements, key NodeGroupType, NodeGroupType In [g5-gpu-karpenter] not in NodeGroupType In [x86-cpu-karpenter]; incompatible with nodepool "g5-gpu-karpenter", daemonset overhead={"cpu":"210m","memory":"240Mi","pods":"6"}, no instance type satisfied resources {"cpu":"10210m","memory":"58839510Ki","nvidia.com/gpu":"1","pods":"7"} and requirements NodeGroupType In [g5-gpu-karpenter], NodePool In [g5-gpu-karpenter], karpenter.k8s.aws/instance-family In [g5], karpenter.k8s.aws/instance-size In [2xlarge 4xlarge 8xlarge], karpenter.sh/capacity-type In [on-demand spot], karpenter.sh/nodepool In [g5-gpu-karpenter], kubernetes.io/arch In [amd64], type In [karpenter] (no instance type which had enough resources and the required offering met the scheduling requirements)
Normal Nominated 7s (x2 over 3m17s) karpenter (combined from similar events): Pod should schedule on: nodeclaim/g5-gpu-karpenter-k86hc
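For what it's worth, the resource total in the last FailedScheduling event looks like the worker container's requests plus the reported daemonset overhead. A quick back-of-the-envelope check in bash (my own arithmetic, not taken from the template):

  # worker requests: cpu=10, memory=60G (decimal); daemonset overhead: cpu=210m, memory=240Mi
  echo "$((10000 + 210))m"                           # 10210m     -> matches "cpu":"10210m"
  echo "$(( 60 * 1000**3 / 1024 + 240 * 1024 ))Ki"   # 58839510Ki -> matches "memory":"58839510Ki"

So the figures Karpenter prints are simply the pod's requests plus the daemonset overhead, i.e. just over 10 vCPU and roughly 56 GiB of memory on a single g5 node, which it reports it cannot place within the g5-gpu-karpenter node pool's allowed sizes and offerings.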