autoscaler icon indicating copy to clipboard operation
autoscaler copied to clipboard

Karpenter going into crashloopback after eks upgraded from 1.32 to 1.33

Open sivamalla42 opened this issue 3 weeks ago • 3 comments

hi Team,

We observed an issue where we started observing karpenter 1.8.1 pods going to crashloopback. The pods describe events show only the liveness and readiness probe failing.

Changes EKS upgraded 1.32 to 1.33 karpenter pods went to crashloop on 1.8.1

No logs were showing any information

sivamalla42 avatar Dec 15 '25 13:12 sivamalla42

Hi @sivamalla42, I can help dig into this.

Could you please provide a few more details to help diagnose the crash?

  1. Exact Versions

  2. Pod Details & Events Looking for probe failures or volume mount errors in the events.

  3. Previous Logs Since the current logs might be empty if it crashes immediately, checking the previous instance is critical:

AbhinavPInamdar avatar Dec 15 '25 14:12 AbhinavPInamdar

Hi @AbhinavPInamdar ,

Thanks for your response, please find the below details

Exact Versions:

helm list -n karpenter NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION karpenter karpenter 4 2025-12-15 17:09:55.809541 +0530 IST failed karpenter-1.8.3 1.8.3
karpenter-crd karpenter 3 2025-12-15 16:29:24.42723 +0530 IST deployed karpenter-crd-1.8.3 1.8.3

eks version: v1.33.5-eks-ecaa3a6

Pod Details & Events Looking for probe failures or volume mount errors in the events.

Events: Type Reason Age From Message

Warning Unhealthy 29m (x66 over 3h16m) kubelet (combined from similar events): Readiness probe failed: Get "http://XXXXX:8081/readyz": context deadline exceeded (Client.Timeout exceeded while awaiting headers) Warning Unhealthy 19m (x129 over 3h29m) kubelet Liveness probe failed: Get "http://XXXX:8081/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers) Normal Killing 9m27s (x47 over 3h28m) kubelet Container controller failed liveness probe, will be restarted Warning Unhealthy 8m46s (x115 over 3h29m) kubelet Readiness probe failed: Get "http://XXXXXX:8081/readyz": context deadline exceeded (Client.Timeout exceeded while awaiting headers) Warning BackOff 4m25s (x518 over 3h16m) kubelet Back-off restarting failed container controller in pod karpenter-f874cbf47-kgwzx_karpenter(fc8121e5-6b94-4270-b756-e8fc02716d85) Normal Pulled 2m15s (x49 over 3h30m) kubelet Container image "public.ecr.aws/karpenter/controller:1.8.3@sha256:2814fc3dfd440118fae8c75426d469f7015d77353c8b4b1c2fe6398838fd6627" already present on machine Normal Started 7s (x50 over 3h30m) kubelet Started container controller

Previous Logs Since the current logs might be empty if it crashes immediately, checking the previous instance is critical: unfortunately no logs available yet

sivamalla42 avatar Dec 15 '25 15:12 sivamalla42

To add to this background. karpenter was installed successfully with 1.8.1 on eks 1.32 and everything was working as expected. but once the eks 1.32 is upgraded to 1.33, we observe the pods going to crashloopbackoff state

sivamalla42 avatar Dec 15 '25 15:12 sivamalla42