Karpenter fails to add a spot replacement for on-demand nodes because of NodeClaim validation
Description
Observed Behavior: For a NodePool with mixed capacity types (on-demand and spot), Karpenter tries to decommission an on-demand instance and replace it with a spot instance. It then fails to do so because the generated NodeClaim contains a requirement whose key is in the restricted label domain "karpenter.sh".
Controller logs:
karpenter-779ff45f5c-nmn5w controller {"level":"INFO","time":"2024-04-25T10:22:27.988Z","logger":"controller.disruption","message":"disrupting via consolidation replace, terminating 1 nodes (25 pods) ip-10-149-88-228.eu-central-1.compute.internal/m5.xlarge/on-demand and replacing with spot node from types m5.xlarge","commit":"6b868db-dirty","command-id":"e31d43f1-3b17-4be9-acdb-658ba38f5b95"}
karpenter-779ff45f5c-nmn5w controller {"level":"ERROR","time":"2024-04-25T10:22:28.058Z","logger":"controller.disruption","message":"disrupting via \"consolidation\", disrupting candidates, launching replacement nodeclaim (command-id: e31d43f1-3b17-4be9-acdb-658ba38f5b95), creating node claim, NodeClaim.karpenter.sh \"karpenter-default-wx8rz\" is invalid: spec.requirements[9].key: Invalid value: \"string\": label domain \"karpenter.sh\" is restricted","commit":"6b868db-dirty"}
Expected Behavior: Creating spot replacements for on-demand nodes should not be blocked.
Reproduction Steps (Please include YAML):
node pools:
apiVersion: v1
items:
- apiVersion: karpenter.sh/v1beta1
  kind: NodePool
  metadata:
    annotations:
      karpenter.sh/nodepool-hash: "3243005398540344161"
      karpenter.sh/nodepool-hash-version: v2
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"karpenter.sh/v1beta1","kind":"NodePool","metadata":{"annotations":{},"name":"karpenter-default"},"spec":{"disruption":{"consolidationPolicy":"WhenUnderutilized","expireAfter":"Never"},"template":{"metadata":{"labels":{"cluster-lifecycle-controller.zalan.do/replacement-strategy":"none","lifecycle-status":"ready","node.kubernetes.io/node-pool":"karpenter-default","node.kubernetes.io/profile":"worker-karpenter","node.kubernetes.io/role":"worker"}},"spec":{"kubelet":{"clusterDNS":["10.0.1.100"],"cpuCFSQuota":false,"kubeReserved":{"cpu":"100m","memory":"282Mi"},"maxPods":32,"systemReserved":{"cpu":"100m","memory":"164Mi"}},"nodeClassRef":{"name":"karpenter-default"},"requirements":[{"key":"node.kubernetes.io/instance-type","operator":"In","values":["m5.8xlarge","m5.xlarge"]},{"key":"karpenter.sh/capacity-type","operator":"In","values":["spot","on-demand"]},{"key":"kubernetes.io/arch","operator":"In","values":["arm64","amd64"]},{"key":"topology.kubernetes.io/zone","operator":"In","values":["eu-central-1a","eu-central-1b","eu-central-1c"]}],"startupTaints":[{"effect":"NoSchedule","key":"zalando.org/node-not-ready"}]}},"weight":1}}
    creationTimestamp: "2024-04-25T09:09:18Z"
    generation: 1
    name: karpenter-default
    resourceVersion: "1942211133"
    uid: 0d6de200-cac7-4ea3-a12d-a254b60b29f9
  spec:
    disruption:
      budgets:
      - nodes: 10%
      consolidationPolicy: WhenUnderutilized
      expireAfter: Never
    template:
      metadata:
        labels:
          cluster-lifecycle-controller.zalan.do/replacement-strategy: none
          lifecycle-status: ready
          node.kubernetes.io/node-pool: karpenter-default
          node.kubernetes.io/profile: worker-karpenter
          node.kubernetes.io/role: worker
      spec:
        kubelet:
          clusterDNS:
          - 10.0.1.100
          cpuCFSQuota: false
          kubeReserved:
            cpu: 100m
            memory: 282Mi
          maxPods: 32
          systemReserved:
            cpu: 100m
            memory: 164Mi
        nodeClassRef:
          name: karpenter-default
        requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
          - m5.8xlarge
          - m5.xlarge
        - key: karpenter.sh/capacity-type
          operator: In
          values:
          - spot
          - on-demand
        - key: kubernetes.io/arch
          operator: In
          values:
          - arm64
          - amd64
        - key: topology.kubernetes.io/zone
          operator: In
          values:
          - eu-central-1a
          - eu-central-1b
          - eu-central-1c
        startupTaints:
        - effect: NoSchedule
          key: zalando.org/node-not-ready
    weight: 1
  status:
    resources:
      cpu: "8"
      ephemeral-storage: 202861920Ki
      memory: 32315584Ki
      pods: "220"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
Versions:
- karpenter Version: v0.36.0
- Kubernetes Version (kubectl version): Server Version: v1.28.8
If the ip-10-149-88-228.eu-central-1.compute.internal node is around, can you supply the node object and NodeClaim YAML?
I found another instance (on a different cluster) where Karpenter failed to replace a spot node with another spot node. I captured the node object and NodeClaim YAMLs.
node object:
apiVersion: v1
kind: Node
metadata:
  annotations:
    alpha.kubernetes.io/provided-node-ip: 172.31.5.136
    csi.volume.kubernetes.io/nodeid: '{"ebs.csi.aws.com":"i-059805a98b7e75171"}'
    flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"d6:b1:3a:ae:4a:bd"}'
    flannel.alpha.coreos.com/backend-type: vxlan
    flannel.alpha.coreos.com/kube-subnet-manager: "true"
    flannel.alpha.coreos.com/public-ip: 172.31.5.136
    karpenter.k8s.aws/ec2nodeclass-hash: "2026609550328776800"
    karpenter.k8s.aws/ec2nodeclass-hash-version: v2
    karpenter.sh/nodepool-hash: "4369624379001278596"
    karpenter.sh/nodepool-hash-version: v2
    kubectl.kubernetes.io/last-applied-configuration: {}
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2024-05-02T12:39:31Z"
  finalizers:
  - karpenter.sh/termination
  labels:
    aws.amazon.com/spot: "true"
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/instance-type: m5.large
    beta.kubernetes.io/os: linux
    cluster-lifecycle-controller.zalan.do/replacement-strategy: none
    failure-domain.beta.kubernetes.io/region: eu-central-1
    failure-domain.beta.kubernetes.io/zone: eu-central-1a
    karpenter.k8s.aws/instance-category: m
    karpenter.k8s.aws/instance-cpu: "2"
    karpenter.k8s.aws/instance-cpu-manufacturer: intel
    karpenter.k8s.aws/instance-encryption-in-transit-supported: "false"
    karpenter.k8s.aws/instance-family: m5
    karpenter.k8s.aws/instance-generation: "5"
    karpenter.k8s.aws/instance-hypervisor: nitro
    karpenter.k8s.aws/instance-memory: "8192"
    karpenter.k8s.aws/instance-network-bandwidth: "750"
    karpenter.k8s.aws/instance-size: large
    karpenter.sh/capacity-type: spot
    karpenter.sh/initialized: "true"
    karpenter.sh/nodepool: default-karpenter
    karpenter.sh/registered: "true"
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: ip-172-31-5-136.eu-central-1.compute.internal
    kubernetes.io/os: linux
    kubernetes.io/role: worker
    lifecycle-status: ready
    node.kubernetes.io/distro: ubuntu
    node.kubernetes.io/instance-type: m5.large
    node.kubernetes.io/node-pool: default-karpenter
    node.kubernetes.io/profile: worker-karpenter
    node.kubernetes.io/role: worker
    topology.ebs.csi.aws.com/zone: eu-central-1a
    topology.kubernetes.io/region: eu-central-1
    topology.kubernetes.io/zone: eu-central-1a
  name: ip-172-31-5-136.eu-central-1.compute.internal
  ownerReferences:
  - apiVersion: karpenter.sh/v1beta1
    blockOwnerDeletion: true
    kind: NodeClaim
    name: default-karpenter-hcj5f
    uid: 1c95cfac-270d-4bbf-b1c6-b8d1af38ef6f
  resourceVersion: "2533516828"
  uid: e7845ae8-042f-4e16-b31e-55ecd40ee6ac
spec:
  podCIDR: 10.2.248.0/24
  podCIDRs:
  - 10.2.248.0/24
  providerID: aws:///eu-central-1a/i-059805a98b7e75171
status: {}
nodeClaim:
apiVersion: karpenter.sh/v1beta1
kind: NodeClaim
metadata:
  annotations:
    karpenter.k8s.aws/ec2nodeclass-hash: "2026609550328776800"
    karpenter.k8s.aws/ec2nodeclass-hash-version: v2
    karpenter.k8s.aws/tagged: "true"
    karpenter.sh/nodepool-hash: "4369624379001278596"
    karpenter.sh/nodepool-hash-version: v2
    kubectl.kubernetes.io/last-applied-configuration: {}
  creationTimestamp: "2024-05-02T12:38:47Z"
  finalizers:
  - karpenter.sh/termination
  generateName: default-karpenter-
  generation: 1
  labels:
    cluster-lifecycle-controller.zalan.do/replacement-strategy: none
    karpenter.k8s.aws/instance-category: m
    karpenter.k8s.aws/instance-cpu: "2"
    karpenter.k8s.aws/instance-cpu-manufacturer: intel
    karpenter.k8s.aws/instance-encryption-in-transit-supported: "false"
    karpenter.k8s.aws/instance-family: m5
    karpenter.k8s.aws/instance-generation: "5"
    karpenter.k8s.aws/instance-hypervisor: nitro
    karpenter.k8s.aws/instance-memory: "8192"
    karpenter.k8s.aws/instance-network-bandwidth: "750"
    karpenter.k8s.aws/instance-size: large
    karpenter.sh/capacity-type: spot
    karpenter.sh/nodepool: default-karpenter
    kubernetes.io/arch: amd64
    kubernetes.io/os: linux
    lifecycle-status: ready
    node.kubernetes.io/instance-type: m5.large
    node.kubernetes.io/node-pool: default-karpenter
    node.kubernetes.io/profile: worker-karpenter
    node.kubernetes.io/role: worker
    topology.kubernetes.io/region: eu-central-1
    topology.kubernetes.io/zone: eu-central-1a
  name: default-karpenter-hcj5f
  ownerReferences:
  - apiVersion: karpenter.sh/v1beta1
    blockOwnerDeletion: true
    kind: NodePool
    name: default-karpenter
    uid: 2536b136-fc71-40a9-a233-f51b81120e97
  resourceVersion: "2533447771"
  uid: 1c95cfac-270d-4bbf-b1c6-b8d1af38ef6f
spec:
  kubelet:
    clusterDNS:
    - 10.0.1.100
    cpuCFSQuota: false
    kubeReserved:
      cpu: 100m
      memory: 282Mi
    maxPods: 32
    systemReserved:
      cpu: 100m
      memory: 164Mi
  nodeClassRef:
    name: default-karpenter
  requirements:
  - key: topology.kubernetes.io/region
    operator: In
    values:
    - eu-central-1
  - key: karpenter.k8s.aws/instance-size
    operator: NotIn
    values:
    - metal
  - key: kubernetes.io/arch
    operator: In
    values:
    - amd64
    - arm64
  - key: topology.kubernetes.io/zone
    operator: In
    values:
    - eu-central-1a
  - key: node.kubernetes.io/node-pool
    operator: In
    values:
    - default-karpenter
  - key: node.kubernetes.io/profile
    operator: In
    values:
    - worker-karpenter
  - key: node.kubernetes.io/role
    operator: In
    values:
    - worker
  - key: karpenter.sh/nodepool
    operator: In
    values:
    - default-karpenter
  - key: karpenter.sh/capacity-type
    operator: In
    values:
    - spot
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
    - c5.xlarge
    - c5d.xlarge
    - c6i.xlarge
    - c6id.xlarge
    - c6in.xlarge
    - m5.large
    - m5.xlarge
    - m5d.large
    - m5d.xlarge
    - m5n.large
    - m5n.xlarge
    - m6i.large
    - m6i.xlarge
    - m6id.large
    - m6in.large
    - r5.large
    - r5d.large
    - r5n.large
    - r6i.large
    - r6i.xlarge
    - r6id.large
  - key: karpenter.k8s.aws/instance-family
    operator: In
    values:
    - c5
    - c5d
    - c5n
    - c6i
    - c6id
    - c6in
    - m5
    - m5d
    - m5n
    - m6i
    - m6id
    - m6in
    - r5
    - r5d
    - r5n
    - r6i
    - r6id
    - r6in
  - key: cluster-lifecycle-controller.zalan.do/replacement-strategy
    operator: In
    values:
    - none
  - key: lifecycle-status
    operator: In
    values:
    - ready
  resources:
    requests:
      cpu: 1517m
      ephemeral-storage: 2816Mi
      memory: 5060Mi
      pods: "14"
  startupTaints:
  - effect: NoSchedule
    key: zalando.org/node-not-ready
status:
  allocatable:
    cpu: 1800m
    ephemeral-storage: 89Gi
    memory: 7031Mi
    pods: "32"
    vpc.amazonaws.com/pod-eni: "9"
  capacity:
    cpu: "2"
    ephemeral-storage: 100Gi
    memory: 7577Mi
    pods: "32"
    vpc.amazonaws.com/pod-eni: "9"
  conditions:
  - lastTransitionTime: "2024-05-02T12:40:21Z"
    status: "True"
    type: Initialized
  - lastTransitionTime: "2024-05-02T12:38:49Z"
    status: "True"
    type: Launched
  - lastTransitionTime: "2024-05-02T12:40:21Z"
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-05-02T12:39:31Z"
    status: "True"
    type: Registered
  imageID: ******
  nodeName: ip-172-31-5-136.eu-central-1.compute.internal
  providerID: aws:///eu-central-1a/i-059805a98b7e75171
nodepool:
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  annotations:
    karpenter.sh/nodepool-hash: "4369624379001278596"
    karpenter.sh/nodepool-hash-version: v2
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"karpenter.sh/v1beta1","kind":"NodePool","metadata":{"annotations":{},"name":"default-karpenter"},"spec":{"disruption":{"consolidationPolicy":"WhenUnderutilized","expireAfter":"Never"},"template":{"metadata":{"labels":{"cluster-lifecycle-controller.zalan.do/replacement-strategy":"none","lifecycle-status":"ready","node.kubernetes.io/node-pool":"default-karpenter","node.kubernetes.io/profile":"worker-karpenter","node.kubernetes.io/role":"worker"}},"spec":{"kubelet":{"clusterDNS":["10.0.1.100"],"cpuCFSQuota":false,"kubeReserved":{"cpu":"100m","memory":"282Mi"},"maxPods":32,"systemReserved":{"cpu":"100m","memory":"164Mi"}},"nodeClassRef":{"name":"default-karpenter"},"requirements":[{"key":"karpenter.k8s.aws/instance-family","operator":"In","values":["c5","m5","r5","c5d","m5d","r5d","c5n","m5n","r5n","c6i","m6i","r6i","c6id","m6id","r6id","c6in","m6in","r6in"]},{"key":"karpenter.k8s.aws/instance-size","operator":"NotIn","values":["metal"]},{"key":"node.kubernetes.io/instance-type","operator":"NotIn","values":["c5d.large"]},{"key":"karpenter.sh/capacity-type","operator":"In","values":["spot","on-demand"]},{"key":"kubernetes.io/arch","operator":"In","values":["arm64","amd64"]},{"key":"topology.kubernetes.io/zone","operator":"In","values":["eu-central-1a","eu-central-1b","eu-central-1c"]}],"startupTaints":[{"effect":"NoSchedule","key":"zalando.org/node-not-ready"}]}}}}
  creationTimestamp: "2024-02-08T15:16:14Z"
  generation: 2
  name: default-karpenter
  resourceVersion: "2534926162"
  uid: 2536b136-fc71-40a9-a233-f51b81120e97
spec:
  disruption:
    budgets:
    - nodes: 10%
    consolidationPolicy: WhenUnderutilized
    expireAfter: Never
  template:
    metadata:
      labels:
        cluster-lifecycle-controller.zalan.do/replacement-strategy: none
        lifecycle-status: ready
        node.kubernetes.io/node-pool: default-karpenter
        node.kubernetes.io/profile: worker-karpenter
        node.kubernetes.io/role: worker
    spec:
      kubelet:
        clusterDNS:
        - 10.0.1.100
        cpuCFSQuota: false
        kubeReserved:
          cpu: 100m
          memory: 282Mi
        maxPods: 32
        systemReserved:
          cpu: 100m
          memory: 164Mi
      nodeClassRef:
        name: default-karpenter
      requirements:
      - key: karpenter.k8s.aws/instance-family
        operator: In
        values:
        - c5
        - m5
        - r5
        - c5d
        - m5d
        - r5d
        - c5n
        - m5n
        - r5n
        - c6i
        - m6i
        - r6i
        - c6id
        - m6id
        - r6id
        - c6in
        - m6in
        - r6in
      - key: karpenter.k8s.aws/instance-size
        operator: NotIn
        values:
        - metal
      - key: node.kubernetes.io/instance-type
        operator: NotIn
        values:
        - c5d.large
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - spot
        - on-demand
      - key: kubernetes.io/arch
        operator: In
        values:
        - arm64
        - amd64
      - key: topology.kubernetes.io/zone
        operator: In
        values:
        - eu-central-1a
        - eu-central-1b
        - eu-central-1c
      startupTaints:
      - effect: NoSchedule
        key: zalando.org/node-not-ready
status:
  resources:
    cpu: "102"
    ephemeral-storage: 2713582276Ki
    memory: 482793344Ki
    pods: "1980"
ec2NodeClass:
apiVersion: v1
items:
- apiVersion: karpenter.k8s.aws/v1beta1
  kind: EC2NodeClass
  metadata:
    annotations:
      karpenter.k8s.aws/ec2nodeclass-hash: "2026609550328776800"
      karpenter.k8s.aws/ec2nodeclass-hash-version: v2
      kubectl.kubernetes.io/last-applied-configuration: {}
    creationTimestamp: "2024-02-08T15:16:14Z"
    finalizers:
    - karpenter.k8s.aws/termination
    generation: 7
    name: default-karpenter
    resourceVersion: "2516183961"
    uid: 2d7763a9-397e-4ffb-865f-92dfdaa1179e
  spec:
    amiFamily: Custom
    amiSelectorTerms:
    - id: ami-*****
    - id: ami-*****
    associatePublicIPAddress: true
    blockDeviceMappings:
    - deviceName: /dev/sda1
      ebs:
        deleteOnTermination: true
        volumeSize: 100Gi
        volumeType: gp3
    detailedMonitoring: false
    instanceProfile: .******
    metadataOptions:
      httpEndpoint: enabled
      httpProtocolIPv6: disabled
      httpPutResponseHopLimit: 2
      httpTokens: optional
    securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: WorkerNodeSecurityGroup
    subnetSelectorTerms:
    - tags:
        kubernetes.io/role/karpenter: enabled
    tags:
      InfrastructureComponent: "true"
      Name: default-karpenter
      application: kubernetes
      component: shared-resource
      environment: test
      node.kubernetes.io/node-pool: default-karpenter
      node.kubernetes.io/role: worker
      zalando.de/cluster-local-id/kube-1: owned
      zalando.org/pod-max-pids: "4096"
    userData: {.....}
  status: {}
kind: List
metadata:
  resourceVersion: ""
/assign @engedaam
@myaser Apologies for the late response on this one. Are you still seeing this issue?
Yes, it is still happening on some of our clusters.
@myaser We are in the process of attempting to reproduce this issue and will update once we have more to share.
I have a better understanding of this issue now, and here is how to reproduce it.
We found a pod that uses an invalid node affinity. The affinity was preferredDuringSchedulingIgnoredDuringExecution, so it was ignored/relaxed by Karpenter during initial scheduling. Later, when the node nominated for the pod got consolidated, Karpenter logged this error message. It eventually managed to replace the node, but it took much longer.
It seems Karpenter did not relax/ignore the preferred affinity during consolidation, and the error message was strange/misleading.
example pod:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: testing-nginx
    owner: mgaballah
  name: testing-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: testing-nginx
  template:
    metadata:
      labels:
        app: testing-nginx
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 50
            preference:
              matchExpressions:
              - key: karpenter.sh/provisioner-name
                operator: DoesNotExist
      containers:
      - image: nginx
        name: nginx
        resources:
          limits:
            cpu: 200m
            memory: 50Mi
          requests:
            cpu: 200m
            memory: 50Mi
After the pod is scheduled, trigger consolidation of the node, for example by deleting the node object.
We fixed the pod, and the issue disappeared for us. With this understanding, I think this is less of a bug, but I would still be interested to understand a few things:
- why Karpenter did not relax the nodeAffinity constraint during consolidation
- why the error message is misleading
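For context, here is a minimal sketch of what the replacement NodeClaim's requirements presumably end up containing when the pod's preferred affinity is carried over. This is reconstructed from the error message above, not captured from the cluster; the names and surrounding fields are only illustrative.

apiVersion: karpenter.sh/v1beta1
kind: NodeClaim
metadata:
  generateName: karpenter-default-
spec:
  nodeClassRef:
    name: karpenter-default
  requirements:
  - key: karpenter.sh/nodepool          # well-known karpenter.sh key, allowed
    operator: In
    values:
    - karpenter-default
  - key: karpenter.sh/capacity-type     # well-known karpenter.sh key, allowed
    operator: In
    values:
    - spot
  - key: karpenter.sh/provisioner-name  # carried over from the pod's preferred affinity;
    operator: DoesNotExist              # any other key in the "karpenter.sh" domain is
                                        # rejected by the NodeClaim validation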
/triage accepted
Just encountered a similar problem. We have an EKS cluster deployed by Terraform, with a node group of one node, in which Karpenter v0.36 is installed and had been working properly.
We recently added a soft nodeAffinity on a few pods to create a preference for the node managed by TF. As Karpenter nodes already contain a few labels, we used a DoesNotExist operator on the karpenter.sh/nodepool-hash key and got errors similar to what the OP had.
Initial affinity we used:
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: karpenter.sh/nodepool-hash
            operator: DoesNotExist
        weight: 50
Associated Karpenter controller logs:
{"level":"INFO","time":"2024-06-03T13:20:40.941Z","logger":"controller.disruption","message":"triggering termination for expired node after TTL","commit":"6b868db","ttl":"1h0m0s"}
{"level":"INFO","time":"2024-06-03T13:20:40.941Z","logger":"controller.disruption","message":"disrupting via expiration replace, terminating 1 nodes (2 pods) xxxxxxxxxxxxxxxxxxxx.compute.internal/t4g.small/on-demand and replacing with on-demand node from types t4g.small, t3a.small, t3.small, t4g.medium, t3a.medium and 34 other(s)","commit":"6b868db","command-id":"58d486d2-012b-4f3e-ad32-f46f6e82d449"}
{"level":"ERROR","time":"2024-06-03T13:20:41.170Z","logger":"controller.disruption","message":"disrupting via \"expiration\", disrupting candidates, launching replacement nodeclaim (command-id: 58d486d2-012b-4f3e-ad32-f46f6e82d449), creating node claim, NodeClaim.karpenter.sh \"default-5g9xb\" is invalid: spec.requirements[4].key: Invalid value: \"string\": label domain \"karpenter.sh\" is restricted","commit":"6b868db"}
{"level":"INFO","time":"2024-06-03T13:21:14.244Z","logger":"controller.disruption","message":"triggering termination for expired node after TTL","commit":"6b868db","ttl":"1h0m0s"}
{"level":"INFO","time":"2024-06-03T13:21:14.244Z","logger":"controller.disruption","message":"disrupting via expiration replace, terminating 1 nodes (2 pods) xxxxxxxxxxxxxxxx.compute.internal/t4g.small/on-demand and replacing with on-demand node from types t4g.small, t3a.small, t3.small, t4g.medium, t3a.medium and 34 other(s)","commit":"6b868db","command-id":"7827779a-3b0c-4c65-83b0-8d427de328be"}
{"level":"ERROR","time":"2024-06-03T13:21:14.462Z","logger":"controller.disruption","message":"disrupting via \"expiration\", disrupting candidates, launching replacement nodeclaim (command-id: 7827779a-3b0c-4c65-83b0-8d427de328be), creating node claim, NodeClaim.karpenter.sh \"default-785zb\" is invalid: spec.requirements[1].key: Invalid value: \"string\": label domain \"karpenter.sh\" is restricted","commit":"6b868db"}
N.B. : sensitive information removed
Later on, I questioned the key we used, realising that karpenter.sh/nodepool-hash is an annotation key and not a label key. So I switched to karpenter.sh/nodepool, and that seems to have solved the problem.
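For reference, the patched affinity presumably looks like this: the initial affinity above with only the key swapped to the karpenter.sh/nodepool label.

spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: karpenter.sh/nodepool   # a label that actually exists on Karpenter nodes,
            operator: DoesNotExist       # unlike karpenter.sh/nodepool-hash, which is an annotation
        weight: 50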
Karpenter's last controller log before applying the patched nodeAffinity:
{"level":"ERROR","time":"2024-06-04T10:29:23.741Z","logger":"controller.disruption","message":"disrupting via \"expiration\", disrupting candidates, launching replacement nodeclaim (command-id: cf21d12e-5e6c-417f-92d2-482bf9c78042), creating node claim, NodeClaim.karpenter.sh \"default-xznbg\" is invalid: spec.requirements[2].key: Invalid value: \"string\": label domain \"karpenter.k8s.aws\" is restricted","commit":"6b868db"}
Post-apply logs
{"level":"INFO","time":"2024-06-04T10:46:04.246Z","logger":"controller.disruption","message":"triggering termination for expired node after TTL","commit":"6b868db","ttl":"1h0m0s"}
{"level":"INFO","time":"2024-06-04T10:46:04.248Z","logger":"controller.disruption","message":"disrupting via expiration replace, terminating 1 nodes (2 pods) xxxxxxxxxxxxxxxxxxxxxxxx.compute.internal/t4g.small/on-demand and replacing with on-demand node from types t4g.small, t3a.small, t3.small, t4g.medium, t3a.medium and 34 other(s)","commit":"6b868db","command-id":"7c2ae915-8210-4df1-80a6-3462a95c16c8"}
{"level":"INFO","time":"2024-06-04T10:46:04.482Z","logger":"controller.disruption","message":"created nodeclaim","commit":"6b868db","nodepool":"default","nodeclaim":"default-pc2xq","requests":{"cpu":"1220m","memory":"690Mi","pods":"6"},"instance-types":"c5.large, c5.xlarge, c5a.large, c5a.xlarge, c5d.large and 34 other(s)"}
{"level":"INFO","time":"2024-06-04T10:46:07.114Z","logger":"controller.nodeclaim.lifecycle","message":"launched nodeclaim","commit":"6b868db","nodeclaim":"default-pc2xq","provider-id":"aws:///xxxxxxxxxx/i-0d9453af78ad7983e","instance-type":"t4g.small","zone":"xxxxxxxxxx","capacity-type":"on-demand","allocatable":{"cpu":"1930m","ephemeral-storage":"17Gi","memory":"1359Mi","pods":"32"}}
{"level":"INFO","time":"2024-06-04T10:46:15.644Z","logger":"controller.provisioner","message":"found provisionable pod(s)","commit":"6b868db","pods":"monitoring/prometheus-prometheus-kube-prometheus-prometheus-0, kube-system/coredns-dfd64456d-756fw","duration":"196.715971ms"}
{"level":"INFO","time":"2024-06-04T10:46:25.643Z","logger":"controller.provisioner","message":"found provisionable pod(s)","commit":"6b868db","pods":"monitoring/prometheus-prometheus-kube-prometheus-prometheus-0, kube-system/coredns-dfd64456d-756fw","duration":"194.918258ms"}
{"level":"INFO","time":"2024-06-04T10:46:29.844Z","logger":"controller.nodeclaim.lifecycle","message":"registered nodeclaim","commit":"6b868db","nodeclaim":"default-pc2xq","provider-id":"aws:///xxxxxxxxxx/i-0d9453af78ad7983e","node":"xxxxxxxxxxxxxxxxxxxxxxxx.compute.internal"}
{"level":"INFO","time":"2024-06-04T10:46:35.742Z","logger":"controller.provisioner","message":"found provisionable pod(s)","commit":"6b868db","pods":"monitoring/prometheus-prometheus-kube-prometheus-prometheus-0, kube-system/coredns-dfd64456d-756fw","duration":"293.341699ms"}
{"level":"INFO","time":"2024-06-04T10:46:39.474Z","logger":"controller.nodeclaim.lifecycle","message":"initialized nodeclaim","commit":"6b868db","nodeclaim":"default-pc2xq","provider-id":"aws:///xxxxxxxxxx/i-0d9453af78ad7983e","node":"xxxxxxxxxxxxxxxxxxxxxxxx.compute.internal","allocatable":{"cpu":"1930m","ephemeral-storage":"18233774458","hugepages-1Gi":"0","hugepages-2Mi":"0","hugepages-32Mi":"0","hugepages-64Ki":"0","memory":"1408504Ki","pods":"32"}}
{"level":"INFO","time":"2024-06-04T10:46:41.472Z","logger":"controller.disruption.queue","message":"command succeeded","commit":"6b868db","command-id":"7c2ae915-8210-4df1-80a6-3462a95c16c8"}
{"level":"INFO","time":"2024-06-04T10:46:41.567Z","logger":"controller.node.termination","message":"tainted node","commit":"6b868db","node":"xxxxxxxxxxxxxxxxxxxxxxxx.compute.internal"}
{"level":"INFO","time":"2024-06-04T10:46:43.242Z","logger":"controller.provisioner","message":"found provisionable pod(s)","commit":"6b868db","pods":"monitoring/prometheus-prometheus-kube-prometheus-prometheus-0, kube-system/coredns-dfd64456d-756fw","duration":"590.049476ms"}
{"level":"INFO","time":"2024-06-04T10:46:49.190Z","logger":"controller.node.termination","message":"deleted node","commit":"6b868db","node":"xxxxxxxxxxxxxxxxxxxxxxxx.compute.internal"}
{"level":"INFO","time":"2024-06-04T10:46:49.685Z","logger":"controller.nodeclaim.termination","message":"deleted nodeclaim","commit":"6b868db","nodeclaim":"default-7f7lf","node":"xxxxxxxxxxxxxxxxxxxxxxxx.compute.internal","provider-id":"aws:///xxxxxxxxxx/i-0a09fedd4e0233a92"}
N.B. : sensitive information removed
I would also agree that the error log can be misleading. Hope this helps someone in need. :)