Pods stuck in CrashLoopBackOff on a node provisioned by Karpenter (rare occurrences)
Description
Observed Behavior:
- Almost all Pods on a node provisioned by Karpenter were in CrashLoopBackOff. This happens only rarely; most of the time there are no problems.
- Most Pod errors are `exec format error`. The usual cause of that error is an architecture mismatch between the node and the container image, but I confirmed that the node and container architectures are the same (amd64).

Container logs:
- amazon-cloudwatch-agent
  - `exec /opt/aws/amazon-cloudwatch-agent/bin/start-amazon-cloudwatch-agent: exec format error`
- fluentd-cloudwatch
  - `exec /usr/local/bin/tini: input/output error`
- ebs-csi-node
  - liveness-probe: `exec /livenessprobe: exec format error`
  - node-driver-registrar: `exec /csi-node-driver-registrar: exec format error`
  - ebs-plugin: `exec /bin/aws-ebs-csi-driver: exec format error`
- istio-init (istio-proxy)
  - `exec /usr/local/bin/pilot-agent: exec format error`
Pod events:

```
Warning  BackOff  2m38s (x5057 over 18h)  kubelet  Back-off restarting failed container cloudwatch-agent in pod cloudwatch-agent-5lvbg_amazon-cloudwatch(5e6dba79-5331-4582-b7d9-3a4d14890768)
Warning  BackOff  1s (x4494 over 16h)     kubelet  Back-off restarting failed container istio-init in pod zozo-id-primary-7ff74ddcd6-85n72_zozo-id-dev14(173b74eb-ec42-43ae-8d66-da2f8b33a1b0)
Warning  BackOff  3m14s (x6551 over 23h)  kubelet  Back-off restarting failed container fluentd-cloudwatch in pod fluentd-cloudwatch-jhfqd_amazon-cloudwatch(eec85af4-f104-433d-a908-11c52ac6e72b)
```
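The node/container architecture check mentioned in the observed behavior can be sketched by comparing the node's machine type against a binary's ELF header. This is a minimal sketch, not part of the original report; `/bin/ls` stands in for the failing entrypoint (e.g. `/usr/local/bin/pilot-agent`), so adjust the path to the binary you are diagnosing.

```shell
#!/bin/sh
# Sketch: diagnose "exec format error" by comparing the node architecture
# with a binary's ELF e_machine field. /bin/ls is a placeholder binary.
node_arch=$(uname -m)   # e.g. x86_64 (amd64) or aarch64 (arm64)

# e_machine is a little-endian 16-bit value at byte offset 18 of an ELF file:
# 62 = x86_64, 183 = aarch64
e_machine=$(od -An -tu2 -j18 -N2 /bin/ls | tr -d ' ')

echo "node: ${node_arch}, ELF e_machine: ${e_machine}"
```

If the node reports `x86_64` but the binary's `e_machine` is 183 (aarch64), the image was built for the wrong architecture; in this issue both matched, which is why the error pointed elsewhere.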
Expected Behavior:
- Pods run without errors.

Reproduction Steps (Please include YAML):
- Cannot reproduce.
Versions:
- Chart Version: 0.37.0
- Kubernetes Version (`kubectl version`): Server Version: v1.29.4-eks-036c24b
Did you validate that the AMI you were using is the expected AMI? If the AMI is public, can you share the AMI ID?
@engedaam
Drift performs automatic node upgrades, and we have specified almost nothing about the AMI: only `.spec.amiFamily` in the EC2NodeClass resource and `.spec.template.spec.requirements` in the NodePool resource, like below. The expected arch is amd64.
```yaml
- key: "kubernetes.io/arch"
  operator: In
  values: ["amd64"]
```
We are currently using ami-028362abf82e2b470.
My guess is that this occurs when many Pods (20+) are being moved onto replacement Nodes at once.
@EigoOda We encountered the same thing twice in one of our environments in the last 2-3 weeks. Do you have more info about ami-028362abf82e2b470? I can't find it publicly, so it is most likely private to you. Is it AL2023-based?
On our side we are on EKS 1.30 and AL2023 (custom-built from the base EKS AMI amazon-eks-node-al2023-arm64-standard-1.30-v20240703).
Our guess is that it might be related to AL2023 rather than to Karpenter, but we weren't able to pull the EKS logs in time to investigate.
@jpbelangerupgrade This issue has been resolved with the help of AWS Support.
It was caused by kernel module (iptable_nat, ip6table_nat) and containerd issues. The problem occurred in the following sequence:
- The Node (AL2023) starts, then the istio-init container starts (we use Istio as a service mesh).
- The iptable_nat kernel module is loaded when a command is executed in the Istio container, but the command runs before the module's initialization has fully completed.
- Due to a bug in iptable_nat/ip6table_nat, executing a command that uses the module before it is fully loaded can cause a kernel panic, after which the OS restarts.
- The OS reboot can then trigger a containerd problem (https://github.com/containerd/containerd/pull/9401), resulting in a data mismatch that produces the "exec format error".
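To see whether the NAT modules from this sequence are actually loaded on a node, a quick check against `/proc/modules` can be sketched. This is an illustrative check, not from the original report; it assumes a Linux node reached via SSH or a privileged debug pod.

```shell
#!/bin/sh
# Sketch: report whether the NAT-related kernel modules involved in the
# failure sequence are loaded, by scanning /proc/modules.
for mod in iptable_nat ip6table_nat; do
  if grep -qw "^${mod}" /proc/modules 2>/dev/null; then
    echo "${mod}: loaded"
  else
    echo "${mod}: not loaded"
  fi
done
```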
The permanent fix is to set `spec.userData` in the EC2NodeClass to pre-load the kernel modules, like below:
```yaml
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2023
  userData: |
    #!/bin/bash
    modprobe iptable_nat
    modprobe ip6table_nat
```
This kernel-module problem occurs with kernel version 5.15 or later.
FTR, the upstream fixes are
- iptable_nat: "netfilter: iptables: Fix null-ptr-deref in iptable_nat_table_init()."
- ip6table_nat: "netfilter: iptables: Fix potential null-ptr-deref in ip6table_nat_table_init()."

They were backported to 6.1.104 and 5.15.165, and we have released the following kernels with the fix:
- kernel-5.15.165-110.161.amzn2
- kernel-6.1.106-116.188.amzn2023
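To check whether a node's running kernel already includes one of the fixed versions above, a `sort -V` comparison can be sketched. Note this is only an illustration: each kernel series has its own threshold (5.15.165 for the 5.15 series, 6.1.104/6.1.106 for 6.1), and `6.1.106` below is used purely as an example value.

```shell
#!/bin/sh
# Sketch: compare the running kernel version against a fix threshold
# using version sort. Pick the threshold that matches your kernel series.
running=$(uname -r | cut -d- -f1)   # strip distro suffix, e.g. "-116.188.amzn2023"
fixed="6.1.106"                     # illustrative threshold only

# sort -V orders versions numerically; if the threshold sorts first (or
# equal), the running kernel is at least the fixed version.
lowest=$(printf '%s\n%s\n' "$fixed" "$running" | sort -V | head -n1)
if [ "$lowest" = "$fixed" ]; then
  echo "running kernel ${running} >= ${fixed}"
else
  echo "running kernel ${running} < ${fixed}"
fi
```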
Thanks.