Pods stuck in CrashLoopBackOff on a node provisioned by Karpenter (rare occurrences)
Description
Observed Behavior:
- Almost all Pods on a node provisioned by Karpenter were in CrashLoopBackOff. This happens only rarely; most of the time there are no problems.
- Most Pod errors are `exec format error`. The usual cause of that error is an architecture mismatch between the node and the container image, but I confirmed that the node and container architectures are the same (amd64).

Container logs:
- amazon-cloudwatch-agent
  - `exec /opt/aws/amazon-cloudwatch-agent/bin/start-amazon-cloudwatch-agent: exec format error`
- fluentd-cloudwatch
  - `exec /usr/local/bin/tini: input/output error`
- ebs-csi-node
  - liveness-probe: `exec /livenessprobe: exec format error`
  - node-driver-registrar: `exec /csi-node-driver-registrar: exec format error`
  - ebs-plugin: `exec /bin/aws-ebs-csi-driver: exec format error`
- istio-init (istio-proxy)
  - `exec /usr/local/bin/pilot-agent: exec format error`
Pod events:

```
Warning  BackOff  2m38s (x5057 over 18h)  kubelet  Back-off restarting failed container cloudwatch-agent in pod cloudwatch-agent-5lvbg_amazon-cloudwatch(5e6dba79-5331-4582-b7d9-3a4d14890768)
Warning  BackOff  1s (x4494 over 16h)     kubelet  Back-off restarting failed container istio-init in pod zozo-id-primary-7ff74ddcd6-85n72_zozo-id-dev14(173b74eb-ec42-43ae-8d66-da2f8b33a1b0)
Warning  BackOff  3m14s (x6551 over 23h)  kubelet  Back-off restarting failed container fluentd-cloudwatch in pod fluentd-cloudwatch-jhfqd_amazon-cloudwatch(eec85af4-f104-433d-a908-11c52ac6e72b)
```
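The node/container architecture check mentioned in the observed behavior can be sketched by comparing the node's machine type against a binary's ELF header. This is a minimal sketch, not part of the original report; `/bin/ls` stands in for the failing entrypoint (e.g. `/usr/local/bin/pilot-agent`), so adjust the path to the binary you are diagnosing.

```shell
#!/bin/sh
# Sketch: diagnose "exec format error" by comparing the node architecture
# with a binary's ELF e_machine field. /bin/ls is a placeholder binary.
node_arch=$(uname -m)   # e.g. x86_64 (amd64) or aarch64 (arm64)

# e_machine is a little-endian 16-bit value at byte offset 18 of an ELF file:
# 62 = x86_64, 183 = aarch64
e_machine=$(od -An -tu2 -j18 -N2 /bin/ls | tr -d ' ')

echo "node: ${node_arch}, ELF e_machine: ${e_machine}"
```

If the node reports `x86_64` but the binary's `e_machine` is 183 (aarch64), the image was built for the wrong architecture; in this issue both matched, which is why the error pointed elsewhere.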
Expected Behavior:
- Pods run without errors.

Reproduction Steps (Please include YAML):
- Cannot reproduce.
Versions:
- Chart Version: 0.37.0
- Kubernetes Version (`kubectl version`): Server Version: v1.29.4-eks-036c24b
Did you validate that the AMI you were using is the expected AMI? If the AMI is public, can you share the AMI ID?
@engedaam
Drift performs automatic node upgrades, and we have specified almost nothing about the AMI: only `.spec.amiFamily` in the EC2NodeClass resource and `.spec.template.spec.requirements` in the NodePool resource, like below. The expected arch is amd64.
```yaml
- key: "kubernetes.io/arch"
  operator: In
  values: ["amd64"]
```
We are currently using ami-028362abf82e2b470.
My guess is that this occurs when many Pods (20+) are being moved onto replacement Nodes at once.
@EigoOda We encountered the same thing twice in one of our environments in the last 2-3 weeks. Do you have more info about ami-028362abf82e2b470? I can't find it publicly, so it is most likely private to you. Is it AL2023-based?
On our side we are on EKS 1.30 and AL2023 (custom-built from the base EKS AMI amazon-eks-node-al2023-arm64-standard-1.30-v20240703).
Our guess is that it might be related to AL2023 rather than to Karpenter, but we weren't able to pull the EKS logs in time to investigate.
@jpbelangerupgrade This issue has been resolved with the help of AWS Support.
It was caused by kernel module (iptable_nat, ip6table_nat) and containerd issues. The problem occurred in the following sequence:
- The Node (AL2023) starts, then the istio-init container starts (we use Istio as a service mesh).
- The iptable_nat kernel module is loaded when a command is executed in the Istio container, but the command runs before the module's initialization has fully completed.
- Due to a bug in iptable_nat/ip6table_nat, executing a command that uses the module before it is fully loaded can cause a kernel panic, after which the OS restarts.
- The OS reboot can then trigger a containerd problem (https://github.com/containerd/containerd/pull/9401), resulting in a data mismatch that produces the "exec format error".
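To see whether the NAT modules from this sequence are actually loaded on a node, a quick check against `/proc/modules` can be sketched. This is an illustrative check, not from the original report; it assumes a Linux node reached via SSH or a privileged debug pod.

```shell
#!/bin/sh
# Sketch: report whether the NAT-related kernel modules involved in the
# failure sequence are loaded, by scanning /proc/modules.
for mod in iptable_nat ip6table_nat; do
  if grep -qw "^${mod}" /proc/modules 2>/dev/null; then
    echo "${mod}: loaded"
  else
    echo "${mod}: not loaded"
  fi
done
```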
The permanent fix is to set `spec.userData` in the EC2NodeClass to pre-load the kernel modules, like below:
```yaml
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2023
  userData: |
    #!/bin/bash
    modprobe iptable_nat
    modprobe ip6table_nat
```
This kernel-module problem occurs with kernel version 5.15 or later.
FTR, the upstream fixes are
- iptable_nat: "netfilter: iptables: Fix null-ptr-deref in iptable_nat_table_init()."
- ip6table_nat: "netfilter: iptables: Fix potential null-ptr-deref in ip6table_nat_table_init()."

They were backported to 6.1.104 and 5.15.165, and we have released the following kernels with the fix:
- kernel-5.15.165-110.161.amzn2
- kernel-6.1.106-116.188.amzn2023
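To check whether a node's running kernel already includes one of the fixed versions above, a `sort -V` comparison can be sketched. Note this is only an illustration: each kernel series has its own threshold (5.15.165 for the 5.15 series, 6.1.104/6.1.106 for 6.1), and `6.1.106` below is used purely as an example value.

```shell
#!/bin/sh
# Sketch: compare the running kernel version against a fix threshold
# using version sort. Pick the threshold that matches your kernel series.
running=$(uname -r | cut -d- -f1)   # strip distro suffix, e.g. "-116.188.amzn2023"
fixed="6.1.106"                     # illustrative threshold only

# sort -V orders versions numerically; if the threshold sorts first (or
# equal), the running kernel is at least the fixed version.
lowest=$(printf '%s\n%s\n' "$fixed" "$running" | sort -V | head -n1)
if [ "$lowest" = "$fixed" ]; then
  echo "running kernel ${running} >= ${fixed}"
else
  echo "running kernel ${running} < ${fixed}"
fi
```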
Thanks.