
ENI network broken

Open BruceLuX opened this issue 1 month ago • 5 comments

What happened: A Pod using an IP from a secondary ENI cannot access internal Services (e.g. the kubernetes Service).

What you expected to happen: Pods can connect to internal Kubernetes Services.

How to reproduce it (as minimally and precisely as possible):

  1. Use the launch template to run the node; please check my user data below:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"

--//
Content-Type: text/x-shellscript; charset="us-ascii"
#!/bin/bash

set -ex

cat > /etc/kubernetes/nodeadm-bootstrap.yaml <<EOF
---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: LAB-1-30
    apiServerEndpoint: https://1234567890.yl4.cn-north-1.eks.amazonaws.com.cn
    certificateAuthority: xxxx
    cidr: 172.20.0.0/16
EOF

nodeadm init --config-source file:///etc/kubernetes/nodeadm-bootstrap.yaml
sleep 300   # in prod this step downloads and installs some software; replacing it with a sleep also reproduces the issue
nodeadm init --config-source file:///etc/kubernetes/nodeadm-bootstrap.yaml
--//--
  2. Launch the instance.

  3. Deploy netshoot test pods onto the node:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: netshoot-test
spec:
  replicas: 10
  selector:
    matchLabels:
      app: netshoot-test
  template:
    metadata:
      labels:
        app: netshoot-test
    spec:
      nodeName: ip-x-x-x-x.cn-north-1.compute.internal
      containers:
      - name: netshoot
        image: nicolaka/netshoot
        imagePullPolicy: Always
        command: ["/bin/bash", "-ce", "tail -f /dev/null"]
  4. Check a pod that landed on a secondary ENI and telnet to the kubernetes Service (see the sketch after these steps for one way to map pod IPs to ENIs):

k exec -it netshoot-test-7bb8ff8fb-h5f4b -- /bin/bash
netshoot-test-7bb8ff8fb-h5f4b:~# telnet kubernetes 443
telnet: bad address 'kubernetes'

  5. Pods running on the primary ENI work fine.
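For reference, one way to map pod IPs to ENIs (a sketch, not from the original report; the instance ID is a placeholder):

# hypothetical instance ID; substitute the node's actual instance ID
INSTANCE_ID=i-0123456789abcdef0

# pod IPs on the node
kubectl get pods -l app=netshoot-test -o wide

# private IPs grouped per ENI; device index 0 is the primary ENI, so
# pod IPs listed under higher device indices are on secondary ENIs
aws ec2 describe-network-interfaces \
  --region cn-north-1 \
  --filters Name=attachment.instance-id,Values="$INSTANCE_ID" \
  --query 'NetworkInterfaces[].{eni:NetworkInterfaceId,device:Attachment.DeviceIndex,ips:PrivateIpAddresses[].PrivateIpAddress}'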

Environment:

  • AWS Region: cn-north-1
  • Instance Type(s): arm / t4g.medium
  • Cluster Kubernetes version: 1.33
  • Node Kubernetes version: 1.33
  • AMI Version: standard-1.33-v20251023

BruceLuX · Nov 14 '25 02:11

  • VPC CNI: v1.20.4-eksbuild.1
  • kube-proxy: v1.33.3-eksbuild.10

BruceLuX · Nov 14 '25 02:11

I didn't use custom networking or a network policy; I just use the default VPC CNI mode.
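A quick way to confirm that (a sketch, not part of the original comment) is to check the aws-node daemonset's environment:

kubectl -n kube-system describe daemonset aws-node | grep AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG
# unset or "false" means the default VPC CNI networking mode is in use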

BruceLuX · Nov 14 '25 02:11

hey @BruceLuX, trying to understand the motivation behind including this in your user data:

nodeadm init --config-source file:///etc/kubernetes/nodeadm-bootstrap.yaml
sleep 300   # in prod this step downloads and installs some software; replacing it with a sleep also reproduces the issue
nodeadm init --config-source file:///etc/kubernetes/nodeadm-bootstrap.yaml

is there something in particular you're aiming to accomplish with the manual nodeadm execution? for clarity, we typically split nodeadm into two phases, one that goes before user data script execution and one that goes after - these are called config and run respectively. from the snippet you've shared, nothing seems dynamic or would require writing the config to disk; just leaving the config part, like in this example, would do the trick. you can also use our playground to validate beforehand.
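for illustration, a config-only user data along those lines might look like this (a sketch; the NodeConfig values are copied from the issue, and application/node.eks.aws is the content type nodeadm recognizes for NodeConfig parts):

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"

--//
Content-Type: application/node.eks.aws

---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: LAB-1-30
    apiServerEndpoint: https://1234567890.yl4.cn-north-1.eks.amazonaws.com.cn
    certificateAuthority: xxxx
    cidr: 172.20.0.0/16

--//
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
# install monitoring software here; the config phase has already consumed the
# NodeConfig part above, and the run phase starts kubelet after this script exits
--//--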

I suspect removing the manual nodeadm execution will resolve this (or, at the very least, adding --skip run to the invocations), but we can look into making this situation a bit more stable.
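for reference, a sketch of that --skip variant applied to the script from the issue (same config path as above):

nodeadm init --skip run --config-source file:///etc/kubernetes/nodeadm-bootstrap.yaml
sleep 300   # software installation would happen here
nodeadm init --skip run --config-source file:///etc/kubernetes/nodeadm-bootstrap.yaml
# with run skipped here, kubelet is only started by the AMI's own run phase
# after user data scripts complete, so CNI-attached ENIs land after
# cloud-init and stay CNI-managed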

mselim00 · Nov 14 '25 07:11

Hi @mselim00, many thanks for your reply. sleep 300 simulates downloading and installing monitoring software before the node joins the cluster. We have actually already mitigated this issue by adjusting the user data, but what confuses me is why running the same nodeadm init command twice, only 300 seconds apart, breaks the ENI networking.

BruceLuX · Nov 14 '25 07:11

glad you've found a mitigation! I believe what's happening here is that the nodeadm invocation is joining the instance to the cluster and starting a CNI on it, which attaches an ENI before the ~5-minute process completes. this ENI is treated as system-managed (not CNI-managed) because it was attached prior to cloud-init completion (which is when user data scripts are executed). ordinarily, kubelet would not be started until after all user data scripts execute, so any CNI-attached interface would appear after cloud-init completes and would be left to the CNI to manage. we can certainly look into ways to improve this experience, but for now (and in general), I'd recommend avoiding executing nodeadm directly in user data.
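for anyone debugging a node in this state, one way to see the split described above (a diagnostic sketch, not from the thread; the route table numbering is an assumption based on the VPC CNI's usual device-index-plus-one convention):

# interfaces with SETUP "configured" are managed by systemd-networkd;
# ENIs attached by the CNI after cloud-init normally show as "unmanaged"
networkctl list

# the VPC CNI adds ip rules steering secondary-ENI pod traffic into
# per-ENI route tables; a broken secondary ENI may be missing its rules
ip rule show

# per-ENI route table (table number assumed to be ENI device index + 1)
ip route show table 2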

mselim00 · Nov 14 '25 07:11