amazon-vpc-cni-k8s: CNI does not tear down pod networking on a node after an IP is removed externally and IPAMD reconciles the loss

IPAM reconciliation scenario:

A pod is created and assigned an IP, 10.0.2.99.
After sandbox initialization completes, the IP is reclaimed by automation in the network, external to the cluster.
The IPAMD logs show an IP pool reconcile that detects the lost IP and reconciles its cache by calling the EC2 endpoint.
The network route for this pod (10.0.2.99) remains unchanged on the local node. Peers on other nodes can no longer reach the pod at 10.0.2.99, but it is still reachable from its local host, so Kubernetes liveness probes keep succeeding and an unhealthy pod stays in the cluster.


{"level":"debug","ts":"2024-03-08T18:10:50.378Z","caller":"rpc/rpc.pb.go:713","msg":"AddNetworkRequest: K8S_POD_NAME:\"liveness-http\" K8S_POD_NAMESPACE:\"gateway-ns\" K8S_POD_INFRA_CONTAINER_ID:\"7f92409d45a01365839f5db2b7c30c35626c1de02779233046bf5c1bd2c59380\" ContainerID:\"7f92409d45a01365839f5db2b7c30c35626c1de02779233046bf5c1bd2c59380\" IfName:\"eth0\" NetworkName:\"aws-cni\" Netns:\"/var/run/netns/cni-d4e752dc-bdf7-f594-2a1a-38dfa2445dfb\""}

{"level":"info","ts":"2024-03-08T18:10:50.378Z","caller":"datastore/data_store.go:750","msg":"AssignPodIPv4Address: Assign IP 10.0.2.99 to sandbox aws-cni/7f92409d45a01365839f5db2b7c30c35626c1de02779233046bf5c1bd2c59380/eth0"}

External automation event (event time: March 08, 2024, 18:11:25 UTC+00:00):
UnassignPrivateIpAddresses  "privateIpAddress": "10.0.2.99"

{"level":"warn","ts":"2024-03-08T18:12:00.256Z","caller":"ipamd/ipamd.go:1404","msg":"Instance metadata does not match data store! ipPool: [10.0.2.99 10.0.2.27 10.0.2.158], metadata: [{\n  Primary: true,\n  PrivateIpAddress: \"10.0.2.149\"\n} {\n  Primary: false,\n  PrivateIpAddress: \"10.0.2.27\"\n} {\n  Primary: false,\n  PrivateIpAddress: \"10.0.2.158\"\n}]"}

{"level":"info","ts":"2024-03-08T18:12:00.334Z","caller":"datastore/data_store.go:578","msg":"UnAssignPodIPAddress: Unassign IP 10.0.2.99 from sandbox aws-cni/7f92409d45a01365839f5db2b7c30c35626c1de02779233046bf5c1bd2c59380/eth0"}
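To make the mismatch in the warn line easier to see, here is a minimal sketch that parses that line and lists the IPs that IPAMD still had in its pool but EC2 no longer reports as secondary addresses. The function name `lost_ips` and the regular expressions are my own, written against the message format shown above; they are not part of the CNI.

```python
import json
import re

# The warn-level line from the logs above, copied verbatim.
LOG_LINE = r'''{"level":"warn","ts":"2024-03-08T18:12:00.256Z","caller":"ipamd/ipamd.go:1404","msg":"Instance metadata does not match data store! ipPool: [10.0.2.99 10.0.2.27 10.0.2.158], metadata: [{\n  Primary: true,\n  PrivateIpAddress: \"10.0.2.149\"\n} {\n  Primary: false,\n  PrivateIpAddress: \"10.0.2.27\"\n} {\n  Primary: false,\n  PrivateIpAddress: \"10.0.2.158\"\n}]"}'''


def lost_ips(log_line):
    """Return pool IPs that are missing from the secondary IPs in the metadata."""
    msg = json.loads(log_line)["msg"]

    # IPs the datastore still believes are in the pool.
    pool_match = re.search(r"ipPool: \[([^\]]*)\]", msg)
    pool = set(pool_match.group(1).split())

    # (primary flag, address) pairs reported by instance metadata.
    entries = re.findall(
        r'Primary: (true|false),\s*PrivateIpAddress: "([^"]+)"', msg
    )
    secondary = {ip for primary, ip in entries if primary == "false"}

    return sorted(pool - secondary)


print(lost_ips(LOG_LINE))
```

For the line above this reports 10.0.2.99 as the only address that is in the pool but no longer on the ENI, which matches the UnAssignPodIPAddress entry that follows.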

What you expected to happen:

After the event "UnAssignPodIPAddress: Unassign IP 10.0.2.99 from sandbox aws-cni/7f9240...", the CNI should be triggered to tear down the network route for this IP, so that the liveness probe eventually fails and Kubernetes attempts to heal the pod.

How to reproduce it (as minimally and precisely as possible):

Create a pod with liveness and readiness probes, for example:

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness3
  name: liveness-http3
spec:
  containers:
  - name: ngo-proxy
    image: gcr.io/google_containers/echoserver:1.4
    # args:
    # - /server
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /health
        port: 8080
        # httpHeaders:
        # - name: Custom-Header
        #   value: Awesome
      initialDelaySeconds: 60
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /health
        port: 8080
      # initialDelaySeconds: 50
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 2
  restartPolicy: Always

At any time afterwards, remove the pod's IP from the node the pod is scheduled on.
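One way to script this step is sketched below. It assumes the removal is done through the same EC2 API that appears in the event above (UnassignPrivateIpAddresses), here via the boto3 client method `unassign_private_ip_addresses`. The helper name `unassign_secondary_ip`, the ENI ID `eni-EXAMPLE`, and the region are placeholders, not values from this cluster.

```python
def unassign_secondary_ip(ec2_client, eni_id, private_ip):
    """Ask EC2 to remove one secondary private IP from an ENI, bypassing IPAMD."""
    return ec2_client.unassign_private_ip_addresses(
        NetworkInterfaceId=eni_id,
        PrivateIpAddresses=[private_ip],
    )


# Example invocation (needs boto3 and AWS credentials, so it is left commented out):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-west-1")  # region is an assumption
#   unassign_secondary_ip(ec2, "eni-EXAMPLE", "10.0.2.99")  # ENI ID is a placeholder
```

The client is passed in rather than created inside the helper so the call can be exercised against a stub without touching a real account.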

Anything else we need to know?:

During the sweep phase of the nodeIPPoolReconcile process, should the CNI be invoked to updateHostNetwork for the removed IPs?
see issue
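To illustrate what I am asking for, here is a toy model of a sweep that also tears down host networking. This is not the ipamd code: the `sweep` function, the dict-based datastore (IP mapped to sandbox key, empty string when unassigned), and the `teardown` callback are all hypothetical and only show the behaviour I expected.

```python
from typing import Callable, Dict, Iterable, List


def sweep(
    datastore: Dict[str, str],
    ec2_ips: Iterable[str],
    teardown: Callable[[str, str], None],
) -> List[str]:
    """Drop IPs that EC2 no longer reports; tear down networking for assigned ones."""
    present = set(ec2_ips)
    removed = []
    # sorted() copies the keys, so popping inside the loop is safe.
    for ip in sorted(datastore):
        if ip in present:
            continue
        sandbox = datastore.pop(ip)
        if sandbox:
            # The IP was assigned to a pod: remove its host-side routes and rules too.
            teardown(ip, sandbox)
        removed.append(ip)
    return removed


# Usage with the values from the logs above.
torn_down = []
store = {
    "10.0.2.99": "aws-cni/7f92409d45a01365839f5db2b7c30c35626c1de02779233046bf5c1bd2c59380/eth0",
    "10.0.2.27": "",
    "10.0.2.158": "",
}
removed = sweep(
    store,
    ["10.0.2.27", "10.0.2.158"],
    lambda ip, sandbox: torn_down.append((ip, sandbox)),
)
print(removed, torn_down)
```

In this model the stale IP leaves the datastore and its sandbox is handed to the teardown hook in the same pass. In the logs above only the first half appears (the UnAssignPodIPAddress entry), while the route stays on the node.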

Environment:

Kubernetes version (use kubectl version):
CNI Version: image: 602401143452.dkr.ecr.us-west-1.amazonaws.com/amazon-k8s-cni-init:v1.15.3-eksbuild.1
OS (e.g: cat /etc/os-release):

NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"

Kernel (e.g. uname -a):

Linux ....compute.internal 5.10.198-187.748.amzn2.x86_64 #1 SMP Tue Oct 24 19:49:54 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Mar 08 '24 22:03 abeowlu

@AbeOwlu what is this "external event" that reclaims an IP on an ENI? Only the IPAM daemon should be assigning and unassigning IPs to an ENI. Before calling the EC2 API to unassign IPs, it removes those IPs from the datastore. That precondition is required to avoid this exact scenario.

Mar 08 '24 23:03 jdn5126

At the moment there is an automation pipeline that (incorrectly, I might add) sees drift in the VPC network and unassigns an IP from an EC2 instance.

Looking into this further, it actually appears that the CRI attempted to recreate the container sandbox, but the CNI was not responsive (connection refused on all 3 attempts), so the orchestrator may be handling this case.

Will update with more details and logs...

Mar 13 '24 00:03 abeowlu

I think I hit this issue too. Let me circle back with some more info.

Apr 19 '24 03:04 GnatorX