
iptables contention between vpc-cni and kube-proxy

Open duxing opened this issue 8 months ago • 7 comments

What happened:

On an EKS cluster with many Services (1000 in my case) and many pods (300), the large iptables ruleset leads to long execution times for some iptables operations (5+ seconds).

This leads to xtables lock contention between kube-proxy and vpc-cni, despite both specifying -w:

# from kube-proxy logs
2024-06-06T16:55:52.525393865Z stderr F Another app is currently holding the xtables lock. Stopped waiting after 5s.
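For context on that error: the xtables lock is a host-wide advisory lock on /run/xtables.lock, and the -w/-W flags control how long an iptables invocation waits for it and how often it polls. A minimal sketch of the same failure mode run by hand on a node (the listed chain is just an example; the flags match the "still Ns 100000us" countdown in the kube-proxy logs):

# While another process holds the xtables lock, this call waits up to 5s,
# polling every 100000us, then gives up with exit status 4, the same error
# kube-proxy reports above.
sudo iptables -w 5 -W 100000 -t nat -S POSTROUTING >/dev/null
echo "exit status: $?"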

This race condition between kube-proxy and vpc-cni has led to longer initialization times for vpc-cni and frequent pod crashes due to failing readiness checks (60s initial delay + 3 * 10s interval). Related issue: #2945

I'm using some of the logs from eks_i-0019a68d504566810_2024-06-06_1830-UTC_0.7.6.tar.gz to walk through this issue (uploaded, see the "Attach logs" section). From ipamd.log I can tell the pod had been restarted 5 times by the time I collected the logs. The following log excerpts overlap with the kube-proxy logs around the same time, showing the contention.

From the kube-proxy log, consecutive entries at 2024-06-06T16:49:46:

...# the previous log entry is 3.5s ago
2024-06-06T16:49:46.155486643Z stderr F I0606 16:49:46.155398 1 trace.go:236] Trace[1591999613]: "iptables restore" (06-Jun-2024 16:49:41.141) (total time: 5013ms):
2024-06-06T16:49:46.155504715Z stderr F Trace[1591999613]: [5.013803896s] [5.013803896s] END
2024-06-06T16:49:46.1555071Z stderr F E0606 16:49:46.155435 1 proxier.go:1525] "Failed to execute iptables-restore" err=<
2024-06-06T16:49:46.155509335Z stderr F exit status 4: Another app is currently holding the xtables lock; still 4s 100000us time ahead to have a chance to grab the lock...
2024-06-06T16:49:46.1555116Z stderr F Another app is currently holding the xtables lock; still 3s 100000us time ahead to have a chance to grab the lock...
2024-06-06T16:49:46.155513512Z stderr F Another app is currently holding the xtables lock; still 2s 100000us time ahead to have a chance to grab the lock...
2024-06-06T16:49:46.155515504Z stderr F Another app is currently holding the xtables lock; still 1s 100000us time ahead to have a chance to grab the lock...
2024-06-06T16:49:46.15551722Z stderr F Another app is currently holding the xtables lock; still 0s 100000us time ahead to have a chance to grab the lock...
2024-06-06T16:49:46.155519156Z stderr F Another app is currently holding the xtables lock. Stopped waiting after 5s.
2024-06-06T16:49:46.15552093Z stderr F >
2024-06-06T16:49:46.155522876Z stderr F I0606 16:49:46.155451 1 proxier.go:803] "Sync failed" retryingTime="30s"
2024-06-06T16:49:46.155529192Z stderr F I0606 16:49:46.155463 1 proxier.go:792] "SyncProxyRules complete" elapsed="7.619290979s"
...# the next log entry is 0.7s after
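To back up the claim that the sheer size of the ruleset is what makes these restores slow, the nat table on an affected node can be measured directly. This is a suggested diagnostic rather than something from the attached logs; with ~1000 Services, kube-proxy alone programs several thousand nat rules:

# Count the nat-table rules currently programmed, then time a full listing
# while holding the lock (-w) so the measurement itself doesn't fail.
sudo iptables-save -t nat | wc -l
time sudo iptables -w -t nat -L -n >/dev/null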

From ipamd.log, consecutive DEBUG entries between 2024-06-06T16:49:41 and 2024-06-06T16:49:49:

{"level":"debug","ts":"2024-06-06T16:49:41.142Z","caller":"networkutils/network.go:449","msg":"execute iptable rule : first SNAT rules for non-VPC outbound traffic"} {"level":"debug","ts":"2024-06-06T16:49:43.691Z","caller":"networkutils/network.go:449","msg":"rule nat/POSTROUTING rule first SNAT rules for non-VPC outbound traffic shouldExist true rule [-m comment --comment AWS SNAT CHAIN -j AWS-SNAT-CHAIN-0] exists false, err "} {"level":"debug","ts":"2024-06-06T16:49:49.028Z","caller":"networkutils/network.go:449","msg":"execute iptable rule : AWS-SNAT-CHAIN-0"}

Attach logs

I've collected logs by running the CNI log collection tool on 3 different instances that ran into this issue:

  • eks_i-0130dc8295b19b0e3_2024-06-06_1901-UTC_0.7.6.tar.gz and eks_i-0019a68d504566810_2024-06-06_1830-UTC_0.7.6.tar.gz have been uploaded via file="<filename>"; curl -s https://d1mg6achc83nsz.cloudfront.net/ebf57d09395e2150ac2485091ba7c48aa46181dbdcae78620987d3d7d36ace4b/us-east-1/$file | bash
  • eks_i-02c1cd4484684230c_2024-06-05_1932-UTC_0.7.6.tar.gz has been emailed.

What you expected to happen:

  • I'm not sure whether kube-proxy is supposed to actually wait the full 5s, rather than reporting 5s while only waiting a tiny fraction of that (all the countdown messages above share nearly the same timestamp). If this is not expected, this is a problem with the kube-proxy addon from EKS.
    • I recently upgraded kube-proxy from v1.29.1-eksbuild.2 to v1.29.3-eksbuild.2 and noticed this issue; it may have existed before as well.
  • kube-proxy may need to update iptables throughout its entire lifecycle, so this contention may not be entirely avoidable. I'd love to know whether it's feasible to have vpc-cni wait only for the part of the iptables setup that's necessary for its own initialization.
  • If vpc-cni runs into lock contention, it should emit some logs about the situation as well as what it's going to do, e.g. write "Another app is currently holding the xtables lock; waiting for X seconds" to the ipamd DEBUG logger.
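Until something like that is logged by ipamd itself, the current lock holder can be identified from the node while the contention is happening. A hypothetical diagnostic sketch, assuming the legacy iptables lock file at /run/xtables.lock (which is what these log messages imply is in use):

# The xtables lock is an advisory flock(2) on /run/xtables.lock; /proc/locks
# lists active flocks along with the PID holding them.
inode=$(stat -c %i /run/xtables.lock)
grep FLOCK /proc/locks | grep ":${inode} "
# Map the PID from the matching line back to kube-proxy or aws-node/ipamd.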

How to reproduce it (as minimally and precisely as possible):

  • EKS cluster: Kubernetes 1.29
  • instance AMI: ami-0a5010afd9acfaa26 / amazon-eks-node-1.29-v20240227
  • instance class: r5.4xlarge (EKS managed nodegroup)
  • EKS addons:
    • kube-proxy: v1.29.3-eksbuild.2
    • vpc-cni: v1.18.1-eksbuild.3
  • addon configs:
# vpc-cni only. kube-proxy uses default values
config:
  env:
    ENABLE_PREFIX_DELEGATION: "true"
    WARM_PREFIX_TARGET: "1"
  resources:
    requests:
      cpu: 100m
      memory: 512Mi
  init:
    resources:
      requests:
        cpu: 25m
  nodeAgent:
    resources:
      requests:
        cpu: 25m
        memory: 64Mi
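For reference, a sketch of how these values can be applied to the managed addon, assuming the YAML above is saved as vpc-cni-values.yaml and my-cluster is a placeholder cluster name:

# Apply the configuration values to the managed vpc-cni addon.
aws eks update-addon \
  --cluster-name my-cluster \
  --addon-name vpc-cni \
  --addon-version v1.18.1-eksbuild.3 \
  --configuration-values file://vpc-cni-values.yaml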

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): v1.29.4-eks-036c24b
  • CNI Version: v1.18.1-eksbuild.3
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):

duxing · Jun 06 '24 23:06