iptables contention between vpc-cni and kube-proxy
What happened:
On an EKS cluster with many `Service`s (1000 in my case) and many pods (300), the large iptables ruleset leads to long execution times for some iptables operations (5+ seconds). This causes xtables lock contention between `kube-proxy` and `vpc-cni`, despite both specifying `-w`:
```
# from kube-proxy logs
2024-06-06T16:55:52.525393865Z stderr F Another app is currently holding the xtables lock. Stopped waiting after 5s.
```
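For context, this lock behavior can be reproduced in isolation on a node. Below is a minimal sketch, assuming iptables-legacy (which serializes rule changes through an exclusive flock on `/run/xtables.lock`); `XTABLES_LOCK_DEMO` is a throwaway chain name used purely for illustration:

```bash
# Hold the xtables lock for 30s in the background, simulating a slow iptables-restore.
# flock(1) should conflict with the lock iptables-legacy takes on the same file.
sudo flock /run/xtables.lock sleep 30 &
sleep 1

# Attempt a rule change with the same flags kube-proxy uses (-w 5 -W 100000).
# While the lock is held this prints "Another app is currently holding the xtables lock;
# still Ns 100000us time ahead ..." once per second, then gives up after 5s with exit
# status 4, matching the log above.
sudo iptables -w 5 -W 100000 -N XTABLES_LOCK_DEMO
echo "exit status: $?"

wait
# Cleanup, in case the chain did get created once the lock was free:
sudo iptables -w -X XTABLES_LOCK_DEMO 2>/dev/null || true
```

The `-w 5 -W 100000` combination mirrors the countdown pattern seen in the `kube-proxy` log above.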
This race condition between `kube-proxy` and `vpc-cni` has led to longer initialization times for `vpc-cni` and frequent pod crashes due to failing readiness checks (60s delay + 3 * 10s interval). Related issue #2945.
Using some of the logs from `eks_i-0019a68d504566810_2024-06-06_1830-UTC_0.7.6.tar.gz` to walk through this issue (uploaded, see the "Attach logs" section below).

From `ipamd.log` I can tell the pod had been restarted 5 times by the time I collected the logs. The following entries overlap with the `kube-proxy` logs around the same time, leading to the contention.
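(As an aside, the restart count can also be checked live on the cluster; a small sketch, assuming the default `k8s-app=aws-node` label on the `vpc-cni` DaemonSet pods:)

```bash
# Show the RESTARTS column for the aws-node (vpc-cni) pods in kube-system.
kubectl -n kube-system get pods -l k8s-app=aws-node -o wide
```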
From the `kube-proxy` log, consecutive entries at 2024-06-06T16:49:46:
```
...  # the previous log entry is 3.5s ago
2024-06-06T16:49:46.155486643Z stderr F I0606 16:49:46.155398 1 trace.go:236] Trace[1591999613]: "iptables restore" (06-Jun-2024 16:49:41.141) (total time: 5013ms):
2024-06-06T16:49:46.155504715Z stderr F Trace[1591999613]: [5.013803896s] [5.013803896s] END
2024-06-06T16:49:46.1555071Z stderr F E0606 16:49:46.155435 1 proxier.go:1525] "Failed to execute iptables-restore" err=<
2024-06-06T16:49:46.155509335Z stderr F exit status 4: Another app is currently holding the xtables lock; still 4s 100000us time ahead to have a chance to grab the lock...
2024-06-06T16:49:46.1555116Z stderr F Another app is currently holding the xtables lock; still 3s 100000us time ahead to have a chance to grab the lock...
2024-06-06T16:49:46.155513512Z stderr F Another app is currently holding the xtables lock; still 2s 100000us time ahead to have a chance to grab the lock...
2024-06-06T16:49:46.155515504Z stderr F Another app is currently holding the xtables lock; still 1s 100000us time ahead to have a chance to grab the lock...
2024-06-06T16:49:46.15551722Z stderr F Another app is currently holding the xtables lock; still 0s 100000us time ahead to have a chance to grab the lock...
2024-06-06T16:49:46.155519156Z stderr F Another app is currently holding the xtables lock. Stopped waiting after 5s.
2024-06-06T16:49:46.15552093Z stderr F >
2024-06-06T16:49:46.155522876Z stderr F I0606 16:49:46.155451 1 proxier.go:803] "Sync failed" retryingTime="30s"
2024-06-06T16:49:46.155529192Z stderr F I0606 16:49:46.155463 1 proxier.go:792] "SyncProxyRules complete" elapsed="7.619290979s"
...  # the next log entry is 0.7s after
```
From `ipamd.log`, consecutive DEBUG entries between 2024-06-06T16:49:41 and 2024-06-06T16:49:49:
{"level":"debug","ts":"2024-06-06T16:49:41.142Z","caller":"networkutils/network.go:449","msg":"execute iptable rule : first SNAT rules for non-VPC outbound traffic"} {"level":"debug","ts":"2024-06-06T16:49:43.691Z","caller":"networkutils/network.go:449","msg":"rule nat/POSTROUTING rule first SNAT rules for non-VPC outbound traffic shouldExist true rule [-m comment --comment AWS SNAT CHAIN -j AWS-SNAT-CHAIN-0] exists false, err
"} {"level":"debug","ts":"2024-06-06T16:49:49.028Z","caller":"networkutils/network.go:449","msg":"execute iptable rule : AWS-SNAT-CHAIN-0"}
Attach logs
I've got logs from running the CNI log collection tool on 3 different instances that ran into this issue:

- `eks_i-0130dc8295b19b0e3_2024-06-06_1901-UTC_0.7.6.tar.gz` and `eks_i-0019a68d504566810_2024-06-06_1830-UTC_0.7.6.tar.gz` have been uploaded via `file="<filename>"; curl -s https://d1mg6achc83nsz.cloudfront.net/ebf57d09395e2150ac2485091ba7c48aa46181dbdcae78620987d3d7d36ace4b/us-east-1/$file | bash`
- `eks_i-02c1cd4484684230c_2024-06-05_1932-UTC_0.7.6.tar.gz` has been emailed.
What you expected to happen:
- I'm not sure whether `kube-proxy` is supposed to actually wait for 5s, rather than saying 5s but only waiting 0.00001s (see the timing sketch after this list). If this is not expected, this is a problem with the `kube-proxy` addon from EKS.
- I recently upgraded `kube-proxy` from `v1.29.1-eksbuild.2` to `v1.29.3-eksbuild.2` and noticed this issue; maybe it existed before as well.
- `kube-proxy` may need to update `iptables` throughout its entire lifecycle, so this contention may not be entirely avoidable. I'd love to know if it's feasible to tell `vpc-cni` to wait only for the part of `iptables` that's necessary for its own initialization.
- If `vpc-cni` runs into lock contention, it should emit some logs about the situation as well as what it's going to do, e.g. `Another app is currently holding the xtables lock; wait for X seconds` to the `ipamd` `DEBUG` logger.
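On the first point, a quick way to check how long `iptables` actually blocks with `-w 5` is to time it while something else holds the lock. This is only a sketch under the same assumptions as the earlier one (iptables-legacy, lock file at `/run/xtables.lock`, throwaway chain name):

```bash
# Hold the lock in the background, then time an iptables call with kube-proxy's flags.
# If -w 5 behaves as documented, "real" should be close to 5s and the exit status 4;
# a near-zero "real" would suggest the wait is not actually happening.
sudo flock /run/xtables.lock sleep 30 &
sleep 1
time sudo iptables -w 5 -W 100000 -N XTABLES_LOCK_DEMO
echo "exit status: $?"
wait
```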
How to reproduce it (as minimally and precisely as possible):
- `[email protected]`
- instance AMI: `ami-0a5010afd9acfaa26` / `amazon-eks-node-1.29-v20240227`
- instance class: `r5.4xlarge` (EKS managed nodegroup)
- EKS addons:
  - `kube-proxy`: `v1.29.3-eksbuild.2`
  - `vpc-cni`: `v1.18.1-eksbuild.3`
- addon configs:
```yaml
# vpc-cni only. kube-proxy uses default values
config:
  env:
    ENABLE_PREFIX_DELEGATION: "true"
    WARM_PREFIX_TARGET: "1"
  resources:
    requests:
      cpu: 100m
      memory: 512Mi
  init:
    resources:
      requests:
        cpu: 25m
  nodeAgent:
    resources:
      requests:
        cpu: 25m
        memory: 64Mi
```
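For completeness, the values above can be applied to the managed addon roughly like this (a sketch: the cluster name and values file are placeholders, and depending on which tool produced the `config:` wrapper, only the inner block may belong in the file passed to the CLI):

```bash
# Apply the vpc-cni configuration values to the EKS managed addon.
# "my-cluster" and "vpc-cni-values.yaml" are placeholders for this sketch.
aws eks update-addon \
  --cluster-name my-cluster \
  --addon-name vpc-cni \
  --addon-version v1.18.1-eksbuild.3 \
  --configuration-values file://vpc-cni-values.yaml
```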
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`): `v1.29.4-eks-036c24b`
- CNI Version: `v1.18.1-eksbuild.3`
- OS (e.g: `cat /etc/os-release`):
- Kernel (e.g. `uname -a`):