Pod connectivity issues on Kubernetes 1.34 (EKS) with kube-proxy in `nftables` mode and aws-node NetworkPolicy enabled
Image I'm using:
- Bottlerocket v1.49.0
- amazon-k8s-cni:v1.20.4-eksbuild.1
- aws-network-policy-agent:v1.2.7-eksbuild.1
- kube-proxy:v1.34.0-eksbuild.4
What I expected to happen:
Keep pod connectivity.
What actually happened:
After upgrading to Kubernetes 1.34 and restarting nodes, some pods are facing connectivity issues on 5 out of 8 clusters.
For example, coredns sometimes fails to reach the apiserver. It fails to start with this error:
[ERROR] plugin/kubernetes: Failed to watch on the same host
kubectl -n kube-system debug -it coredns-67cb468c85-k8txh --image=alpine --target=coredns -- sh
apk add --no-cache curl
curl -k -sS https://172.20.0.1:443/readyz?verbose
[stuck]
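A variant of the same check with a connection timeout and verbose output can help narrow down whether the TCP connect or the TLS handshake is what hangs (a sketch using the same cluster IP as above; the timeout values are arbitrary):
# Fail fast instead of hanging indefinitely, and print each handshake step
curl -kv --connect-timeout 5 --max-time 10 https://172.20.0.1:443/readyz?verbose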
- When connecting to the node hosting the coredns pod, the apiserver service is reachable.
- Bottlerocket, aws-node and kube-proxy properly use iptables-nft (not the legacy variant):
$ kubectl -n kube-system exec aws-node-rl2kp -c aws-node -- iptables -V
iptables v1.8.4 (nf_tables)
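For further debugging, kube-proxy's nftables backend programs its rules into an nftables table named kube-proxy, so the service rules for the apiserver ClusterIP can be inspected from a host shell on the affected node (a sketch; the table name assumes the upstream kube-proxy default, and 172.20.0.1 is the service IP from above):
# Dump kube-proxy's nftables ruleset and look for the apiserver ClusterIP
nft list table ip kube-proxy | grep -B 2 -A 5 172.20.0.1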
Workaround
- Deleting the pod temporarily fixes the issue (until a new pod has connectivity issues)
To fix the cluster, any of the following works (a sketch of the corresponding add-on changes follows this list):
- Switch kube-proxy to iptables mode (this fixes the issue on the cluster)
- Stick to Kubernetes 1.33
- Disable the Network Policy agent of aws-vpc-cni
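If the add-ons are managed as EKS managed add-ons, the first and third workarounds roughly correspond to add-on configuration changes like the following (a sketch with a placeholder cluster name; the configuration keys mirror the ones shown later in this thread):
# Switch kube-proxy back to iptables mode
aws eks update-addon --cluster-name my-cluster --addon-name kube-proxy \
  --configuration-values '{"mode":"iptables"}'
# Or disable the network policy agent in the VPC CNI add-on
aws eks update-addon --cluster-name my-cluster --addon-name vpc-cni \
  --configuration-values '{"enableNetworkPolicy":"false"}'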
How to reproduce the problem:
Not sure yet
Hi @gnuletik,
Thanks for cutting this issue. I attempted to reproduce it, but I was not able to reproduce the behavior you mentioned.
My setup
Using bottlerocket OS 1.49.0 k8s 1.34 variant
fedora@ip-10-0-0-221 ~/bottlerocket-core-kit (develop)> k get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
ip-192-168-142-237.us-west-2.compute.internal Ready <none> 2d6h v1.34.0-eks-642f211 192.168.142.237 <none> Bottlerocket OS 1.49.0 (aws-k8s-1.34) 6.12.46 containerd://2.1.4+bottlerocket
ip-192-168-167-204.us-west-2.compute.internal Ready <none> 2d7h v1.34.0-eks-642f211 192.168.167.204 <none> Bottlerocket OS 1.49.0 (aws-k8s-1.34) 6.12.46 containerd://2.1.4+bottlerocket
kube-proxy using nftables mode
Manually configured the kube-proxy add-on with configuration:
{
"mode": "nftables"
}
And confirmed that the kube-proxy config picked it up.
kubectl get configmap -n kube-system kube-proxy-config -o yaml | grep -A 2 -B 2 "mode:"
kind: KubeProxyConfiguration
metricsBindAddress: 0.0.0.0:10249
mode: "nftables"
nodePortAddresses: null
oomScoreAdj: -998
Enabled NetworkPolicy in my aws-node add-on
Manually configured the add-on with:
{"enableNetworkPolicy": "true"}
And confirmed from the aws-network-policy-agent pod that network policy is enabled:
kubectl describe pod -n kube-system aws-node-gvfs9 | grep -A 10 "aws-network-policy-agent:"
Image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-network-policy-agent:v1.2.7-eksbuild.1
Image ID: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-network-policy-agent@sha256:f99fb1fea5e16dc3a2429ddd0a2660d0f3b4ba40b467e81e1898b001ee54c240
Port: 8162/TCP (agentmetrics)
Host Port: 8162/TCP (agentmetrics)
Args:
--enable-ipv6=false
--enable-network-policy=true <========= enabled here
--enable-cloudwatch-logs=false
--enable-policy-event-logs=false
Test connectivity to the API server service
Using the same command, I hit the API server's readyz endpoint and had no issue:
/ # curl -k -sS https://10.100.0.1:443/readyz?verbose
[+]ping ok
[+]log ok
[+]etcd ok
[+]etcd-readiness ok
[+]kms-providers ok
[+]informer-sync ok
[+]poststarthook/start-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/storage-object-count-tracker-hook ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/start-system-namespaces-controller ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/start-kube-apiserver-identity-lease-controller ok
[+]poststarthook/start-kube-apiserver-identity-lease-garbage-collector ok
[+]poststarthook/start-legacy-token-tracking-controller ok
[+]poststarthook/start-service-ip-repair-controllers ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/priority-and-fairness-config-producer ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/start-kubernetes-service-cidr-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-status-local-available-controller ok
[+]poststarthook/apiservice-status-remote-available-controller ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-discovery-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[+]poststarthook/apiservice-openapiv3-controller ok
[+]shutdown ok
readyz check passed
Confirmed that the iptables-nft variant is being used:
kubectl -n kube-system exec aws-node-gvfs9 -c aws-node -- iptables -V
iptables v1.8.4 (nf_tables)
Questions:
Could you please share a bit more information about your setup?
- Based on your description, it does not seem to be happening in a deterministic way? Or are you able to consistently reproduce it?
- Is there any other network-relevant component used in your workloads other than the aws-node CNI, kube-proxy and CoreDNS? If yes, what configuration did you use?
- Is there anything else you configured in your network policies? If yes, could you provide the config?
- If you have premium support, feel free to engage AWS support so that you can provide more details about your nodes and workloads.
I would like to join in on this discussion; we had the same issues. Rolling back to 1.48 seems to solve it. Suddenly, some containers were stuck in ContainerCreating with a 'pulling image' status from our local registry.
Thanks for investigating the issue and trying to reproduce it, @ytsssun! And thanks for sharing too, @vincentjanv, I'm happy to see the issue happening somewhere else!
- Based on your description, it does not seem to be happening in a deterministic way? Or are you able to consistently reproduce it?
I'm not able to consistently reproduce it. I'm still unable to pinpoint why some clusters face the issue while others work normally (the clusters without the issue have been running for the past week without any connectivity problem). I've tried copying some config from a working cluster to the non-working clusters, but the issue always comes back after ~1 day.
Is there any other network-relevant component used in your workloads other than the aws-node CNI, kube-proxy and CoreDNS? If yes, what configuration did you use?
We don't have other network components running. We are using karpenter, secrets-store-csi-driver, nginx-ingress-controller and bottlerocket-update-operator, but I don't think they can impact pod connectivity.
Is there anything else you configured in your network policy, if yes could you provide the config?
We don't have anything else configured in our network policies.
If you have premium support, feel free to engage AWS support so that you can provide more details about your nodes and workloads.
We don't have an account with premium support right now. As we are fine with iptables mode for now, we don't plan to invest in switching to nftables mode.
any update on this? facing the same 😕
@avnerv were you able to pinpoint a configuration that would help maintainers reproduce the issue?
Not sure if this is the root cause, but the Kubernetes 1.34.2 changelog includes a kube-proxy fix for nftables mode:
- Fixed a bug in kube-proxy nftables mode (GA as of 1.33) that fails to determine if traffic originates from a local source on the node. The issue was caused by using the wrong meta iif instead of iifname for name-based matches. (#134118) [SIG Network]
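For context, the difference the changelog refers to can be illustrated with a standalone nftables table (the table and chain names here are made up for the example; these are not the actual kube-proxy rules):
# Create a throwaway table and chain to demonstrate the two match types
nft add table ip demo
nft add chain ip demo input '{ type filter hook input priority 0 ; }'
# iifname compares the interface name at packet time, so it also matches
# interfaces created after the rule was loaded (e.g. pod veth interfaces)
nft add rule ip demo input iifname "lo" counter accept
# meta iif resolves the name to an interface index when the rule is added,
# so it only covers interfaces that already exist at load time
nft add rule ip demo input meta iif "lo" counter accept
# Clean up
nft delete table ip demo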
@gnuletik curious if you have seen improvement after using the latest kube-proxy?
@KCSesh I haven't tried it because the EKS add-on doesn't offer version v1.34.2 yet.
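For reference, the add-on versions EKS currently publishes for kube-proxy can be listed with the AWS CLI, which should make it easy to spot when a v1.34.2-based build becomes available (a sketch; assumes the AWS CLI is configured for the right region):
aws eks describe-addon-versions --addon-name kube-proxy --kubernetes-version 1.34 \
  --query 'addons[].addonVersions[].addonVersion'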