
Pod connectivity issues on Kubernetes 1.34 (EKS) with kube-proxy in `nftables` mode and aws-node NetworkPolicy enabled

Open gnuletik opened this issue 1 month ago • 8 comments

Image I'm using:

  • Bottlerocket v1.49.0
  • amazon-k8s-cni:v1.20.4-eksbuild.1
  • aws-network-policy-agent:v1.2.7-eksbuild.1
  • kube-proxy:v1.34.0-eksbuild.4

What I expected to happen:

Keep pod connectivity.

What actually happened:

After upgrading to Kubernetes 1.34 and restarting nodes, some pods are facing connectivity issues on 5 out of 8 clusters.

For example, coredns sometimes fails to reach the apiserver. It fails to start with this error:

[ERROR] plugin/kubernetes: Failed to watch on the same host
kubectl -n kube-system debug -it coredns-67cb468c85-k8txh  --image=alpine --target=coredns -- sh
apk add --no-cache curl
curl -k -sS https://172.20.0.1:443/readyz?verbose
[stuck]
  • When connecting on the node hosting the coredns pod, the apiserver service is reachable.
  • bottlerocket, aws-node, and kube-proxy all use the iptables-nft variant (not the legacy one):
$ kubectl -n kube-system exec aws-node-rl2kp -c aws-node -- iptables -V
iptables v1.8.4 (nf_tables)
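
For anyone debugging this further, here is a minimal sketch of how to compare the datapath on an affected node versus a healthy one, assuming access to the Bottlerocket admin container; kube-proxy's nftables backend keeps its rules in a table named kube-proxy:

# From the Bottlerocket control container, enter the admin container,
# then get a root shell in the host namespaces
apiclient exec admin bash
sudo sheltie

# kube-proxy's nftables backend should own a table named "kube-proxy"
nft list tables

# Look for the rules handling the kubernetes service ClusterIP (172.20.0.1 above)
nft list table ip kube-proxy | grep 172.20.0.1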

Workaround

  • Deleting the pod temporarily fixes the issue (until a new pod has connectivity issues)

To fix the cluster:

  • Switching kube-proxy to iptables mode fixes the issue on the cluster (see the sketch after this list)
  • Sticking to Kubernetes 1.33
  • Disabling the Network Policy agent of aws-vpc-cni
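
For reference, both of those add-on changes can be applied from the AWS CLI. This is a sketch only, with a placeholder cluster name; the configuration keys mirror the ones shown later in this thread, and the exact schema may vary by add-on version:

# Switch the kube-proxy EKS add-on back to iptables mode (placeholder cluster name)
aws eks update-addon \
  --cluster-name my-cluster \
  --addon-name kube-proxy \
  --configuration-values '{"mode": "iptables"}'

# Turn off the Network Policy agent through the vpc-cni add-on configuration
aws eks update-addon \
  --cluster-name my-cluster \
  --addon-name vpc-cni \
  --configuration-values '{"enableNetworkPolicy": "false"}'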

How to reproduce the problem:

Not sure yet

gnuletik avatar Oct 29 '25 14:10 gnuletik

Hi @gnuletik ,

Thanks for cutting us this issue. I attempted to reproduce it, but I was not able to reproduce the issue you mentioned.

My setup

Using the Bottlerocket OS 1.49.0 k8s 1.34 variant (aws-k8s-1.34)

fedora@ip-10-0-0-221 ~/bottlerocket-core-kit (develop)> k get nodes -o wide
NAME                                            STATUS   ROLES    AGE    VERSION               INTERNAL-IP       EXTERNAL-IP   OS-IMAGE                                KERNEL-VERSION   CONTAINER-RUNTIME
ip-192-168-142-237.us-west-2.compute.internal   Ready    <none>   2d6h   v1.34.0-eks-642f211   192.168.142.237   <none>        Bottlerocket OS 1.49.0 (aws-k8s-1.34)   6.12.46          containerd://2.1.4+bottlerocket
ip-192-168-167-204.us-west-2.compute.internal   Ready    <none>   2d7h   v1.34.0-eks-642f211   192.168.167.204   <none>        Bottlerocket OS 1.49.0 (aws-k8s-1.34)   6.12.46          containerd://2.1.4+bottlerocket

kube-proxy using nftables mode

Manually configured the kube-proxy add-on with configuration:

{
    "mode": "nftables"
}

And confirmed that the kube-proxy config picked it up.

kubectl get configmap -n kube-system kube-proxy-config -o yaml | grep -A 2 -B 2 "mode:"
    kind: KubeProxyConfiguration
    metricsBindAddress: 0.0.0.0:10249
    mode: "nftables"
    nodePortAddresses: null
    oomScoreAdj: -998

Enabled NetworkPolicy in my aws-node add-on

Manually configured the add-on with:

{"enableNetworkPolicy": "true"}

And confirmed from the aws-network-policy-agent container in the aws-node pod that network policy is enabled:

kubectl describe pod -n kube-system aws-node-gvfs9 | grep -A 10 "aws-network-policy-agent:"
    Image:         602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-network-policy-agent:v1.2.7-eksbuild.1
    Image ID:      602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-network-policy-agent@sha256:f99fb1fea5e16dc3a2429ddd0a2660d0f3b4ba40b467e81e1898b001ee54c240
    Port:          8162/TCP (agentmetrics)
    Host Port:     8162/TCP (agentmetrics)
    Args:
      --enable-ipv6=false
      --enable-network-policy=true        <========= enabled here
      --enable-cloudwatch-logs=false
      --enable-policy-event-logs=false
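
For completeness, the policy agent side can also be inspected per node. This is a sketch, assuming the aws-eks-nodeagent container name used by recent aws-node daemonsets; deny/flow logs only appear if --enable-policy-event-logs=true, which is disabled in the output above:

# Confirm the container names inside the aws-node pod
kubectl -n kube-system get pod aws-node-gvfs9 -o jsonpath='{.spec.containers[*].name}'

# Tail the network policy agent logs for errors during pod setup
kubectl -n kube-system logs aws-node-gvfs9 -c aws-eks-nodeagent --tail=100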

Test connectivity to the APIServer service

Using the same command to hit the APIServer readyz endpoint, I had no issue.

/ # curl -k -sS https://10.100.0.1:443/readyz?verbose
[+]ping ok
[+]log ok
[+]etcd ok
[+]etcd-readiness ok
[+]kms-providers ok
[+]informer-sync ok
[+]poststarthook/start-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/storage-object-count-tracker-hook ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/start-system-namespaces-controller ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/start-kube-apiserver-identity-lease-controller ok
[+]poststarthook/start-kube-apiserver-identity-lease-garbage-collector ok
[+]poststarthook/start-legacy-token-tracking-controller ok
[+]poststarthook/start-service-ip-repair-controllers ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/priority-and-fairness-config-producer ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/start-kubernetes-service-cidr-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-status-local-available-controller ok
[+]poststarthook/apiservice-status-remote-available-controller ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-discovery-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[+]poststarthook/apiservice-openapiv3-controller ok
[+]shutdown ok
readyz check passed

Confirm iptables-nft variant is being used

kubectl -n kube-system exec aws-node-gvfs9 -c aws-node -- iptables -V
iptables v1.8.4 (nf_tables)

Questions:

Could you please share a bit more information about your setup?

  1. Based on your description, it does not seem to be happening in a deterministic way. Or are you able to consistently reproduce it?
  2. Is there any other network-relevant component used in your workloads besides the aws-node CNI, kube-proxy, and coredns? If yes, what configuration did you use?
  3. Is there anything else configured in your network policies? If yes, could you provide the config?
  4. If you have premium support, feel free to engage AWS support so that you can provide more details about your nodes and workloads.

ytsssun avatar Nov 03 '25 06:11 ytsssun

I would like to join in on this discussion; we had the same issues. Rolling back to 1.48 seems to solve it. Suddenly, some containers were stuck in ContainerCreating while pulling an image from our local registry.

vincentjanv avatar Nov 03 '25 10:11 vincentjanv

Thanks for investigating the issue and trying to reproduce @ytsssun! And thanks for sharing too @vincentjanv, I'm happy to see the issue happening somewhere else!

  1. Based on your description, it does not seem to be happening in a deterministic way? Or are you able to consistently reproduce it?

I'm not able to consistently reproduce it. I'm still unable to pinpoint why some clusters face the issue while others work normally (the clusters without issues have been running for the past week without any connectivity problems). I've tried copying some config from a working cluster to the non-working clusters, but the issue always comes back after ~1 day.

Is there any other network relevant component used in your workloads other than aws-node cni, kube-proxy and core-dns? If yes, what configuration did you use.

We don't have other network components running. We are using karpenter, secrets-store-csi-driver, nginx-ingress-controller, and bottlerocket-update-operator, but I don't think they can impact pod connectivity.

Is there anything else you configured in your network policy, if yes could you provide the config?

We don't have anything else configured in our network policies.

If you have premium support, feel free to engage AWS support so that you can provide more details about your nodes and workloads.

We don't have any account with premium support right now. As we are fine with iptables mode for now, we don't plan to invest in switching to nftables mode.

gnuletik avatar Nov 03 '25 11:11 gnuletik

any update on this? facing the same 😕

avnerv avatar Nov 09 '25 06:11 avnerv

@avnerv were you able to pinpoint a configuration that would help maintainers reproduce the issue?

gnuletik avatar Nov 10 '25 11:11 gnuletik

Not sure if this is the root cause but Kubernetes 1.34.2 changelog includes a kube-proxy fix for nftable mode:

  • Fixed a bug in kube-proxy nftables mode (GA as of 1.33) that fails to determine if traffic originates from a local source on the node. The issue was caused by using the wrong meta iif instead of iifname for name based matches. (#134118) [SIG Network]
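
If that fix is relevant here, a rough check (a sketch, run from a root shell on the node as in the earlier nft example) is to see whether kube-proxy's ruleset still contains the suspect index-based iif matches versus name-based iifname ones:

# Count index-based vs name-based interface matches in kube-proxy's nftables table
nft list table ip kube-proxy | grep -cE '(^| )iif '
nft list table ip kube-proxy | grep -c 'iifname '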

gnuletik avatar Nov 13 '25 09:11 gnuletik

@gnuletik curious if you have seen improvement after using the latest kube-proxy?

KCSesh avatar Nov 25 '25 22:11 KCSesh

@KCSesh I didn't try it because the EKS add-on doesn't offer version v1.34.2 yet.
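
In the meantime, the published add-on builds can be polled from the CLI (a sketch using the standard aws eks commands):

# List kube-proxy add-on versions currently published for Kubernetes 1.34
aws eks describe-addon-versions \
  --addon-name kube-proxy \
  --kubernetes-version 1.34 \
  --query 'addons[].addonVersions[].addonVersion'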


gnuletik avatar Nov 26 '25 09:11 gnuletik