Ambient mode CNI causes nftables failures in nft list/delete operations - NULL pointer dereferences in kernel

Open laurentdina opened this issue 2 weeks ago • 14 comments

deploy-cluster-istio-nftables-test.sh

Is this the right place to submit this?

  • [x] This is not a security vulnerability or a crashing bug
  • [x] This is not a question about how to use Istio

Bug Description

When Istio CNI runs in ambient mode with nftables enabled on AKS nodes running Azure Linux 3 or Ubuntu 22.04, we observe two critical issues:

  1. nft operations such as nft list and nft delete on the host fail with segmentation faults shortly after the CNI starts. This persists even after Istio CNI is stopped, suggesting potential interaction issues between the knftables library and the kernel's nf_tables implementation.
  2. Pod deletion failures: When pods are removed from the mesh, the CNI enters an infinite retry loop attempting to list nftables set elements (using JSON output). The retry count continuously increases, and cleanup never succeeds until the CNI pod is restarted.

It is currently unclear whether these two issues are directly related. Both could be symptoms of an underlying kernel state corruption triggered by how the CNI interacts with nftables.

Environment

  • Istio Version: 1.28.1
  • Kubernetes: AKS 1.32
  • Network Policy Engine: Azure Network Policy Manager (NPM)
  • OS: AzureLinux 3 (CBL-Mariner 3.0) / AKS Ubuntu 22.04
  • Kernel: 6.6.104.2-4.azl3 (AZL3) / 5.15.0-1098-azure (Ubuntu)
  • nftables version: 1.0.9 ("Old Doc Yak 3") (AZL3) / nftables v1.0.2 (Lester Gooch) (Ubuntu)

Version

$ istioctl version
client version: 1.28.1
control plane version: 1.28.1
data plane version: 1.28.1 (2 proxies)
$ kubectl version
Client Version: v1.33.4
Kustomize Version: v5.6.0
Server Version: v1.32.9
$ helm version
version.BuildInfo{Version:"v3.18.4", GitCommit:"d80839cf37d860c8aa9a0503fe463278f26cd5e2", GitTreeState:"clean", GoVersion:"go1.24.4"}

Additional Information

Steps to Reproduce

  1. Deploy Istio 1.28.1 in ambient mode with nftables
  2. Istio CNI pod starts and successfully creates nftables rules (logs show "nftables rules programmed")
  3. Attempt any nft command that touches existing entries on the host, for example:
       nft --json list tables
       nft delete table inet test-table
  4. Delete a pod that is part of the ambient mesh

Expected Behavior

  1. nft commands should execute successfully and list/delete the nftables created by Istio CNI.
  2. Pod deletions should complete successfully, with the CNI cleanly removing the pod's IP from the nftables set.

Actual Behavior

nft commands that query or modify existing structures result in segmentation faults:

$ nft --json list tables
Segmentation fault (core dumped)
$ nft add table inet test-table
$ nft add set inet test-table test-set "{ type ipv4_addr ; }"
$ nft delete table inet test-table
Segmentation fault (core dumped)

Kernel logs (dmesg) show NULL pointer dereferences:

nft[316298]: segfault at 0 ip 0000000000000000 sp 00007fff61a07218 error 14 in nft[56f97d757000+2000]
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
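
To detect the crash in a script rather than by reading the terminal, one option is to inspect the exit status: a shell reports a process killed by signal N as 128 + N, and SIGSEGV is signal 11, so a segfaulting command shows up as 139. The following is only a sketch; the helper name run_and_classify is hypothetical and not part of Istio or nftables.

```shell
# Hypothetical helper: run a command and classify how it ended.
# A process killed by SIGSEGV (signal 11) is reported as exit status 139.
run_and_classify() {
  "$@" >/dev/null 2>&1
  status=$?
  if [ "$status" -eq 139 ]; then
    echo "segfault"
  elif [ "$status" -eq 0 ]; then
    echo "ok"
  else
    echo "error($status)"
  fi
}
```

On an affected node, a call such as run_and_classify nft --json list tables would be expected to print "segfault", while other failures are reported with their exit status.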

Istio CNI startup logs show successful rule creation followed by errors:

2025-12-04T08:38:51.058444Z     info    nftables rules programmed:
add table inet istio-ambient-nat
add set inet istio-ambient-nat istio-inpod-probes-v4 { type ipv4_addr ; comment "UUID of the pods that are part of ambient mesh" ; }

2025-12-04T08:38:51.098186Z     info    nftables rules programmed:
flush set inet istio-ambient-nat istio-inpod-probes-v4

2025-12-04T08:38:51.139054Z     info    nftables rules programmed:
add table inet istio-ambient-nat
flush table inet istio-ambient-nat
add chain inet istio-ambient-nat postrouting { type nat hook postrouting priority 100 ; }
add rule inet istio-ambient-nat postrouting meta l4proto tcp skuid 0 ip daddr @istio-inpod-probes-v4 counter snat to 169.254.7.127

2025-12-04T08:38:51.274441Z     warn    cni-agent       unable to list addressSet: failed to list elements from IPv4 set istio-inpod-probes-v4: failed to run nft: Error: JSON support not compiled-in

2025-12-04T08:38:51.274578Z     error   cni-agent       failed to sync host addressSet: failed to list elements from IPv4 set istio-inpod-probes-v4: failed to run nft: Error: JSON support not compiled-in

Logs when deleting a pod:

2025-12-01T16:55:15.391485Z    info    cni-agent    removing pod from mesh    ns=stress-test-ns-103 name=dummy-2 deleted=true
2025-12-01T16:55:15.431253Z    error    cni-agent    failed to remove pod dummy-2 from host addressSet, error was: failed to list elements from set istio-inpod-probes-v4: failed to run nft: Error: JSON support not compiled-in
    ns=stress-test-ns-103 name=dummy-2
2025-12-01T16:55:15.431349Z    warn    cni-agent    Unable to send pod to ztunnel for removal. Will retry. RemovePodFrmMesh returned: failed to list elements from set istio-inpod-probes-v4: failed to run nft: Error: JSON support not compiled-in
    ns=stress-test-ns-103 name=dummy-2
2025-12-01T16:55:15.431392Z    error    controllers    error handling stress-test-ns-103/dummy-2, retrying (retry count: 15): failed to list elements from set istio-inpod-probes-v4: failed to run nft: Error: JSON support not compiled-in
    controller=ambient

However, mesh functionality actually works for pod creation. This issue seems to only affect pod deletion.

Note: The host's nft binary DOES have JSON support (nft --help shows -j, --json option), so the "JSON support not compiled-in" error appears to occur after the system is already in a corrupted state.

We appreciate any insights you can provide on this issue. A script to deploy our test setup is attached. We're happy to provide additional diagnostics, test patches, or help in any way to resolve this. Thank you!

Configuration

ISTIO_VERSION="1.28.1"
# Install Istio base
helm upgrade --install istio-base istio/base \
  --namespace istio-system \
  --version $ISTIO_VERSION \
  --wait
# Install Istio CNI
helm upgrade --install istio-cni istio/cni \
  --version $ISTIO_VERSION \
  --namespace istio-system \
  --set cni.chained=true \
  --set cni.privileged=true \
  --set cni.ambient.enabled=true \
  --set cni.ambient.ipv6=false \
  --set cni.ambient.dnsCapture=false \
  --set global.hub=docker.io/istio \
  --set global.variant=distroless \
  --set global.nativeNftables=true \
  --wait
# Install Istiod
helm upgrade --install istiod istio/istiod \
  --version $ISTIO_VERSION \
  --namespace istio-system \
  --set profile=ambient \
  --set global.hub=docker.io/istio \
  --set global.variant=distroless \
  --set global.nativeNftables=true \
  --set replicaCount=2 \
  --set autoscaleEnabled=true \
  --set autoscaleMin=2 \
  --set autoscaleMax=5 \
  --set pilot.cni.enabled=true \
  --set pilot.env.PILOT_ENABLE_AMBIENT=true \
  --set "pilot.env.CA_TRUSTED_NODE_ACCOUNTS=istio-system/ztunnel" \
  --set pilot.env.PILOT_ENABLE_AMBIENT_CONTROLLERS=true \
  --set pilot.env.PILOT_ENABLE_IP_AUTOALLOCATE=false \
  --set meshConfig.enablePrometheusMerge=true \
  --set meshConfig.defaultConfig.proxyMetadata.ISTIO_META_ENABLE_HBONE=true \
  --wait
# Deploy ztunnel for nodegroups
ZTUNNEL_IMAGE="docker.io/istio/ztunnel:${ISTIO_VERSION}-distroless"
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ztunnel-any
  namespace: istio-system
  labels:
    app: ztunnel-any
spec:
  selector:
    matchLabels:
      app: ztunnel-any
  template:
    metadata:
      labels:
        app: ztunnel-any
        istio.io/dataplane-mode: "none"
        sidecar.istio.io/inject: "false"
      annotations:
        prometheus.io/port: "15020"
        prometheus.io/scrape: "true"
        sidecar.istio.io/inject: "false"
    spec:
      serviceAccountName: ztunnel
      nodeSelector:
        nodegroup.example.com/env: "any"
        kubernetes.io/os: "linux"
      priorityClassName: system-node-critical
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      tolerations:
        - effect: NoSchedule
          operator: Exists
        - key: CriticalAddonsOnly
          operator: Exists
        - effect: NoExecute
          operator: Exists
      containers:
        - name: istio-proxy
          image: ${ZTUNNEL_IMAGE}
          imagePullPolicy: IfNotPresent
          args: ["proxy", "ztunnel"]
          ports:
            - containerPort: 15020
              name: ztunnel-stats
              protocol: TCP
          env:
            - name: CA_ADDRESS
              value: istiod.istio-system.svc:15012
            - name: XDS_ADDRESS
              value: istiod.istio-system.svc:15012
            - name: RUST_LOG
              value: info
            - name: ISTIO_META_CLUSTER_ID
              value: Kubernetes
            - name: INPOD_ENABLED
              value: "true"
            - name: ISTIO_META_DNS_PROXY_ADDR
              value: "127.0.0.1:15053"
            - name: ISTIO_META_ENABLE_HBONE
              value: "true"
            - name: ISTIO_META_IPTABLES_MODE
              value: "nft"
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: INSTANCE_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: SERVICE_ACCOUNT
              valueFrom:
                fieldRef:
                  fieldPath: spec.serviceAccountName
          readinessProbe:
            httpGet:
              path: /healthz/ready
              port: 15021
            periodSeconds: 10
          resources:
            requests:
              cpu: 200m
              memory: 512Mi
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              add: ["NET_ADMIN", "SYS_ADMIN", "NET_RAW"]
              drop: ["ALL"]
            privileged: false
            readOnlyRootFilesystem: true
            runAsGroup: 1337
            runAsNonRoot: false
            runAsUser: 0
          volumeMounts:
            - name: istiod-ca-cert
              mountPath: /var/run/secrets/istio
            - name: istio-token
              mountPath: /var/run/secrets/tokens
            - name: cni-ztunnel-sock-dir
              mountPath: /var/run/ztunnel
            - name: tmp
              mountPath: /tmp
      volumes:
        - name: istio-token
          projected:
            sources:
              - serviceAccountToken:
                  audience: istio-ca
                  expirationSeconds: 43200
                  path: istio-token
        - name: istiod-ca-cert
          configMap:
            name: istio-ca-root-cert
        - name: cni-ztunnel-sock-dir
          hostPath:
            path: /var/run/ztunnel
            type: DirectoryOrCreate
        - name: tmp
          emptyDir: {}
EOF

laurentdina avatar Dec 04 '25 12:12 laurentdina

Thanks @laurentdina for reporting the issue. On a fresh node (before installing Istio), if you run the following nft commands, does it still lead to Segmentation fault?

$ nft add table inet test-table
$ nft add set inet test-table test-set "{ type ipv4_addr ; }"
$ nft delete table inet test-table
Segmentation fault (core dumped)

sridhargaddam avatar Dec 04 '25 13:12 sridhargaddam

@sridhargaddam

On a fresh node (before installing Istio), it does not lead to segmentation faults:

$ nft add table inet test-table
$ nft add set inet test-table test-set "{ type ipv4_addr ; }"
$ nft -j list table inet test-table
{"nftables": [{"metainfo": {"version": "1.0.9", "release_name": "Old Doc Yak #3", "json_schema_version": 1}}, {"table": {"family": "inet", "name": "test-table", "handle": 24}}, {"set": {"family": "inet", "name": "test-set", "table": "test-table", "type": "ipv4_addr", "handle": 1}}]}
$ nft delete table inet test-table
$ nft -j list table inet test-table
Error: No such file or directory

Segmentation faults occur immediately after deploying Istio CNI on that node.

helm upgrade --install istio-cni istio/cni \
  --version $ISTIO_VERSION \
  --namespace istio-system \
  --set cni.chained=true \
  --set cni.privileged=true \
  --set cni.ambient.enabled=true \
  --set cni.ambient.ipv6=false \
  --set cni.ambient.dnsCapture=false \
  --set global.hub=docker.io/istio \
  --set global.variant=distroless \
  --set global.nativeNftables=true \
  --wait
$ nft add table inet test-table
$ nft add set inet test-table test-set "{ type ipv4_addr ; }"
$ nft delete table inet test-table
Segmentation fault (core dumped)

laurentdina avatar Dec 04 '25 14:12 laurentdina

I was able to reproduce this issue on a Kind cluster as well. Will update if I find any pointers...

sridhargaddam avatar Dec 07 '25 13:12 sridhargaddam

When using the debug variant of the IstioCNI image, I'm unable to reproduce the issue on Kind. Do you observe the same behavior on your platform?

helm upgrade --install istio-cni istio/cni \
 --version $ISTIO_VERSION \
 --namespace istio-system \
 --set cni.chained=true \
 --set cni.privileged=true \
 --set cni.ambient.enabled=true \
 --set cni.ambient.ipv6=false \
 --set cni.ambient.dnsCapture=false \
 --set global.hub=docker.io/istio \
 --set global.variant=debug \
 --set global.nativeNftables=true \
 --wait

sridhargaddam avatar Dec 08 '25 13:12 sridhargaddam

Looping in @yxun, who worked on the distroless nft image.

Yuanlin, the nft binary included in the current distroless image is version 1.1.5. It looks like there are some breaking changes between 1.0.6/1.0.9 and 1.1.5, which might be causing the segfault. Could you please try using an older nft version with the distroless images and let us know what you find?

sridhargaddam avatar Dec 08 '25 18:12 sridhargaddam

Hello, I previously worked on an nftables package change in the Istio distroless base image. The nftables package version in the distroless image has been updated several times on the wolfi-dev side. Reference: https://github.com/wolfi-dev/os/blob/main/nftables.yaml

I built nftables 1.0.6 locally and included it in the Istio distroless install-cni image. The issue did not occur with 1.0.6, so it is caused by a version incompatibility between the nft package on the host and the one in the CNI node agent's distroless image. I am checking the package changelog.
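
One way to confirm this kind of mismatch on a node is to compare the version that nft reports on the host with the one it reports inside the CNI container. Below is a minimal sketch of the parsing step only. The helper name nft_version_of is hypothetical, and the input is assumed to look like the "nftables v1.0.9 (...)" strings quoted in this issue.

```shell
# Hypothetical helper: extract the numeric version from `nft --version` output,
# e.g. "nftables v1.0.9 (Old Doc Yak #3)" -> "1.0.9".
# Prints nothing if the input does not start with "nftables v<digits>".
nft_version_of() {
  printf '%s\n' "$1" | sed -n 's/^nftables v\([0-9][0-9.]*\).*/\1/p'
}
```

It could be fed with the output of nft --version on the host and, assuming the CNI DaemonSet is named istio-cni-node in your install, with the output of kubectl exec -n istio-system ds/istio-cni-node -- nft --version.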

yxun avatar Dec 08 '25 18:12 yxun

The missing JSON support in the Istio distroless nftables package is a separate issue. It was recently fixed in nftables.yaml on the wolfi-dev side, and I will create a PR to include that fix for the nftables-slim sub-package there.
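
To tell which nft binary lacks JSON support, its output can be checked for the exact error string from the CNI logs. This is a sketch under that assumption only; the helper name lacks_json_support is hypothetical.

```shell
# Hypothetical helper: succeed if the given nft output contains the error
# printed by builds without JSON support (string taken from the logs above).
lacks_json_support() {
  printf '%s\n' "$1" | grep -q 'JSON support not compiled-in'
}
```

For example, it could be called with the combined output of nft -j list tables, captured once on the host and once inside the CNI container.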

The segmentation fault in list/delete operations is caused by a version incompatibility. It is not yet clear which version introduced it.

yxun avatar Dec 08 '25 18:12 yxun

Does that mean any user who upgrades will run into this? That feels like a critical bug; can we pin the nftables-slim binary to an older non-buggy version?

keithmattix avatar Dec 08 '25 20:12 keithmattix

The list/delete segmentation fault is caused by a change between nftables 1.1.1 and 1.1.2: the issue occurs with version 1.1.2, while version 1.1.1 works fine.

The apko tool supports pinning a package version, for example "- nftables-slim=1.1.1-r0". However, the wolfi apk packages have no slim build of version 1.1.1 so far. See my build log below: the earliest nftables-slim package listed there is 1.1.3, and the full nftables package is available as 1.1.1-r40.

I am going to ask wolfi-dev whether there is a way to include a 1.1.1 package there. We could then update docker/iptables.yaml in the istio/istio repo with "- nftables-slim=1.1.1-r0" under packages.

$ apko publish --arch=x86_64 ./iptables.yaml.test quay.io/yuaxu/nftables:1.1.1-slim
Error: failed to build image components: locking config: resolving apk packages: for arch "amd64": solving "nftables-slim=1.1.1-r0" constraint:   nftables-slim-1.1.3-r0.apk disqualified because "1.1.3-r0" does not satisfy "nftables-slim=1.1.1-r0"
  nftables-slim-1.1.4-r1.apk disqualified because "1.1.4-r1" does not satisfy "nftables-slim=1.1.1-r0"
  nftables-slim-1.1.5-r0.apk disqualified because "1.1.5-r0" does not satisfy "nftables-slim=1.1.1-r0"
  nftables-slim-1.1.5-r1.apk disqualified because "1.1.5-r1" does not satisfy "nftables-slim=1.1.1-r0"
  nftables-slim-1.1.5-r2.apk disqualified because "1.1.5-r2" does not satisfy "nftables-slim=1.1.1-r0"
  nftables-slim-1.1.6-r0.apk disqualified because "1.1.6-r0" does not satisfy "nftables-slim=1.1.1-r0"
$ apko publish --arch=x86_64 ./iptables.yaml.test quay.io/yuaxu/nftables:1.1.1-not-slim
Error: failed to build image components: locking config: resolving apk packages: for arch "amd64": solving "nftables=1.1.1-r0" constraint:   nftables-1.1.1-r40.apk disqualified because "1.1.1-r40" does not satisfy "nftables=1.1.1-r0"
  nftables-1.1.2-r0.apk disqualified because "1.1.2-r0" does not satisfy "nftables=1.1.1-r0"
  nftables-1.1.3-r0.apk disqualified because "1.1.3-r0" does not satisfy "nftables=1.1.1-r0"
  nftables-1.1.3-r1.apk disqualified because "1.1.3-r1" does not satisfy "nftables=1.1.1-r0"
  nftables-1.1.4-r0.apk disqualified because "1.1.4-r0" does not satisfy "nftables=1.1.1-r0"
  nftables-1.1.4-r1.apk disqualified because "1.1.4-r1" does not satisfy "nftables=1.1.1-r0"
  nftables-1.1.5-r0.apk disqualified because "1.1.5-r0" does not satisfy "nftables=1.1.1-r0"
  nftables-1.1.5-r1.apk disqualified because "1.1.5-r1" does not satisfy "nftables=1.1.1-r0"
  nftables-1.1.5-r2.apk disqualified because "1.1.5-r2" does not satisfy "nftables=1.1.1-r0"
  nftables-1.1.6-r0.apk disqualified because "1.1.6-r0" does not satisfy "nftables=1.1.1-r0"

yxun avatar Dec 09 '25 02:12 yxun

I updated Dockerfile.base to install nft 1.1.5 instead of the distro’s 1.0.6. With that change, I can reproduce the issue even on a Kind cluster with the custom debug image. This suggests something in 1.1.2 introduced a breaking change, as Yuanlin mentioned. When IstioCNI runs with 1.1.2 or later while the host uses an older version like 1.0.9, the nft commands on the host hit a segmentation fault. IstioCNI itself keeps running without errors, and I don’t see any segfaults in its logs, but the fact that host-side nft commands crash is concerning.
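
The failing combination described here can be expressed as a small version check. This is a sketch only: the helper names version_ge and is_risky_pair are hypothetical, the 1.1.2 threshold comes from the findings in this thread, and the comparison relies on sort -V being available. Whether a host at 1.1.2 or later is unaffected has not been established in this thread, so the sketch only encodes the reported case.

```shell
# Succeed if version $1 >= version $2 (relies on `sort -V` version ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical check for the combination reported in this issue:
# container nft at 1.1.2 or later while the host nft is older than 1.1.2.
# Usage: is_risky_pair <host_version> <container_version>
is_risky_pair() {
  version_ge "$2" "1.1.2" && ! version_ge "$1" "1.1.2"
}
```

With the versions from this issue, is_risky_pair 1.0.9 1.1.5 succeeds, while is_risky_pair 1.0.9 1.0.6 does not.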

sridhargaddam avatar Dec 09 '25 09:12 sridhargaddam

@sridhargaddam I'm a bit late since the issues were already identified here, but I can confirm the debug variant works correctly on our platform. All nft list/delete operations succeed without segmentation faults, JSON support functions properly, and pod removal from the mesh completes successfully without the "JSON support not compiled-in" errors.

laurentdina avatar Dec 09 '25 09:12 laurentdina

The PR adding JSON output support to nftables-slim is pending review: https://github.com/wolfi-dev/os/pull/75259 It will fix the "Error: JSON support not compiled-in" error.

We are still missing a build of the older nftables-slim version 1.1.1. I asked about that in the apko Slack channel: https://kubernetes.slack.com/archives/C03MASRQP4M/p1765248959082409 If we install the full nftables 1.1.1-r40 package in the distroless base image, it causes unexpectedFiles errors for the Istio distroless base image.

yxun avatar Dec 09 '25 16:12 yxun

@yxun Do you know of any risks in pinning the version to 1.1.1 (an older version)? Are there any critical fixes we'd be missing?

jaellio avatar Dec 09 '25 21:12 jaellio

Hello @jaellio, I don't see a risk in pinning version 1.1.1 or version 1.0.9 (an older version), provided we can get an nftables-slim package of that version into the wolfi package repo. When the IstioCNI distroless base image uses a version newer than 1.1.1, it is incompatible with an existing 1.0.x nftables package on the host.

There are fixes in each nftables 1.1.x version. I am not familiar with those changes.

Since the host in this issue already runs nftables 1.0.9, the package in the IstioCNI container would carry the same risk as the one running on the host.

yxun avatar Dec 09 '25 22:12 yxun