
fix: support kernels without legacy xtables (6.17+)

Open shepherdjerred opened this issue 2 months ago • 1 comments

Summary

Fix Dagger Engine failing on Linux kernel 6.17+ where CONFIG_NETFILTER_XTABLES_LEGACY is disabled.

Image available here: https://github.com/users/shepherdjerred/packages/container/package/dagger-engine

2025-12-29T23:34:37.727Z | dnsmasq[739]: read /etc/hosts - 18 names
2025-12-29T23:34:37.727Z | dnsmasq[739]: read /var/run/containers/cni/dnsname/dagger/addnhosts - 0 names
2025-12-29T23:34:37.732Z | time="2025-12-29T23:34:37Z" level=info msg="auto snapshotter: using overlayfs"
2025-12-29T23:34:37.735Z | time="2025-12-29T23:34:37Z" level=warning msg="failed to release network namespace \"kxhj354vbn6ent9a2x1tp6s9d\" left over from previous run: plugin type=\"loopback\" failed (delete): unknown FS magic on \"/var/lib/dagger/net/cni/kxhj354vbn6ent9a2x1tp6s9d\": 2fc12fc1"
2025-12-29T23:34:37.767Z | dnsmasq[739]: read /etc/hosts - 18 names
2025-12-29T23:34:37.767Z | dnsmasq[739]: read /var/run/containers/cni/dnsname/dagger/addnhosts - 0 names
2025-12-29T23:34:37.794Z | dnsmasq[739]: read /etc/hosts - 18 names
2025-12-29T23:34:37.794Z | dnsmasq[739]: read /var/run/containers/cni/dnsname/dagger/addnhosts - 0 names
2025-12-29T23:34:37.824Z | dnsmasq[739]: read /etc/hosts - 18 names
2025-12-29T23:34:37.824Z | dnsmasq[739]: read /var/run/containers/cni/dnsname/dagger/addnhosts - 0 names
2025-12-29T23:34:37.852Z | dnsmasq[739]: read /etc/hosts - 18 names
2025-12-29T23:34:37.852Z | dnsmasq[739]: read /var/run/containers/cni/dnsname/dagger/addnhosts - 0 names
2025-12-29T23:34:37.878Z | dnsmasq[739]: read /etc/hosts - 18 names
2025-12-29T23:34:37.878Z | dnsmasq[739]: read /var/run/containers/cni/dnsname/dagger/addnhosts - 0 names
2025-12-29T23:34:37.905Z | dnsmasq[739]: read /etc/hosts - 18 names
2025-12-29T23:34:37.905Z | dnsmasq[739]: read /var/run/containers/cni/dnsname/dagger/addnhosts - 0 names
2025-12-29T23:34:37.931Z | dnsmasq[739]: read /etc/hosts - 18 names
2025-12-29T23:34:37.931Z | dnsmasq[739]: read /var/run/containers/cni/dnsname/dagger/addnhosts - 0 names
2025-12-29T23:34:37.959Z | dnsmasq[739]: read /etc/hosts - 18 names
2025-12-29T23:34:37.959Z | dnsmasq[739]: read /var/run/containers/cni/dnsname/dagger/addnhosts - 0 names
2025-12-29T23:34:37.988Z | dnsmasq[739]: read /etc/hosts - 18 names
2025-12-29T23:34:37.988Z | dnsmasq[739]: read /var/run/containers/cni/dnsname/dagger/addnhosts - 0 names
2025-12-29T23:34:38.017Z | dnsmasq[739]: read /etc/hosts - 18 names
2025-12-29T23:34:38.017Z | dnsmasq[739]: read /var/run/containers/cni/dnsname/dagger/addnhosts - 0 names
2025-12-29T23:34:38.073Z | dnsmasq[739]: read /etc/hosts - 18 names
2025-12-29T23:34:38.073Z | dnsmasq[739]: read /var/run/containers/cni/dnsname/dagger/addnhosts - 0 names
Run ARGS=(
1   : connect
1   : [0.0s] | cloud url=https://dagger.cloud/traces/setup
2   : ┆ starting engine
2   : ┆ starting engine DONE [0.0s]
3   : ┆ connecting to engine
3   : ┆ [0.1s] | 23:48:59 INF connected name=dagger-dagger-helm-engine-0 client-version=v0.19.8 server-version=v0.19.8
3   : ┆ connecting to engine DONE [0.1s]
4   : ┆ starting session
4   : ┆ starting session DONE [0.1s]
1   : connect DONE [0.3s]
5   : load module: .
6   : ┆ finding module configuration
7   : ┆ initializing module
6   : ┆ finding module configuration DONE [13.1s]

Tested on Talos Linux 1.12.0

Fixes #11607

Changes

  1. Switch from iptables to iptables-nft package in Wolfi base image (uses nftables kernel API)
  2. Add runtime detection for legacy xtables availability
  3. Configure CNI bridge plugin to use nftables backend when legacy unavailable

Technical Details

Background

Starting with Linux kernel 6.17, CONFIG_NETFILTER_XTABLES_LEGACY defaults to disabled, removing the legacy ip_tables kernel modules. This affects:

  • Talos Linux 1.12+ (kernel 6.18.1)
  • RHEL 10 (upcoming)
  • Arch Linux with kernel 6.17+
  • Void Linux with kernel 6.17+

Solution

Package Change (toolchains/engine-dev/build/builder.go):

  • Switch to iptables-nft and ip6tables-nft packages
  • These use the nftables kernel API (available since kernel 3.13)
  • The go-iptables library auto-detects nft mode from iptables -V output
  • Ensures CNI firewall plugin works on both old and new kernels
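
To illustrate the auto-detection point, here is a minimal sketch of reading the backend from `iptables -V` output. It is not the go-iptables source: `parseIptablesMode` and `versionRe` are hypothetical names, and treating output without a parenthesised mode as "legacy" is an assumption.

```go
package main

import (
	"fmt"
	"os/exec"
	"regexp"
)

// versionRe matches strings such as "iptables v1.8.9 (nf_tables)".
// The parenthesised mode is optional because older builds omit it.
var versionRe = regexp.MustCompile(`v(\d+\.\d+\.\d+)(?:\s+\((\w+)\))?`)

// parseIptablesMode is a hypothetical helper that extracts the version and
// backend mode from `iptables -V` output. Assumption: a missing mode means
// the binary predates the nft variant, so it is reported as "legacy".
func parseIptablesMode(out string) (version, mode string, err error) {
	m := versionRe.FindStringSubmatch(out)
	if m == nil {
		return "", "", fmt.Errorf("unrecognised iptables version output: %q", out)
	}
	mode = m[2]
	if mode == "" {
		mode = "legacy"
	}
	return m[1], mode, nil
}

func main() {
	out, err := exec.Command("iptables", "-V").Output()
	if err != nil {
		fmt.Println("could not run iptables:", err)
		return
	}
	version, mode, err := parseIptablesMode(string(out))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("iptables %s uses the %s backend\n", version, mode)
}
```

With the iptables-nft package installed, the mode reported by the binary should be `nf_tables`, which is what lets callers work without the legacy kernel modules.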

Runtime Detection (internal/buildkit/util/network/cniprovider/bridge.go):

  • New detectIPMasqBackend() function checks:
    1. Whether /proc/net/ip_tables_names exists (legacy xtables indicator)
    2. Whether iptables -t nat -L fails with a "Table does not exist" error
  • If legacy xtables unavailable, configures CNI bridge with ipMasqBackend: "nftables"
  • Debug-level logging when falling back to nftables
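
The detection steps above could look roughly like the following sketch. It is not the code from this PR: `chooseBackend` and `bridgeConfig` are hypothetical helpers, the rule that either failed check selects nftables is an assumption about how the two checks combine, and the bridge config is reduced to the fields relevant here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// chooseBackend is a hypothetical pure helper. Assumption: legacy xtables is
// treated as unavailable if the /proc file is missing OR the probe output
// contains "Table does not exist". An empty result means "plugin default".
func chooseBackend(procExists bool, probeOutput string) string {
	if !procExists || strings.Contains(probeOutput, "Table does not exist") {
		return "nftables"
	}
	return ""
}

// detectIPMasqBackend gathers the two signals from the host and delegates
// the decision to chooseBackend.
func detectIPMasqBackend() string {
	_, statErr := os.Stat("/proc/net/ip_tables_names")
	// The probe error is ignored on purpose: only the output text is inspected.
	out, _ := exec.Command("iptables", "-t", "nat", "-L").CombinedOutput()
	return chooseBackend(statErr == nil, string(out))
}

// bridgeConfig builds a reduced CNI bridge plugin config. ipMasqBackend is
// only set when a backend was selected, so the plugin default applies otherwise.
func bridgeConfig(backend string) map[string]any {
	conf := map[string]any{
		"type":   "bridge",
		"ipMasq": true,
	}
	if backend != "" {
		conf["ipMasqBackend"] = backend
	}
	return conf
}

func main() {
	backend := detectIPMasqBackend()
	if backend != "" {
		fmt.Fprintln(os.Stderr, "legacy xtables unavailable, using", backend)
	}
	data, err := json.MarshalIndent(bridgeConfig(backend), "", "  ")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(string(data))
}
```

Keeping the decision in a pure function makes the fallback easy to unit test without a kernel that lacks legacy xtables.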

Testing

  • ✅ Code compiles and passes go vet
  • ⏳ Manual testing needed on Talos 1.12 (kernel 6.18.1)
  • ⏳ Regression testing needed on Docker Desktop / standard kernels

Related

  • Previous work: #7670 (reverted to iptables-legacy for a different issue)
  • CNI bridge plugin docs: https://www.cni.dev/plugins/current/main/bridge/

shepherdjerred avatar Dec 29 '25 21:12 shepherdjerred

@shepherdjerred small nit about the DCO check please :pray:

marcosnils avatar Jan 03 '26 04:01 marcosnils

I believe I've fixed the DCO check. Thanks for the review!

shepherdjerred avatar Jan 04 '26 19:01 shepherdjerred

I hit this same issue on AerynOS, though in my case, as I am one of the distro maintainers, I was able to simply re-enable legacy xtables in the kernel. Beyond making sure that dagger works with newer kernels, it's important to resolve this because running a system with both iptables and nftables rules is technically not a supported configuration. The fact that it hasn't been a major issue likely has more to do with devices running dagger usually running only dagger and not mixed workloads.

I'd note that the Kubernetes project went through this several years ago and it would be wise to take a look at how they resolved it:

  • https://github.com/kubernetes/kubernetes/issues/71305
  • https://github.com/kubernetes/kubernetes/pull/82966

ReillyBrogan avatar Jan 05 '26 22:01 ReillyBrogan

I found the latest implementation that k8s uses here

ReillyBrogan avatar Jan 05 '26 22:01 ReillyBrogan

Waiting for this to get merged as I haven't been able to play with dagger for several months on my NixOS machine...

ezynda3 avatar Jan 14 '26 12:01 ezynda3

Ended up needing to revert this before the release; we should revisit it though: https://github.com/dagger/dagger/pull/11692

sipsma avatar Jan 14 '26 22:01 sipsma