plugins icon indicating copy to clipboard operation
plugins copied to clipboard

nftables support for ipmasq and portmap

Open danwinship opened this issue 1 year ago • 7 comments

This adds nftables backends to the two remaining iptables-only bits in containernetworking/plugins; ipmasq (used by ptp and bridge) and portmap. For now, they are both set up to use iptables by default, unless you explicitly request nftables via the network config, or you run the plugin on a host that has nft installed but not iptables (eg, future RHEL 10 hosts).

The firewall plugin currently has "iptables" and "firewalld" backends. In theory it could have a separate explicit "nftables" backend, but I didn't do that here.

The ipmasq and portmap code implement nftables via https://github.com/danwinship/nftables, which is a library I started for the planned kube-proxy nftables backend, and also plan to use for various other nftables ports (eg, cri-o). It might move from danwinship/ to somewhere more generic at some point, but it might not.

After I started working on this branch, I discovered that the bridge plugin was actually already using nftables via pkg/link/spoofcheck.go, using the https://github.com/networkplumbing/go-nft library. Using two separate nftables libraries is probably not great, so we may want to fix that eventually. The libraries have somewhat similar APIs up to a point, then diverge completely because go-nft uses the JSON API to /sbin/nft and therefore has to use the (poorly-documented) JSON rule schema, while danwinship/nftables uses the plain text API to /sbin/nft and so uses "normal" nft rules. eg, compare:

func (sc *SpoofChecker) matchIfaceJumpToChainRule(chain, toChain string) *schema.Rule {
        return &schema.Rule{
                Family: schema.FamilyBridge,
                Table:  natTableName,
                Chain:  chain,
                Expr: []schema.Statement{
                        {Match: &schema.Match{
                                Op:    schema.OperEQ,
                                Left:  schema.Expression{RowData: []byte(`{"meta":{"key":"iifname"}}`)
},
                                Right: schema.Expression{String: &sc.iface},
                        }},
                        {Verdict: schema.Verdict{Jump: &schema.ToTarget{Target: toChain}}},
                },
                Comment: ruleComment(sc.refID),
        }
}

which in danwinship/nftables syntax would become:

func (sc *SpoofChecker) matchIfaceJumpToChainRule(chain, toChain string) *nftables.Rule {
        return &nftables.Rule{
                Chain: chain,
                Rule: nftables.Concat(
                        "iifname", sc.iface,
                        "jump", toChain,
                ),
                Comment: ruleComment(sc.refID),
        }
}

("Draft" PR because not fully-tested yet...)

danwinship avatar Jul 31 '23 14:07 danwinship

  [FAILED] Unexpected error:
      <*errors.errorString | 0xc0003bdb20>: 
      running [/usr/sbin/iptables -t nat -D POSTROUTING -s 10.1.2.2 -j CNI-43a5a67926c1a665ff4c21b7 -m comment --comment name: "testConfig" id: "dummy-0" --wait]: exit status 2: iptables v1.8.7 (nf_tables): Chain 'CNI-43a5a67926c1a665ff4c21b7' does not exist

sigh, this is what https://github.com/coreos/go-iptables/pull/107 was supposed to fix but I guess I fixed it for iptables-legacy but not iptables-nft?

(Was this not failing before? I'm not sure why this change would make the race condition suddenly start happening.)

danwinship avatar Aug 01 '23 15:08 danwinship

linter fail, I see is still in draft, let me know if you want a review,

aojea avatar May 27 '24 15:05 aojea

OK, finally updated and tested:

  • ip.SetupIPMasq() and ip.TeardownIPMasq() are now unchanged (and still always use iptables). Instead, I added a new SetupIPMasqForPlugin and TeardownIPMasqForPlugin that support both iptables and nftables.
  • The bridge and ptp plugins now use iptables by default, or nftables if iptables isn't installed or doesn't work. Or you can force the backend via "ipMasqBackend": "iptables" or "ipMasqBackend": "nftables" in the config.
  • Likewise, portmap now supports nftables, but still uses iptables by default, unless it's not available/usable, or you pass "backend": "nftables", or you set "conditionsV4" or "conditionsV6" to something that looks like nftables syntax rather than iptables.

(I guess this will need a docs update to the cni.dev repo?)

danwinship avatar Jul 24 '24 15:07 danwinship

/assign @squeed

danwinship avatar Jul 24 '24 15:07 danwinship

oh, and I could port this to use the same nftables library that spoofcheck uses, if you preferred.

or port the spoofcheck code to use knftables like kube-proxy (and Calico and soon some other stuff)

danwinship avatar Jul 24 '24 16:07 danwinship

or port the spoofcheck code to use knftables like kube-proxy

https://github.com/danwinship/plugins/commit/17bb5355

danwinship avatar Aug 26 '24 14:08 danwinship

Looks basically good. I'd like to see GC support if possible, since it informs the design the API. Could you add that, even if the top-end API glue isn't there yet.

squeed avatar Aug 27 '24 08:08 squeed