plugins
plugins copied to clipboard
nftables support for ipmasq and portmap
This adds nftables backends to the two remaining iptables-only bits in containernetworking/plugins; ipmasq (used by ptp
and bridge
) and portmap
. For now, they are both set up to use iptables by default, unless you explicitly request nftables via the network config, or you run the plugin on a host that has nft installed but not iptables (eg, future RHEL 10 hosts).
The firewall
plugin currently has "iptables" and "firewalld" backends. In theory it could have a separate explicit "nftables" backend, but I didn't do that here.
The ipmasq and portmap code implement nftables via https://github.com/danwinship/nftables, which is a library I started for the planned kube-proxy nftables backend, and also plan to use for various other nftables ports (eg, cri-o). It might move from danwinship/
to somewhere more generic at some point, but it might not.
After I started working on this branch, I discovered that the bridge
plugin was actually already using nftables via pkg/link/spoofcheck.go
, using the https://github.com/networkplumbing/go-nft library. Using two separate nftables libraries is probably not great, so we may want to fix that eventually. The libraries have somewhat similar APIs up to a point, then diverge completely because go-nft
uses the JSON API to /sbin/nft
and therefore has to use the (poorly-documented) JSON rule schema, while danwinship/nftables
uses the plain text API to /sbin/nft
and so uses "normal" nft rules. eg, compare:
func (sc *SpoofChecker) matchIfaceJumpToChainRule(chain, toChain string) *schema.Rule {
return &schema.Rule{
Family: schema.FamilyBridge,
Table: natTableName,
Chain: chain,
Expr: []schema.Statement{
{Match: &schema.Match{
Op: schema.OperEQ,
Left: schema.Expression{RowData: []byte(`{"meta":{"key":"iifname"}}`)
},
Right: schema.Expression{String: &sc.iface},
}},
{Verdict: schema.Verdict{Jump: &schema.ToTarget{Target: toChain}}},
},
Comment: ruleComment(sc.refID),
}
}
which in danwinship/nftables
syntax would become:
func (sc *SpoofChecker) matchIfaceJumpToChainRule(chain, toChain string) *nftables.Rule {
return &nftables.Rule{
Chain: chain,
Rule: nftables.Concat(
"iifname", sc.iface,
"jump", toChain,
),
Comment: ruleComment(sc.refID),
}
}
("Draft" PR because not fully-tested yet...)
[FAILED] Unexpected error:
<*errors.errorString | 0xc0003bdb20>:
running [/usr/sbin/iptables -t nat -D POSTROUTING -s 10.1.2.2 -j CNI-43a5a67926c1a665ff4c21b7 -m comment --comment name: "testConfig" id: "dummy-0" --wait]: exit status 2: iptables v1.8.7 (nf_tables): Chain 'CNI-43a5a67926c1a665ff4c21b7' does not exist
sigh, this is what https://github.com/coreos/go-iptables/pull/107 was supposed to fix but I guess I fixed it for iptables-legacy but not iptables-nft?
(Was this not failing before? I'm not sure why this change would make the race condition suddenly start happening.)
linter fail, I see is still in draft, let me know if you want a review,
OK, finally updated and tested:
-
ip.SetupIPMasq()
andip.TeardownIPMasq()
are now unchanged (and still always use iptables). Instead, I added a newSetupIPMasqForPlugin
andTeardownIPMasqForPlugin
that support both iptables and nftables. - The
bridge
andptp
plugins now use iptables by default, or nftables if iptables isn't installed or doesn't work. Or you can force the backend via"ipMasqBackend": "iptables"
or"ipMasqBackend": "nftables"
in the config. - Likewise,
portmap
now supports nftables, but still uses iptables by default, unless it's not available/usable, or you pass"backend": "nftables"
, or you set"conditionsV4"
or"conditionsV6"
to something that looks like nftables syntax rather than iptables.
(I guess this will need a docs update to the cni.dev repo?)
/assign @squeed
oh, and I could port this to use the same nftables library that spoofcheck uses, if you preferred.
or port the spoofcheck code to use knftables like kube-proxy (and Calico and soon some other stuff)
or port the spoofcheck code to use knftables like kube-proxy
https://github.com/danwinship/plugins/commit/17bb5355
Looks basically good. I'd like to see GC support if possible, since it informs the design the API. Could you add that, even if the top-end API glue isn't there yet.