amazon-vpc-cni-k8s
Pod routing policies deleted by systemd
What happened:
Outgoing packets from a pod are routed via interface eth0 because the routing policies for the pod are deleted by systemd-networkd.
Detailed timeline of the relevant events:
- Pod A was scheduled on a node
- IP address associated with eth1 was assigned to pod A
- AWS CNI added a routing policy for pod A
- Pod B scheduled on the same node
- A new ENI attached as eth2
- systemd removes all "foreign" routing policies added by AWS CNI
Attach Logs:
There are 2 secondary ENIs attached. The pod with IP 10.7.4.69 is assigned to eth1, but the pod's routing policies are missing.
core@ip-10-1-58-66 ~ $ ip rule list
0: from all lookup local
512: from all to 10.7.4.160 lookup main
512: from all to 10.7.4.83 lookup main
512: from all to 10.7.4.89 lookup main
512: from all to 10.7.4.209 lookup main
512: from all to 10.7.4.213 lookup main
512: from all to 10.7.4.163 lookup main
512: from all to 10.7.4.11 lookup main
1536: from 10.7.4.160 lookup 3
1536: from 10.7.4.83 lookup 3
1536: from 10.7.4.89 lookup 3
1536: from 10.7.4.209 lookup 3
1536: from 10.7.4.213 lookup 3
1536: from 10.7.4.163 lookup 3
1536: from 10.7.4.11 lookup 3
32766: from all lookup main
32767: from all lookup default
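To make the check above repeatable, here is a minimal sketch that parses `ip rule list` output and reports which of a pod's two rules are missing. It assumes the priorities seen in this report (512 for the to-container rule, 1536 for the from-pod rule); the function names and the shortened sample are illustrative only and are not part of the CNI.

```python
import re

# Matches lines such as "1536: from 10.7.4.160 lookup 3"
# and "512: from all to 10.7.4.160 lookup main".
RULE_RE = re.compile(
    r"^(?P<prio>\d+):\s+from\s+(?P<src>\S+)"
    r"(?:\s+to\s+(?P<dst>\S+))?\s+lookup\s+(?P<table>\S+)"
)


def parse_rules(text):
    """Return (priority, src, dst, table) tuples for each rule line."""
    rules = []
    for line in text.splitlines():
        m = RULE_RE.match(line.strip())
        if m:
            rules.append((int(m["prio"]), m["src"], m["dst"], m["table"]))
    return rules


def missing_rules(text, pod_ip):
    """List which of the two per-pod rules are absent for pod_ip."""
    rules = parse_rules(text)
    has_to = any(p == 512 and dst == pod_ip for p, _, dst, _ in rules)
    has_from = any(p == 1536 and src == pod_ip for p, src, _, _ in rules)
    missing = []
    if not has_to:
        missing.append("to-container")
    if not has_from:
        missing.append("from-pod")
    return missing


# Shortened copy of the listing above (one pod on eth2 only).
SAMPLE = """\
0: from all lookup local
512: from all to 10.7.4.160 lookup main
1536: from 10.7.4.160 lookup 3
32766: from all lookup main
32767: from all lookup default
"""

if __name__ == "__main__":
    for ip in ("10.7.4.160", "10.7.4.69"):
        print(ip, missing_rules(SAMPLE, ip))
```

On a node, the text would come from running `ip rule list` and passing its output to `missing_rules` together with the pod IP in question.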
According to the AWS CNI plugins.log, the policies were added.
{"level":"info","ts":"2021-09-01T06:53:52.861Z","caller":"driver/driver.go:178","msg":"Added toContainer rule for 10.7.4.69/32"}
{"level":"info","ts":"2021-09-01T06:53:52.861Z","caller":"driver/driver.go:178","msg":"Added rule priority 1536 from 10.7.4.69/32 table 2"}
According to the systemd-networkd log, the routing policies were deleted after eth2 was added.
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: eth2: Link 15 added
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: eth2: udev initialized link
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: eth2: State changed: pending -> initialized
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/network1/link/_315 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=73 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: eth2: Saved original MTU: 1500
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: eth2: Link state is up-to-date
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: eth2: found matching network '/etc/systemd/network/01-eth.network'
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: Setting '/proc/sys/net/ipv6/conf/eth2/disable_ipv6' to '0'
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: Setting '/proc/sys/net/ipv4/ip_forward' to '1'
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: Setting '/proc/sys/net/ipv6/conf/all/forwarding' to '1'
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: Setting '/proc/sys/net/ipv6/conf/eth2/use_tempaddr' to '0'
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: Setting '/proc/sys/net/ipv6/conf/eth2/accept_ra' to '0'
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: eth2: Setting nomaster
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: eth2: Setting address genmode for link
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: eth2: Failed to read sysctl property stable_secret: Input/output error
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: eth2: Setting nomaster done.
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: eth2: Setting address genmode done.
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: Removing routing policy rule: priority: 1536, 10.7.4.69/32 -> 0.0.0.0/0, iif: n/a, oif: n/a, table: 2
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: Removing routing policy rule: priority: 1536, 10.7.4.58/32 -> 0.0.0.0/0, iif: n/a, oif: n/a, table: 2
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: Removing routing policy rule: priority: 1536, 10.7.4.166/32 -> 0.0.0.0/0, iif: n/a, oif: n/a, table: 2
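The removals in the log above can be pulled out programmatically. The sketch below scans journal text for the "Removing routing policy rule" message format shown here and returns the affected rules. The regular expression is written against these log lines only, so other systemd versions may word the message differently; the function name is illustrative.

```python
import re

# Written against the debug message format shown in this report.
REMOVED_RE = re.compile(
    r"Removing routing policy rule: priority: (?P<prio>\d+), "
    r"(?P<src>\S+) -> (?P<dst>\S+), iif: \S+, oif: \S+, table: (?P<table>\d+)"
)


def removed_rules(journal_text):
    """Return one dict per routing policy rule that networkd removed."""
    found = []
    for line in journal_text.splitlines():
        m = REMOVED_RE.search(line)
        if m:
            found.append(
                {
                    "priority": int(m["prio"]),
                    "src": m["src"],
                    "dst": m["dst"],
                    "table": int(m["table"]),
                }
            )
    return found


# Two lines copied from the log above (hostname prefix shortened).
SAMPLE_LOG = """\
Sep 01 06:53:58 host systemd-networkd[1260]: eth2: Setting address genmode done.
Sep 01 06:53:58 host systemd-networkd[1260]: Removing routing policy rule: priority: 1536, 10.7.4.69/32 -> 0.0.0.0/0, iif: n/a, oif: n/a, table: 2
"""

if __name__ == "__main__":
    for rule in removed_rules(SAMPLE_LOG):
        print(rule)
```

The input would typically be the output of `journalctl -u systemd-networkd` with debug logging enabled.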
How to reproduce it (as minimally and precisely as possible):
Schedule pods on a worker node running systemd v247.4 or newer until two secondary ENIs are attached. Run ip rule list and check whether the routing policies for pods associated with the first secondary ENI have been deleted.
Anything else we need to know?:
- systemd v248 introduced a change that removes foreign routing policies when reconfiguring interfaces. The change was backported to systemd v247 (first released in v247.4).
- Possibly related issue https://github.com/aws/amazon-vpc-cni-k8s/issues/1514
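As a rough aid for deciding whether a node falls in the affected range, the sketch below compares a systemd version string against the v247.4 threshold named above. It is an assumption-laden helper: it only looks at the version number, knows nothing about distro patches or networkd settings, and the function names are made up for illustration.

```python
import re


def parse_version(version):
    """Turn '247.4', 'v248' or '247.6-1' into a (major, minor) tuple."""
    m = re.match(r"v?(\d+)(?:\.(\d+))?", version.strip())
    if not m:
        raise ValueError(f"unrecognised systemd version: {version!r}")
    return (int(m.group(1)), int(m.group(2) or 0))


def removes_foreign_rules(version):
    """True if the version is v247.4 or newer, per this report."""
    return parse_version(version) >= (247, 4)


if __name__ == "__main__":
    for v in ("246.10", "247.3", "247.4", "248"):
        print(v, removes_foreign_rules(v))
```

The version string itself can be taken from the first line of `systemctl --version`.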
Mitigation:
A network configuration like the one below instructs systemd not to delete the routing policies added by AWS CNI.
[Match]
Name=eth*
[Network]
KeepConfiguration=yes
Environment:
- systemd: 247.6
- Kubernetes version (use kubectl version): 1.20.8
- CNI Version 1.7.9
- OS (e.g: cat /etc/os-release): Flatcar 2905.2.2
- Kernel (e.g. uname -a): 5.10.59.
Hi @hligit
Thanks for the details. Regarding the mitigation I feel it would be better to update our documentation.
I found that systemd introduced a new configuration option, ManageForeignRoutingPolicyRules, which would be the proper fix. According to the maintainer, this new option will be backported to v247 and v248. https://github.com/systemd/systemd/pull/19287#issuecomment-910955617
Nice thanks for letting us know @hligit. I will update the docs accordingly.
I also ran into the same issue after upgrading FlatCar CoreOS recently. Below are my two customized systemd-networkd configurations to work around it (only verified to work on FlatCar CoreOS, not Fedora CoreOS).
- This configuration is for the eni* and eth* network interfaces. Adding the two files below stops systemd-networkd from managing the ip rules and routes added by the AWS VPC CNI plugin.

/etc/systemd/network/10-aws-cni-eni.network
[Match]
Name=eni*
[Link]
Unmanaged=yes

/etc/systemd/network/10-aws-cni-ethn.network
[Match]
Name=eth*
[Network]
KeepConfiguration=yes

- This configuration is for the eth0 network interface. It addresses the CoreOS reboot issue 345 and the problem where the firewall mark ip rule doesn't show up.

/etc/systemd/network/10-aws-cni-eth0.network
[Match]
Name=eth0
[Network]
DHCP=ipv4
[DHCP]
RouteMetric=512
[RoutingPolicyRule]
FirewallMark=0x80/0x80
Priority=1024
If FlatCar officially upgrades systemd to a version that includes the ManageForeignRoutingPolicyRules feature, I will post the new systemd-networkd configuration.
Thanks @smalltown! Your eni* link configuration is quite nice in that it works with the current version of systemd. We use the configuration below on Flatcar with a patched systemd.
[Network]
ManageForeignRoutes=no
ManageForeignRoutingPolicyRules=no
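For anyone wanting to verify that a node actually carries these two settings, here is a small sketch that parses networkd configuration text and checks both keys. It assumes the snippet above lives in the [Network] section of /etc/systemd/networkd.conf or a drop-in (the comment does not say where it was placed), and it treats an absent key as the systemd default of yes. The function name is illustrative.

```python
import configparser

# Values systemd accepts as boolean false.
FALSE_VALUES = ("no", "false", "0", "off")
WANTED_KEYS = ("ManageForeignRoutes", "ManageForeignRoutingPolicyRules")


def foreign_management_disabled(conf_text):
    """True if both ManageForeign* keys are set to a false value."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep systemd's case-sensitive key names
    parser.read_string(conf_text)
    if not parser.has_section("Network"):
        return False
    network = parser["Network"]
    return all(
        network.get(key, fallback="yes").strip().lower() in FALSE_VALUES
        for key in WANTED_KEYS
    )


SAMPLE_CONF = """\
[Network]
ManageForeignRoutes=no
ManageForeignRoutingPolicyRules=no
"""

if __name__ == "__main__":
    print(foreign_management_disabled(SAMPLE_CONF))
```

Only a single file's text is inspected here; merging the main file with drop-ins the way systemd does is left out of the sketch.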
I don't think that a global ManageForeignRout... entry is the right answer because systemd-networkd can still interfere with the manually configured network, depending on whether the default network configuration tries to use DHCP, DHCPv6 or configures another option that prevents proper connectivity.
Speaking for the Flatcar Container Linux team I urge you to generate a networkd unit file under /run/systemd/network/ on the host where you set the network interface in question to Unmanaged=yes - only this is a safe and reliable solution. Currently on Flatcar we ship rules for Calico, Cilium and so on because things happened there, too. But with a generic name like eth1 we can't ship a rule on the image, so please try to generate the networkd unit file which you can do from a privileged Pod either through entering the host mount namespace with nsenter or by bind-mounting the folder into the container.
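A minimal sketch of the unit generation suggested above: it renders a networkd unit that marks one interface as Unmanaged=yes and writes it to a directory, which on a host would be /run/systemd/network/. The file name pattern 10-awscni-<interface>.network is hypothetical, and the demo writes to a temporary directory so that it can run without touching the host.

```python
import tempfile
from pathlib import Path


def render_unmanaged_unit(interface):
    """Return networkd unit text that leaves `interface` unmanaged."""
    return f"[Match]\nName={interface}\n\n[Link]\nUnmanaged=yes\n"


def write_unmanaged_unit(interface, directory="/run/systemd/network"):
    """Write the unit file for `interface` and return its path.

    The file name pattern is an illustrative choice, not a convention
    used by the CNI or by Flatcar.
    """
    path = Path(directory) / f"10-awscni-{interface}.network"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_unmanaged_unit(interface))
    return path


if __name__ == "__main__":
    # Demo against a throwaway directory instead of /run/systemd/network.
    with tempfile.TemporaryDirectory() as tmp:
        unit = write_unmanaged_unit("eth1", directory=tmp)
        print(unit.name)
        print(unit.read_text())
```

From a privileged Pod, the directory argument would point at the bind-mounted host path; systemd-networkd would likely also need to re-read its configuration (for example with `networkctl reload`) before the new file applies.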
Thanks @pothos for chiming in! I tested the network unit file below on Flatcar v2905.2.3 and it doesn't seem to work.
core@ip-10-1-58-125 ~ $ cat /etc/systemd/network/10-awscni.network
[Match]
Name=eni*
[Link]
Unmanaged=yes
systemd-networkd debug log shows the policies are still removed.
core@ip-10-1-58-125 ~ $ journalctl -u systemd-networkd | grep -E '(10.7.4.124|enid57ead4665a)'
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: New device has no master, continuing without
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: Flags change: +MULTICAST +BROADCAST
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: Link 16 added
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: link pending udev initialization...
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: Saved original MTU: 9001
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: Flags change: +UP +LOWER_UP +RUNNING
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: Link UP
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: Gained carrier
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: Remembering route: dst: 10.7.4.124/32, src: n/a, gw: n/a, prefsrc: n/a, scope: link, table: main, proto: boot, type: unicast
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: Remembering foreign routing policy rule: priority: 512, 0.0.0.0/0 -> 10.7.4.124/32, iif: n/a, oif: n/a, table: 254
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: Remembering foreign routing policy rule: priority: 1536, 10.7.4.124/32 -> 0.0.0.0/0, iif: n/a, oif: n/a, table: 2
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: udev initialized link
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: State changed: pending -> initialized
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: Link state is up-to-date
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: found matching network '/etc/systemd/network/10-awscni.network'
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: State changed: initialized -> unmanaged
Sep 27 20:58:59 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: Removing routing policy rule: priority: 512, 0.0.0.0/0 -> 10.7.4.124/32, iif: n/a, oif: n/a, table: 254
Sep 27 20:58:59 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: Removing routing policy rule: priority: 1536, 10.7.4.124/32 -> 0.0.0.0/0, iif: n/a, oif: n/a, table: 2
Sep 27 20:58:59 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: Forgetting routing policy rule: priority: 512, 0.0.0.0/0 -> 10.7.4.124/32, iif: n/a, oif: n/a, table: 254
Sep 27 20:58:59 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: Forgetting routing policy rule: priority: 1536, 10.7.4.124/32 -> 0.0.0.0/0, iif: n/a, oif: n/a, table: 2
Meanwhile, the route still exists:
core@ip-10-1-58-125 ~ $ ip route | grep enid57ead4665a
10.7.4.124 dev enid57ead4665a scope link
core@ip-10-1-58-125 ~ $ ip link show enid57ead4665a
16: enid57ead4665a@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
link/ether 3e:30:1b:16:25:74 brd ff:ff:ff:ff:ff:ff link-netns cni-23d2c957-1af1-3133-5aeb-6153e4b7093e
Ok, funny, so I guess we either need Unmanaged plus the additional global setting (not sure if it's a good idea to set it automatically or if the distro or the user would be in charge), or we could try to generate a valid network unit file that configures the routes and policies and turns off everything that is not needed (DHCP=no, LinkLocalAddressing=no, RequiredForOnline=no, Scope=link etc).
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days
On Flatcar the global systemd settings are now set by default to work around the problem. The interface should also be set to Unmanaged by default now, but I didn't double-check it for this CNI.
FYI: here is almost the same case in another CNI, with some instructions on how your CNI could generate a networkd unit and maintain it at runtime so that the user does not have to set up the global systemd settings: https://github.com/cilium/cilium/issues/18706#issuecomment-1066986342
After testing FlatCar CoreOS version 3033.2.3, I found that amazon vpc cni no longer hits this issue; the default configuration of ManageForeignRoutes and ManageForeignRoutingPolicyRules works now.
However, I found that the iptables command in version 3033.2.0 uses the nftables kernel backend instead of the iptables backend, which breaks amazon vpc cni again. For the workaround, refer to issue #1847.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days
Issue closed due to inactivity.