systemd-networkd: ens192 interface loses its connection along with its DHCP address
Description
We are experiencing a strange issue on a VMware hypervisor. At some point the ens192 interface loses its IP address, and networkctl shows it as degraded and failed. networkctl renew ens192 and networkctl forcerenew ens192 have no effect; only networkctl reconfigure ens192 brings the interface back up.
There is nothing suspicious in dmesg, but systemd-networkd has the following logs:
Apr 28 06:26:40 localhost systemd-networkd[818]: ens192: Configuring with /usr/lib/systemd/network/zz-default.network.
Apr 28 06:26:40 localhost systemd-networkd[818]: ens192: Link UP
Apr 28 06:26:40 localhost systemd-networkd[818]: ens192: Gained carrier
Apr 28 06:26:40 localhost systemd-networkd[818]: ens192: DHCPv4 address 10.180.0.182/24, gateway 10.180.0.1 acquired from 10.180.0.3
Apr 28 06:26:42 localhost systemd-networkd[818]: ens192: Gained IPv6LL
Apr 28 06:26:43 localhost systemd-networkd[818]: ens192: DHCPv6 lease lost
Apr 28 06:26:45 kks-pool-4-kxxu4.cloud systemd-networkd[14191]: ens192: Configuring with /usr/lib/systemd/network/yy-vmware.network.
Apr 28 06:26:45 kks-pool-4-kxxu4.cloud systemd-networkd[14191]: ens192: Link UP
Apr 28 06:26:45 kks-pool-4-kxxu4.cloud systemd-networkd[14191]: ens192: Gained carrier
Apr 28 06:26:45 kks-pool-4-kxxu4.cloud systemd-networkd[14191]: ens192: DHCPv4 address 10.180.0.182/24, gateway 10.180.0.1 acquired from 10.180.0.2
Apr 28 06:26:45 kks-pool-4-kxxu4.cloud systemd-networkd[14191]: ens192: Gained IPv6LL
Apr 28 18:27:45 kks-pool-4-kxxu4.cloud systemd-networkd[14191]: ens192: Could not set DHCPv4 address: Connection timed out
Apr 28 18:27:45 kks-pool-4-kxxu4.cloud systemd-networkd[14191]: ens192: Failed
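One detail that may help when correlating this with the DHCP server: the "Could not set DHCPv4 address" error appears 12 hours and 1 minute after the lease was acquired by the second systemd-networkd instance (PID 14191). Whether that interval corresponds to a lease or renewal timer on this network cannot be determined from the logs; the sketch below only does the arithmetic, and the helper name hms_to_seconds is hypothetical.

```shell
#!/bin/sh
# Convert an HH:MM:SS timestamp to seconds since midnight.
# awk treats "06" as decimal 6, so leading zeros are safe.
hms_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Timestamps copied from the log lines above; both are on Apr 28,
# so a same-day subtraction is enough.
acquired=$(hms_to_seconds "06:26:45")   # DHCPv4 address acquired
failed=$(hms_to_seconds "18:27:45")     # Could not set DHCPv4 address

echo "$(( failed - acquired )) seconds between lease and failure"
# prints: 43260 seconds between lease and failure
```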
cat /usr/lib/systemd/network/yy-vmware.network
[Match]
Virtualization=vmware
Type=!loopback dummy bridge tunnel vxlan wireguard
Driver=veth
[Network]
DHCP=yes
KeepConfiguration=dhcp-on-stop
IPv6AcceptRA=true
[DHCP]
UseMTU=true
UseDomains=true
RequestBroadcast=true
Impact
The k8s node eventually becomes unavailable.
Environment and steps to reproduce
Unknown; this happens unexpectedly.
Expected behavior
Even if the network interface loses its connection, it should recover automatically.
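Until the root cause is known, the manual recovery described above (networkctl reconfigure ens192) could be automated as a stopgap. The following is a minimal sketch, not a fix: it assumes that the failed link shows "failed" in the SETUP column of networkctl list, as reported in the description. The function name link_setup_failed, the IFACE variable, and the --run guard are hypothetical.

```shell
#!/bin/sh
# Hypothetical watchdog sketch. IFACE and the "failed" check are assumptions
# based on the symptoms in this report, not a confirmed fix.
IFACE="${IFACE:-ens192}"

# Reads `networkctl list --no-legend` output on stdin
# (columns: IDX LINK TYPE OPERATIONAL SETUP).
# Exits 0 if the named link's SETUP column is "failed", 1 otherwise.
link_setup_failed() {
    awk -v iface="$1" '
        $2 == iface && $5 == "failed" { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Only touch the system when explicitly asked to, e.g. `watchdog.sh --run`.
if [ "${1:-}" = "--run" ]; then
    if networkctl list --no-legend --no-pager | link_setup_failed "$IFACE"; then
        echo "$IFACE setup state is failed, reconfiguring" >&2
        networkctl reconfigure "$IFACE"
    fi
fi
```

Such a script would have to be run periodically, for example from a systemd timer. It only masks the symptom, so the node may still be unreachable between the failure and the next run.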
I found the RequiredForOnline= option in the systemd.network documentation (https://www.freedesktop.org/software/systemd/man/latest/systemd.network.html#RequiredForOnline=), and I guess RequiredForOnline=always-up might do the trick, but I'm not sure.
Any suggestions are appreciated.
Additional information
dmesg output doesn't have anything suspicious about the ens192 NIC
Flatcar 4152.2.2
cc @jknipper
The same issue occurred again on another node.
Looks like we're not alone with this issue. See also:
- https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/2054977
- https://github.com/systemd/systemd/issues/33934
- https://github.com/systemd/systemd/issues/32045
- https://gist.github.com/raggi/1f8d0b9f45c5b62e7131b03e6e2ffe68
Hello @kayrus, have you tried upgrading your Flatcar version to the latest Stable (4230.2.0)? It ships a systemd upgrade, and I'd be interested to see whether you still have the issue.
This is a rare case and it's hard to reproduce. See the more detailed discussion in https://github.com/systemd/systemd/issues/32045