systemd-networkd: ens192 interface loses a connection along with DHCP address

Open kayrus opened this issue 7 months ago • 3 comments

Description

We are experiencing a strange issue on the VMware hypervisor. At some point the ens192 interface loses its IP address, and networkctl reports it as degraded and failed. Neither networkctl renew ens192 nor networkctl forcerenew ens192 has any effect; only networkctl reconfigure ens192 brings the interface back up.

There is nothing suspicious in dmesg, but systemd-networkd logged the following:

Apr 28 06:26:40 localhost systemd-networkd[818]: ens192: Configuring with /usr/lib/systemd/network/zz-default.network.
Apr 28 06:26:40 localhost systemd-networkd[818]: ens192: Link UP
Apr 28 06:26:40 localhost systemd-networkd[818]: ens192: Gained carrier
Apr 28 06:26:40 localhost systemd-networkd[818]: ens192: DHCPv4 address 10.180.0.182/24, gateway 10.180.0.1 acquired from 10.180.0.3
Apr 28 06:26:42 localhost systemd-networkd[818]: ens192: Gained IPv6LL
Apr 28 06:26:43 localhost systemd-networkd[818]: ens192: DHCPv6 lease lost
Apr 28 06:26:45 kks-pool-4-kxxu4.cloud systemd-networkd[14191]: ens192: Configuring with /usr/lib/systemd/network/yy-vmware.network.
Apr 28 06:26:45 kks-pool-4-kxxu4.cloud systemd-networkd[14191]: ens192: Link UP
Apr 28 06:26:45 kks-pool-4-kxxu4.cloud systemd-networkd[14191]: ens192: Gained carrier
Apr 28 06:26:45 kks-pool-4-kxxu4.cloud systemd-networkd[14191]: ens192: DHCPv4 address 10.180.0.182/24, gateway 10.180.0.1 acquired from 10.180.0.2
Apr 28 06:26:45 kks-pool-4-kxxu4.cloud systemd-networkd[14191]: ens192: Gained IPv6LL
Apr 28 18:27:45 kks-pool-4-kxxu4.cloud systemd-networkd[14191]: ens192: Could not set DHCPv4 address: Connection timed out
Apr 28 18:27:45 kks-pool-4-kxxu4.cloud systemd-networkd[14191]: ens192: Failed
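The failure is logged roughly 12 hours after the address was acquired. The short Python sketch below computes the exact gap from the two journal lines above. The year 2025 is an assumption, because the journal timestamps omit it; it does not affect the result. Whether the gap matches the DHCP renewal timer T1 (by default half the lease time per RFC 2131, so 12 hours for a 24-hour lease) cannot be confirmed here, since the lease duration does not appear in the log.

```python
from datetime import datetime, timedelta

# Two journal lines copied verbatim from the log above.
ACQUIRED = ("Apr 28 06:26:45 kks-pool-4-kxxu4.cloud systemd-networkd[14191]: "
            "ens192: DHCPv4 address 10.180.0.182/24, gateway 10.180.0.1 acquired from 10.180.0.2")
FAILED = ("Apr 28 18:27:45 kks-pool-4-kxxu4.cloud systemd-networkd[14191]: "
          "ens192: Could not set DHCPv4 address: Connection timed out")


def journal_time(line: str, year: int = 2025) -> datetime:
    """Parse the 'Mon DD HH:MM:SS' prefix of a journal line.

    The journal omits the year, so an assumed year is prepended
    (this also avoids ambiguous year-less date parsing).
    """
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


gap = journal_time(FAILED) - journal_time(ACQUIRED)
print(gap)  # 12:01:00
```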
cat /usr/lib/systemd/network/yy-vmware.network
[Match]
Virtualization=vmware
Type=!loopback dummy bridge tunnel vxlan wireguard
Driver=veth

[Network]
DHCP=yes
KeepConfiguration=dhcp-on-stop
IPv6AcceptRA=true

[DHCP]
UseMTU=true
UseDomains=true
RequestBroadcast=true

Impact

The Kubernetes node eventually becomes unavailable.

Environment and steps to reproduce

Unknown; this happens unexpectedly.

Expected behavior

Even if the network interface loses its connection, it must recover automatically. I found the RequiredForOnline= networkd unit option (https://www.freedesktop.org/software/systemd/man/latest/systemd.network.html#RequiredForOnline=), and I guess RequiredForOnline=always-up might do the trick, but I'm not sure.

Any suggestions are appreciated.
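Because networkctl reconfigure ens192 is the only command that brought the interface back, one possible stopgap (a workaround, not a fix for the root cause) is a watchdog that polls networkd and reconfigures the link whenever its setup state is failed. The Python sketch below assumes the column layout of `networkctl list --no-legend` (IDX LINK TYPE OPERATIONAL SETUP) and that it runs as root; the function names and the 30-second interval are hypothetical.

```python
import subprocess
import time


def failed_links(listing: str) -> list[str]:
    """Return link names whose SETUP column is 'failed'.

    Assumes `networkctl list --no-legend` output with the columns
    IDX LINK TYPE OPERATIONAL SETUP.
    """
    links = []
    for row in listing.splitlines():
        fields = row.split()
        if len(fields) >= 5 and fields[4] == "failed":
            links.append(fields[1])
    return links


def check_once(interface: str) -> bool:
    """Reconfigure `interface` if networkd reports it as failed.

    Returns True if a reconfigure was issued.
    """
    listing = subprocess.run(
        ["networkctl", "list", "--no-legend", "--no-pager"],
        capture_output=True, text=True, check=True,
    ).stdout
    if interface in failed_links(listing):
        subprocess.run(["networkctl", "reconfigure", interface], check=True)
        return True
    return False


def watch(interface: str, poll_seconds: float = 30.0) -> None:
    """Hypothetical watchdog loop: poll forever, recovering the link when it fails."""
    while True:
        check_once(interface)
        time.sleep(poll_seconds)
```

For example, `watch("ens192")` could be started from a small systemd service on the affected nodes. A systemd timer that calls `check_once` periodically would avoid a long-running process and is an equally reasonable design.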

Additional information

The dmesg output shows nothing suspicious about the ens192 NIC.

Flatcar 4152.2.2

cc @jknipper

kayrus, Apr 30 '25 15:04

The same issue occurred again on another node.

It looks like we're not alone with this issue. See also:

  • https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/2054977
  • https://github.com/systemd/systemd/issues/33934
  • https://github.com/systemd/systemd/issues/32045
  • https://gist.github.com/raggi/1f8d0b9f45c5b62e7131b03e6e2ffe68

kayrus, May 05 '25 12:05

Hello @kayrus, did you try upgrading Flatcar to the latest Stable release (4230.2.0)? It ships a systemd upgrade, and I'd be interested to see whether you still have the issue.

tormath1, Jun 27 '25 14:06

This is a rare case and it's hard to reproduce. See a more detailed discussion in https://github.com/systemd/systemd/issues/32045

kayrus, Jun 27 '25 14:06