ntpd fails to start, leaving the node's clock unsynchronized
Description
The latest Flatcar stable release, Flatcar-stable-4459.2.0-hvm, includes ntpd 4.2.8p18, which has a bug that can cause ntpd to fail to start if it cannot create a socket on a link-local address. Because ntpd exits almost immediately with SIGSEGV, it quickly hits the default systemd restart rate limit, after which systemd stops restarting it and the node is left with an unsynchronized clock. If ntpd is restarted manually later, once all of the node's network interfaces are ready, it starts correctly.
Impact
ntpd is down and the node's clock is unsynchronized.
Environment and steps to reproduce
- Set-up: Running the Flatcar-stable-4459.2.0-hvm Flatcar release on an AWS EC2 instance
- Task: -
- Action(s): -
- Error:
Nov 05 09:39:14 localhost ntpd[1902]: bind(21) AF_INET6 fe80::c5a:e0ff:fec1:350b%2#123 flags 0x11 failed: Cannot assign requested address
Nov 05 09:39:14 localhost ntpd[1902]: unable to create socket on eth0 (5) for fe80::c5a:e0ff:fec1:350b%2#123
Nov 05 09:39:14 localhost ntpd[1902]: failed to init interface for address fe80::c5a:e0ff:fec1:350b%2
Expected behavior
ntpd should restart with a longer interval, to give the system time to initialize its network interfaces and addresses.
Additional information
The following systemd override file for the ntpd service mitigates the issue: it disables the default systemd restart rate limit and increases the restart interval to 30 seconds:
[Service]
RestartSec=30
[Unit]
StartLimitIntervalSec=0
StartLimitBurst=0
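One way to apply the override above is as a systemd drop-in. The following is a minimal sketch, not a confirmed Flatcar procedure: the unit name `ntpd.service`, the drop-in directory, the file name `10-ntpd-restart.conf`, and the helper `write_ntpd_override` are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: write the override from this issue as a systemd drop-in file.
# Assumptions (not from the issue): the unit is named ntpd.service and the
# drop-in file name 10-ntpd-restart.conf is an arbitrary, hypothetical choice.

write_ntpd_override() {
    # Optional first argument: target directory (defaults to the usual
    # drop-in location for a unit called ntpd.service).
    dropin_dir="${1:-/etc/systemd/system/ntpd.service.d}"
    mkdir -p "$dropin_dir" || return 1

    # Same settings as in the issue: wait 30 s between restarts and
    # turn off the start rate limit.
    cat > "$dropin_dir/10-ntpd-restart.conf" <<'EOF'
[Service]
RestartSec=30

[Unit]
StartLimitIntervalSec=0
StartLimitBurst=0
EOF
}
```

To use it, run `write_ntpd_override` as root, then `systemctl daemon-reload` followed by `systemctl restart ntpd.service` so that systemd picks up the new settings.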
Another workaround while this is being fixed is to mask the ntpd service, which automatically makes systemd-timesyncd the default again (as it was in Flatcar a year ago).
We have been testing it for some days and it works.
But if Flatcar is now leaning toward ntpd, it would be better to get this fixed.
My 2 cents
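The masking workaround described above could look roughly like the sketch below. It assumes the units are named `ntpd.service` and `systemd-timesyncd.service`. The helper `switch_to_timesyncd` and the `SYSTEMCTL` variable are hypothetical, added only so the commands can be dry-run against a stub.

```shell
#!/bin/sh
# Sketch of the masking workaround: stop and mask ntpd, then report
# whether systemd-timesyncd is running.
# Assumptions (not from the thread): unit names ntpd.service and
# systemd-timesyncd.service; SYSTEMCTL is a hypothetical override hook.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

switch_to_timesyncd() {
    # Stop ntpd immediately and prevent it from being started again.
    "$SYSTEMCTL" mask --now ntpd.service || return 1

    # Print the state of timesyncd ("active" when it is running).
    "$SYSTEMCTL" is-active systemd-timesyncd.service
}
```

Run `switch_to_timesyncd` as root. If it reports a state other than `active`, timesyncd may need to be started manually or the node rebooted; the thread does not confirm which. `systemctl unmask ntpd.service` reverts the change.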
Thanks @ealogar @olexanderscherbakov-rf for the issue. FWIW, it helped us to notice that 'ntp' support is currently being discussed for removal on the Gentoo side: https://github.com/flatcar/Flatcar/issues/1958
I'm curious, @olexanderscherbakov-rf: by default ntpd is not running and it's timesyncd that takes care of the Network Time Protocol. Is there any reason not to use it?
@tormath1 We use Flatcar images in AWS, where ntpd has been enabled by default since 3975.2.0.