[RFE] Support chrony or NTPD as the default instead of SNTP for AWS AMIs
Current situation
The Flatcar AMIs released for AWS use SNTP for time synchronization by default, instead of chrony or NTP, which can resolve time to within a few ms.
We checked two instances and noticed an offset of up to about 250 ms between them. We did not find any usable SNTP config, at least based on the OS config. The problem we noticed is that, with SNTP, the interface in the path resolves time only to seconds, not to ms.
Flatcar uses systemd-timesyncd, and I am unable to find any flag or config that reduces the offset between nodes to ms accuracy. We could not find any way to set the time precision with SNTP, but in any case the OS should resolve time to ms accuracy by default.
$ timedatectl show-timesync --all
LinkNTPServers=
SystemNTPServers=
RuntimeNTPServers=
FallbackNTPServers=0.flatcar.pool.ntp.org 1.flatcar.pool.ntp.org 2.flatcar.pool.ntp.org 3.flatcar.pool.ntp.org
ServerName=0.flatcar.pool.ntp.org
ServerAddress=167.172.70.21
RootDistanceMaxUSec=5s
PollIntervalMinUSec=32s
PollIntervalMaxUSec=34min 8s
PollIntervalUSec=4min 16s
NTPMessage={ Leap=0, Version=4, Mode=4, Stratum=2, Precision=-23, RootDelay=1.296ms, RootDispersion=47.546ms, Reference=6D31CFAE, OriginateTimestamp=Thu 2024-02-01 11:23:41 UTC, ReceiveTimestamp=Thu 2024-02-01 11:23:41 UTC, TransmitTimestamp=Thu 2024-02-01 11:23:41 UTC, DestinationTimestamp=Thu 2024-02-01 11:23:41 UTC, Ignored=no, PacketCount=3, Jitter=20.034ms }
Frequency=-12022283
Impact
The time offset between machines varies, up to about 250 ms.
Ideal future situation
Support chrony or enable NTPD by default in the AWS AMI to resolve the accuracy issue.
Additional information
Additional GitHub issues reported and references
hi @shankar-vng - this seems weird.
how did you determine that the instance clocks are off by 250ms?
have you checked if the situation is better when using ntpd? if so, please consider opening an issue with https://github.com/systemd/systemd because that may be an upstream issue.
can you paste timedatectl timesync-status from both instances?
We had a similar topic with Azure where we documented how to use chrony through docker: https://www.flatcar.org/docs/latest/installing/cloud/azure/#use-the-azure-hyper-v-host-for-time-synchronisation-instead-of-ntp
@jepio Thanks for your response. Replies inline:
- how did you determine that the instance clocks are off by 250ms?
The logs of our containers running on different machines had timestamp differences of up to about 200 ms (not always 200 ms). The offset varies based on resolution and DNS. Here is the requested status.
Machine 1
$ timedatectl timesync-status
Server: 167.71.195.165 (0.flatcar.pool.ntp.org)
Poll interval: 34min 8s (min: 32s; max 34min 8s)
Leap: normal
Version: 4
Stratum: 3
Reference: 907EF2B0
Precision: 1us (-24)
Root distance: 26.923ms (max: 5s)
Offset: -732us
Delay: 1.468ms
Jitter: 1.238ms
Packet count: 243
Frequency: +6.202ppm
______________________________________
Machine 2
$ timedatectl timesync-status
Server: 47.241.41.246 (0.flatcar.pool.ntp.org)
Poll interval: 34min 8s (min: 32s; max 34min 8s)
Leap: normal
Version: 4
Stratum: 2
Reference: 64643D58
Precision: 1us (-24)
Root distance: 63.178ms (max: 5s)
Offset: +1.901ms
Delay: 2.661ms
Jitter: 1.852ms
Packet count: 243
Frequency: +7.312ppm
_______________________________________
Machine 3
Server: 172.104.44.120 (0.flatcar.pool.ntp.org)
Poll interval: 34min 8s (min: 32s; max 34min 8s)
Leap: normal
Version: 4
Stratum: 2
Reference: 768F1153
Precision: 1us (-25)
Root distance: 38.428ms (max: 5s)
Offset: +528us
Delay: 1.391ms
Jitter: 609us
Packet count: 243
Frequency: +7.618ppm
_______________________________________
Machine 4
$ timedatectl timesync-status
Server: 106.10.186.200 (0.flatcar.pool.ntp.org)
Poll interval: 34min 8s (min: 32s; max 34min 8s)
Leap: normal
Version: 4
Stratum: 2
Reference: 6A0A9885
Precision: 1us (-25)
Root distance: 221us (max: 5s)
Offset: -627us
Delay: 1.884ms
Jitter: 1.540ms
Packet count: 243
Frequency: +20.541ppm
I understand that this is a systemd issue, but when it comes to AMIs for the cloud, it would be wise to adopt some of the defaults the cloud providers use in their own AMIs.
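As a rough cross-check, the offsets in the four status outputs above can be compared directly. The sketch below is illustrative only: the helper names `offset_to_ms` and `max_spread_ms` are hypothetical, and it assumes each offset is a single number with a `us`, `ms`, or `s` suffix, as in the output above. Each machine reports its offset against a different pool server, so the result only approximates the skew between machines.

```python
import re

# Offsets copied from the four `timedatectl timesync-status` outputs above.
REPORTED_OFFSETS = {
    "Machine 1": "-732us",
    "Machine 2": "+1.901ms",
    "Machine 3": "+528us",
    "Machine 4": "-627us",
}

# Multipliers that convert each supported unit to milliseconds.
UNIT_TO_MS = {"us": 1e-3, "ms": 1.0, "s": 1e3}


def offset_to_ms(text):
    """Convert a single-unit offset string such as '-732us' to milliseconds."""
    match = re.fullmatch(r"([+-]?\d+(?:\.\d+)?)(us|ms|s)", text.strip())
    if match is None:
        raise ValueError(f"unsupported offset format: {text!r}")
    value, unit = match.groups()
    return float(value) * UNIT_TO_MS[unit]


def max_spread_ms(offsets):
    """Return the largest difference between any two reported offsets, in ms."""
    values = [offset_to_ms(v) for v in offsets.values()]
    return max(values) - min(values)


if __name__ == "__main__":
    print(f"max spread: {max_spread_ms(REPORTED_OFFSETS):.3f} ms")
```

For the values pasted above, the spread between the most negative and most positive offset is about 2.6 ms, far below the roughly 200 ms seen in the container logs.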
I see the issue now: systemd-timesyncd only syncs with a single NTP server, and it implements SNTP, not NTP. From man systemd-timesyncd:
The systemd-timesyncd service implements SNTP only. This
minimalistic service will step the system clock for large offsets
or slowly adjust it for smaller deltas. Complex use cases that
require full NTP support (and where SNTP is not sufficient) are
not covered by systemd-timesyncd.
@pothos how about we rethink the default configuration? We might even want to add chrony to the Azure OEM for PTP and, on AWS, switch to syncing with the local NTP/PTP source https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html#ptp-hardware-clock-requirements.
@jepio please let me know if I can help in any way with pushing the changes to the AWS AMIs before the end of Q1. Kindly point me to the relevant documentation 🙏
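For context, an SNTP client derives its clock offset from a single request/response exchange with one server. The sketch below uses the standard offset and delay formulas from the SNTP/NTP specifications; it is not taken from the systemd-timesyncd source, and the function name and sample values are hypothetical. The four inputs correspond to the OriginateTimestamp, ReceiveTimestamp, TransmitTimestamp, and DestinationTimestamp fields shown in the NTPMessage output earlier in this issue.

```python
def sntp_offset_and_delay(t1, t2, t3, t4):
    """Compute clock offset and round-trip delay from one SNTP exchange.

    t1: client time when the request was sent (originate)
    t2: server time when the request arrived (receive)
    t3: server time when the reply was sent (transmit)
    t4: client time when the reply arrived (destination)
    All four values must use the same unit.
    """
    # Positive offset means the client clock is behind the server clock.
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    # Time spent on the network, excluding server processing time.
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


if __name__ == "__main__":
    # Hypothetical exchange in milliseconds: the client is 5 ms behind,
    # each network leg takes 1 ms, and the server needs 0.5 ms to reply.
    offset, delay = sntp_offset_and_delay(0.0, 6.0, 6.5, 2.5)
    print(f"offset={offset} ms, delay={delay} ms")
```

The formula assumes both network legs take equal time, so any path asymmetry shows up directly as offset error. With a single server and a single sample per poll there is nothing to compare the result against, whereas a full NTP implementation such as chrony or ntpd can compare several sources and samples.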
It's a matter of figuring out how to implement the change in the AWS OEM sysext without disrupting other platforms. To start, you would need to build your own images for testing: https://www.flatcar.org/docs/latest/reference/developer-guides/sdk-modifying-flatcar/.
I can't promise that anyone will have time to look at this in Q1, we're all busy with other issues.
We merged https://github.com/flatcar/scripts/pull/1792, which implements this change for GCP/AWS/Azure. This will be released in the alpha channel in April.
@shankar-vng this reached stable just now (3975.2.0).