systemd-timesyncd not as precise as ntpd

Open croemmich opened this issue 9 years ago • 13 comments

I run Deis on CoreOS and recently made the switch to 681, which swapped ntpd for systemd-timesyncd as the default time sync daemon. Deis uses Ceph to create an HA filesystem within the cluster. Ceph expects the clocks within the cluster to be very close; see: http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/#clock-skews.

Before the switch, Ceph never reported any issues, but after the switch 2/3 of my Ceph monitors were reporting clock skew issues. I verified that systemd-timesyncd was indeed running, but I couldn't find any indication of when/how it was syncing.

Is there a difference in the way systemd-timesyncd works? Does it sync less frequently, is it just less accurate, or is there something I need to configure to get the nodes more in sync?
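
To make the clock-skew expectation concrete, here is a minimal Python sketch of the kind of check involved. It assumes Ceph's documented default tolerance of 0.05 s (`mon clock drift allowed`; verify this against your Ceph version). The monitor names and offsets are made up for illustration, and `skewed_monitors` is a hypothetical helper, not part of Ceph.

```python
from typing import Dict, List


def skewed_monitors(offsets_from_leader: Dict[str, float],
                    allowed: float = 0.05) -> List[str]:
    """Return the monitors whose clock offset (in seconds, relative to the
    lead monitor) exceeds the allowed drift, sorted by name.

    The 0.05 s default mirrors Ceph's `mon clock drift allowed` setting
    (assumption: check the value configured in your cluster).
    """
    return sorted(
        name
        for name, offset in offsets_from_leader.items()
        if abs(offset) > allowed
    )


# Hypothetical offsets: two of three monitors are outside the tolerance.
offsets = {"mon-a": 0.0, "mon-b": 0.12, "mon-c": -0.08}
print(skewed_monitors(offsets))
```

With a tolerance this tight, even a drift of around a tenth of a second between nodes is enough to trigger the warning.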

croemmich avatar Jun 18 '15 19:06 croemmich

Well, unlike ntpd, timesyncd only uses one upstream server instead of ntpd's 4. You can try configuring timesyncd to use a local NTP server if you aren't already, or, if you are using a remote server, pick a single one to use for all your nodes. Maybe the issue is simply that the nodes are syncing against different reference times. Beyond that I'm not sure. We do still ship ntpd, so you can switch back too.

https://coreos.com/docs/cluster-management/setup/configuring-date-and-timezone/
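
As a rough sketch of what "pick a single server for all nodes" could look like, the Python snippet below renders a `timesyncd.conf` fragment using the `[Time]` section with its `NTP=` and `FallbackNTP=` keys. The helper name `render_timesyncd_conf` and the host `ntp.example.com` are placeholders for illustration; the usual location for this file is `/etc/systemd/timesyncd.conf`, but check the documentation linked above for how CoreOS expects it to be provisioned.

```python
from typing import Sequence


def render_timesyncd_conf(servers: Sequence[str],
                          fallback: Sequence[str] = ()) -> str:
    """Render a minimal timesyncd.conf with the given NTP servers.

    NTP= and FallbackNTP= take space-separated host names or addresses.
    """
    if not servers:
        raise ValueError("at least one NTP server is required")
    lines = ["[Time]", "NTP=" + " ".join(servers)]
    if fallback:
        lines.append("FallbackNTP=" + " ".join(fallback))
    return "\n".join(lines) + "\n"


# Placeholder host name; use the same value on every node in the cluster.
print(render_timesyncd_conf(["ntp.example.com"]), end="")
```

After writing the file, systemd-timesyncd has to be restarted for the change to take effect.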

marineam avatar Jun 18 '15 20:06 marineam

Thanks @marineam, I'll give your suggestions a try. I have already switched back to ntpd which solved the issues, but if timesyncd is the future of CoreOS, I'd like to get it working. However, it would be ideal if it functioned the same without additional user configuration.

croemmich avatar Jun 18 '15 20:06 croemmich

Thanks. One advantage of timesyncd is that it is integrated with networkd, so if DHCP provides a local NTP server it will be used. The issue may simply be a bug in timesyncd; it is relatively new code. If it turns out the issue is due to only using a single remote timeserver out of the default pool, we may need to revisit the choice of timesyncd.

marineam avatar Jun 18 '15 21:06 marineam

@marineam I'd prefer to not have to deploy a local ntp server. I tried specifying a single remote server for all of the nodes and they still fell out of sync. Regardless, neither option sits well with me, as my current deployment is completely HA and using a single ntp server breaks that.

croemmich avatar Jun 29 '15 20:06 croemmich

Ok, I would recommend switching back to ntpd for now then. I'll dig more to see if timesyncd can be improved or if we need to change the default again. I promise ntpd will remain in the image. :)

https://coreos.com/docs/cluster-management/setup/configuring-date-and-timezone/

marineam avatar Jun 29 '15 20:06 marineam

Cool, thanks!

croemmich avatar Jun 29 '15 21:06 croemmich

(You can check whether systemd-timesyncd has synchronized by looking at the NTP synchronized: line of the timedatectl output. This assumes you are actually using it and not some other time sync server that implements the timedate D-Bus interface, such as chrony.)
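
If you want to script that check across nodes, something like the following Python sketch might do. It looks for the `NTP synchronized:` line mentioned above and also accepts `System clock synchronized:`, which newer systemd releases print instead. The function names are hypothetical and the sample text is an illustrative stand-in, not captured output.

```python
import subprocess
from typing import Optional


def ntp_synchronized(timedatectl_output: str) -> Optional[bool]:
    """Return True/False for the sync status line, or None if it is absent."""
    labels = ("NTP synchronized", "System clock synchronized")
    for line in timedatectl_output.splitlines():
        # Split on the first colon only, so time values such as
        # "19:08:00" on other lines are left alone.
        key, sep, value = line.partition(":")
        if sep and key.strip() in labels:
            return value.strip().lower() == "yes"
    return None


def read_timedatectl() -> str:
    """Run timedatectl on the local machine and return its text output."""
    result = subprocess.run(["timedatectl"], capture_output=True,
                            text=True, check=True)
    return result.stdout


# Illustrative sample only (not real output from any node).
SAMPLE = """\
      Local time: Sun 2015-08-02 19:08:00 UTC
     NTP enabled: yes
NTP synchronized: no
"""
print(ntp_synchronized(SAMPLE))
```

On a real node you would call `ntp_synchronized(read_timedatectl())`; a result of `None` means neither status line was found.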

sitsofe avatar Aug 02 '15 19:08 sitsofe

@croemmich have you seen any improvement in the later versions of CoreOS?

crawford avatar Apr 21 '16 21:04 crawford

Closing due to inactivity.

crawford avatar Sep 20 '16 17:09 crawford

Closing due to inactivity.

If more activity is created, would this be re-opened?

ramayer avatar Sep 18 '17 17:09 ramayer

@ramayer If this is still causing problems, we can reopen. What behavior are you seeing?

bgilbert avatar Sep 18 '17 17:09 bgilbert

@bgilbert I have been running ceph on CoreOS stable for over a year now and this problem has never gone away. I just now found this issue in a web search. I can provide whatever debugging would be helpful.

nealey avatar Feb 28 '19 16:02 nealey

As @croemmich described, systemd-timesyncd does not seem to set clocks correctly.

Like @croemmich I also switched back to ntpd (which made the problem go away) when I found this page with his workaround.

ramayer avatar Mar 01 '19 06:03 ramayer