bugs
bugs copied to clipboard
systemd-timesyncd executes timesyncs when creating and destroying a Docker container
Issue Report
Bug
CoreOS Version
NAME=CoreOS
ID=coreos
VERSION=1192.2.0
VERSION_ID=1192.2.0
BUILD_ID=2016-10-21-0026
PRETTY_NAME="CoreOS 1192.2.0 (MoreOS)"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"
Environment
AWS EC2, m4.large
Expected Behavior
Don't run systemd-timesync all the time ;)
Actual Behavior
systemd-timesyncs runs +- every 2 seconds, we get about 40k of the following per day on 1 of our etcd masters
December 13th 2016, 15:44:41.379 systemd-timesyncd Synchronized to time server 52.17.30.119:123 (0.coreos.pool.ntp.org). prd-1-master-10.1.21.150
December 13th 2016, 15:44:41.379 systemd-timesyncd Synchronized to time server 52.17.30.119:123 (0.coreos.pool.ntp.org). prd-1-master-10.1.21.150
December 13th 2016, 15:44:41.379 systemd-timesyncd Network configuration changed, trying to establish connection. prd-1-master-10.1.21.150
December 13th 2016, 15:44:41.378 systemd-timesyncd Network configuration changed, trying to establish connection. prd-1-master-10.1.21.150
Reproduction Steps
Run CoreOS 1192.2.0 configured for central services role and look at the logs. It actually only happens on 1 of our 3 central servers. The other have about 4k of the above lines per day. Still too much imho, but not such a big issues as the 40k lines on the other central server.
systemd-timesyncd runs this codepath every time the network links change (it does an inotify on /run/systemd/netif/links/). This seems to happen once a day on my machines. Your network interfaces probably shouldn't be changing that often. Can you provide more details about your environment? My first guess is that DHCP is renewing the lease often.
If your goal is just to reduce the logging, you can adjust the logging level for that service. Those messages are logged at info, so if you raise it to notice, they'll go away.
@crawford thanks for the pointer (especially that systemd-timesyncd runs when the network links change), we'll invest time tomorrow to find the cause of this, though I expect it to be related to containers spawning and dying. Will report back once we know.
We're also considering just switching back to ntpd, would there be a reason not to do so?
I expect it to be related to containers spawning and dying
I just tested that theory and you are correct. I see three instances when I create an interactive container and two more when it's destroyed. Ouch.
I was able to suppress the logs by creating the following drop-in at /etc/systemd/system/systemd-timesyncd.service.d/logging.conf:
[Service]
Environment=SYSTEMD_LOG_LEVEL=notice
@crawford we checked this as well and it's indeed tied to the lifecycle of containers and their veth* devices.
Below a docker run followed by a docker rm -f with systemd-timesyncd log lines in between:
We get 3 timesyncd runs for both the create and destroy event.
Dec 16 10:52:09 core-01 dockerd[1254]: time="2016-12-16T10:52:09.941840764+01:00" level=info msg="POST /v1.21/containers/create"
Dec 16 10:52:09 core-01 dockerd[1254]: time="2016-12-16T10:52:09.972691946+01:00" level=info msg="POST /v1.21/containers/f65a3d083b99ec342c432448546d0524ff56d674937d4c3fd95754b024101a62/start"
Dec 16 10:52:09 core-01 systemd-timesyncd[585]: Network configuration changed, trying to establish connection.
Dec 16 10:52:09 core-01 systemd-timesyncd[585]: Synchronized to time server 109.236.83.221:123 (0.coreos.pool.ntp.org).
Dec 16 10:52:09 core-01 systemd-timesyncd[585]: Network configuration changed, trying to establish connection.
Dec 16 10:52:10 core-01 systemd-timesyncd[585]: Synchronized to time server 109.236.83.221:123 (0.coreos.pool.ntp.org).
Dec 16 10:52:10 core-01 systemd-timesyncd[585]: Network configuration changed, trying to establish connection.
Dec 16 10:52:10 core-01 systemd-timesyncd[585]: Synchronized to time server 109.236.83.221:123 (0.coreos.pool.ntp.org).
Dec 16 10:52:35 core-01 dockerd[1254]: time="2016-12-16T10:52:35.350604263+01:00" level=info msg="DELETE /v1.21/containers/f65a3d083b99?force=1"
Dec 16 10:52:35 core-01 systemd-timesyncd[585]: Network configuration changed, trying to establish connection.
Dec 16 10:52:35 core-01 systemd-timesyncd[585]: Synchronized to time server 109.236.83.221:123 (0.coreos.pool.ntp.org).
Dec 16 10:52:35 core-01 systemd-timesyncd[585]: Network configuration changed, trying to establish connection.
Dec 16 10:52:35 core-01 systemd-timesyncd[585]: Synchronized to time server 109.236.83.221:123 (0.coreos.pool.ntp.org).
Dec 16 10:52:36 core-01 systemd-timesyncd[585]: Network configuration changed, trying to establish connection.
Dec 16 10:52:36 core-01 systemd-timesyncd[585]: Synchronized to time server 109.236.83.221:123 (0.coreos.pool.ntp.org).
Also, while changing the log level gets rid of the log lines the actual NTP syncing still happens, which obviously shouldn't be the case at all, it's unnecessary I/O and you're effectively hammering the NTP servers. See https://bugs.freedesktop.org/show_bug.cgi?id=87505#c7 for a similar issue and a comment from @poettering about this.
I've been trying to find a way disable the timesyncd trigger when a network config changes, but no luck. It seems it's not possible. @crawford I think this needs a fix in CoreOS, but since it's a systemd issue/missing feature it might warrant an upstream systemd bug?
For now we'll switch back to ntpd, maybe CoreOS should do the same?
I'm experiencing this exact issue as well. I only have ~10 containers on my node, which all restarted this morning due to unrelated problem:
$ journalctl --since=2017-01-04 | grep timesync | wc -l
480
CoreOS beta (1235.2.0)
veth* is not managed by networkd, but networkd still creates a status file in /run/systemd/netif/links/.
Facing the same issue here. Going to move back to NTPD as well.
hmm, if this is still relevant, please file a bug on systemd github, explaining the situation briefly. THis looks like something we should fix in timesyncd if this is still an issue. That said, https://bugs.freedesktop.org/show_bug.cgi?id=87505 has long been closed under the assumption this was fixed comprehensively.
@poettering Yes this is still valid.
[root@abf bin]# journalctl --since=2018-01-04 | grep timesync | wc -l 9840
@tpgxyz Thanks for reporting upstream!
https://github.com/systemd/systemd/pull/8611 should fix the issue for systemd 239.
Is it possible to backport patch to systemd 238 in stable?