Clock drift in the created VM

Open dirkdev98 opened this issue 3 years ago • 13 comments

Sometimes the time in the VM lags behind real time. This has happened to me a few times already, but I have not been able to find a cause or a way to reproduce it. It seems to happen after I lock my laptop and come back the next morning, but only once in a while (every 10 days or so), not consistently every day.

Running a MacBook Pro M1 with the following Colima version and status:

Colima version:

colima version 0.3.4
git commit: 5a4a70481ca8d1e794677f22524e3c1b79a9b4ae

runtime: docker
arch: aarch64
client: v20.10.14
server: v20.10.11

Colima status:

INFO[0000] colima is running                            
INFO[0000] runtime: docker                              
INFO[0000] arch: aarch64 

docker run -it --rm --cap-add SYS_TIME --privileged -e ALLOW_CIDR=0.0.0.0/0 -p 123:123/udp geoffh1977/chrony

Writing New Config File

2022-05-07T08:00:43Z chronyd version 4.1 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP -SCFILTER +SIGND +ASYNCDNS +NTS +SECHASH +IPV6 -DEBUG)
2022-05-07T08:00:43Z Could not read valid frequency and skew from driftfile /var/lib/chrony/chrony.drift
2022-05-07T08:00:43Z Initial frequency -100000.000 ppm
2022-05-07T08:00:47Z System's initial offset : 978.140307 seconds slow of true (step)
2022-05-07T08:00:52Z Backward time jump detected!
2022-05-07T08:02:00Z Selected source 162.159.200.1 (pool.ntp.org)
2022-05-07T08:02:00Z System clock wrong by 978.685231 seconds
2022-05-07T08:18:18Z System clock was stepped by 978.685231 seconds
2022-05-07T08:02:57Z Backward time jump detected!
2022-05-07T08:02:57Z Can't synchronise: no selectable sources
2022-05-07T08:05:07Z Selected source 94.198.159.15 (pool.ntp.org)
2022-05-07T08:05:07Z System clock wrong by 977.327132 seconds
2022-05-07T08:21:24Z System clock was stepped by 977.327132 seconds
2022-05-07T08:05:06Z Backward time jump detected!
2022-05-07T08:05:06Z Can't synchronise: no selectable sources
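The offsets reported in a log like the one above can be pulled out programmatically, for example to track how large the drift gets over time. The following is a minimal sketch, assuming the "System clock wrong by N seconds" message format shown in this log; the function name `clockOffsets` is hypothetical and not part of chrony or Colima.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// wrongBy matches the "System clock wrong by N seconds" message seen in the
// chronyd output above. The exact wording is assumed from that log.
var wrongBy = regexp.MustCompile(`System clock wrong by (-?[0-9]+(?:\.[0-9]+)?) seconds`)

// clockOffsets returns every offset (in seconds) reported in the log text.
func clockOffsets(log string) ([]float64, error) {
	var offsets []float64
	for _, line := range strings.Split(log, "\n") {
		m := wrongBy.FindStringSubmatch(line)
		if m == nil {
			continue // not an offset report
		}
		v, err := strconv.ParseFloat(m[1], 64)
		if err != nil {
			return nil, fmt.Errorf("parse offset %q: %w", m[1], err)
		}
		offsets = append(offsets, v)
	}
	return offsets, nil
}

func main() {
	// Three lines copied from the log in this issue.
	log := `2022-05-07T08:02:00Z Selected source 162.159.200.1 (pool.ntp.org)
2022-05-07T08:02:00Z System clock wrong by 978.685231 seconds
2022-05-07T08:05:07Z System clock wrong by 977.327132 seconds`

	offsets, err := clockOffsets(log)
	if err != nil {
		panic(err)
	}
	fmt.Println(offsets)
}
```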

dirkdev98 avatar May 07 '22 08:05 dirkdev98

I observe similar issues with the system time getting out of sync regularly. I assume clock drift could be a cause of the occasional networking issues as well?

Hint: You can enable the NTP service chrony or openntpd directly in Alpine lima VMs as follows:

colima ssh -- sudo /sbin/setup-ntp -c chrony
colima ssh -- sudo /sbin/setup-ntp -c openntpd

Running this after every colima delete / colima start seems to improve overall stability of the VM.

nifr avatar May 09 '22 09:05 nifr

@nifr thanks for the suggestion, this can be included in Colima directly.

abiosoft avatar May 09 '22 09:05 abiosoft

Thanks @nifr, this seems to work quite well.

@abiosoft I was looking through the codebase and am not exactly sure where to add it. It looks like the setup is done in https://github.com/abiosoft/colima/blob/c42a6735898fd68747682343861eb74ea683b643/environment/vm/lima/lima.go#L232-L235. Is it okay to add something like this before calling a.Exec()?

a.Add(func() error {
    return l.host.Run(...)
})

dirkdev98 avatar May 09 '22 13:05 dirkdev98

Just a note that clock drift, especially when laptop slept, was a classic problem in docker/docker desktop for years and years. I'm sure you know that. I think the classic issue was https://github.com/docker/for-mac/issues/17

rfay avatar May 09 '22 13:05 rfay

@dirkdev98 yeah, you can add it on line 230 and it should be l.Run().

l.host.Run() will execute the command on the host machine while l.Run() will execute it in the Lima VM.

// time sync
a.Add(func() error {
    return l.Run("sudo", "/sbin/setup-ntp", "-c", "chrony")
})
a.Add(func() error {
    return l.Run("sudo", "/sbin/setup-ntp", "-c", "openntpd")
})

abiosoft avatar May 09 '22 14:05 abiosoft

This was supposed to be fixed in Lima with https://github.com/lima-vm/lima/pull/490.

Could you please file an issue against Lima, including the Lima version? While the Lima-internal "fix" has a coarser resolution, it has the advantage that it should work even when not connected to the internet.

jandubois avatar May 09 '22 16:05 jandubois

Created lima-vm/lima#850.

@abiosoft would you still accept a PR to run setup-ntp anyways?

dirkdev98 avatar May 09 '22 17:05 dirkdev98

@dirkdev98 I would like to wait a bit for the responses on the issue.

As a stopgap you can actually use Lima overrides to run those commands by creating a ~/.lima/_config/override.yaml with the following contents.

provision:
  - mode: system
    script: /sbin/setup-ntp -c chrony
  - mode: system
    script: /sbin/setup-ntp -c openntpd

abiosoft avatar May 09 '22 17:05 abiosoft

Much appreciated!

dirkdev98 avatar May 10 '22 07:05 dirkdev98

@abiosoft Any news regarding this? I recently ran into a similar issue with vmtype vz.

The drift is smaller than the one mentioned in this issue, but still 0.1 to 1 seconds, which can be a problem for time-sensitive security operations, e.g. checking a JWT's "not before" and expiration times.

I have tried adding the override file, but I am not sure it helps much. I also tried setup-ntp with busybox etc. There still seems to be a time drift relative to the host system. I think the only real solution is for Lima to sync correctly with the host time. A comment in the Lima issue links to a Docker article that explains the problem and their solution: https://github.com/lima-vm/lima/issues/850#issuecomment-1121980785 (Addressing Time Drift in Docker Desktop for Mac)

AndreasA avatar May 03 '23 14:05 AndreasA

OK, I did some more tests, and using

provision:
  - mode: system
    script: /sbin/setup-ntp -c busybox

seems to work best (currently). A few seconds after start, the drift is in the 0.0x-second range. It does not stay there permanently and can still go up to 0.2 seconds, but that is by far the lowest I have seen, which is way better than before, though still not perfect.

AndreasA avatar May 04 '23 07:05 AndreasA

This problem appears to affect me during a make build in a host volume: make starts reporting "Clock skew detected" warnings, with file modification times up to 0.97 s in the future.

make[5]: warning:  Clock skew detected.  Your build may be incomplete.
make[5]: Warning: File 'CMakeFiles/aom_decoder_app_util.dir/depend.make' has modification time 0.97 s in the future
make[5]: Warning: File 'CMakeFiles/aom_av1_encoder_avx2_intrinsics.dir/depend.make' has modification time 0.94 s in the future

Testing the same build running the container using Docker Desktop is successful with no clock skew warnings.

karlvr avatar May 23 '23 23:05 karlvr

We use testcontainers to spawn a Postgres DB during integration tests of a Java application. We see a positive drift, i.e. we do something like INSERT now() INTO tabA and assume that a now() call made a little later in the JVM would produce a later timestamp. But the timestamp in Postgres is between 100 ms and 500 ms ahead of the one from the JVM.

mfriedenhagen avatar Nov 15 '23 21:11 mfriedenhagen