colima
Clock drift in the created VM
Sometimes the time in the VM lags behind real time. This has happened to me a few times already, but I have not been able to find the cause or even a way to reproduce it. It seems to happen after I lock my laptop and come back the next morning, but only once in a while (every 10 days or so), not consistently every day.
Running a MacBook Pro M1 with the following Colima version and status:
Colima version:
colima version 0.3.4
git commit: 5a4a70481ca8d1e794677f22524e3c1b79a9b4ae
runtime: docker
arch: aarch64
client: v20.10.14
server: v20.10.11
Colima status:
INFO[0000] colima is running
INFO[0000] runtime: docker
INFO[0000] arch: aarch64
docker run -it --rm --cap-add SYS_TIME --privileged -e ALLOW_CIDR=0.0.0.0/0 -p 123:123/udp geoffh1977/chrony
Writing New Config File
2022-05-07T08:00:43Z chronyd version 4.1 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP -SCFILTER +SIGND +ASYNCDNS +NTS +SECHASH +IPV6 -DEBUG)
2022-05-07T08:00:43Z Could not read valid frequency and skew from driftfile /var/lib/chrony/chrony.drift
2022-05-07T08:00:43Z Initial frequency -100000.000 ppm
2022-05-07T08:00:47Z System's initial offset : 978.140307 seconds slow of true (step)
2022-05-07T08:00:52Z Backward time jump detected!
2022-05-07T08:02:00Z Selected source 162.159.200.1 (pool.ntp.org)
2022-05-07T08:02:00Z System clock wrong by 978.685231 seconds
2022-05-07T08:18:18Z System clock was stepped by 978.685231 seconds
2022-05-07T08:02:57Z Backward time jump detected!
2022-05-07T08:02:57Z Can't synchronise: no selectable sources
2022-05-07T08:05:07Z Selected source 94.198.159.15 (pool.ntp.org)
2022-05-07T08:05:07Z System clock wrong by 977.327132 seconds
2022-05-07T08:21:24Z System clock was stepped by 977.327132 seconds
2022-05-07T08:05:06Z Backward time jump detected!
2022-05-07T08:05:06Z Can't synchronise: no selectable sources
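To compare runs, the offsets reported in a chronyd log like the one above can be pulled out mechanically. This is only a minimal sketch: the function name ParseOffsets is made up for illustration, and the regular expression assumes the "System clock wrong by N seconds" wording seen in this log.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// offsetRe matches chronyd lines such as
// "System clock wrong by 978.685231 seconds" (wording taken from the log above).
var offsetRe = regexp.MustCompile(`System clock wrong by (-?[0-9]+(?:\.[0-9]+)?) seconds`)

// ParseOffsets (hypothetical helper) returns every reported offset, in seconds,
// in the order it appears in the log text.
func ParseOffsets(logText string) []float64 {
	var offsets []float64
	for _, m := range offsetRe.FindAllStringSubmatch(logText, -1) {
		v, err := strconv.ParseFloat(m[1], 64)
		if err != nil {
			continue // ignore values that do not parse as a number
		}
		offsets = append(offsets, v)
	}
	return offsets
}

func main() {
	logText := "2022-05-07T08:02:00Z System clock wrong by 978.685231 seconds\n" +
		"2022-05-07T08:05:07Z System clock wrong by 977.327132 seconds\n"
	for _, o := range ParseOffsets(logText) {
		fmt.Printf("offset: %.3f s\n", o)
	}
}
```

Lines that do not match are skipped, so the whole container output can be passed in unchanged.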
I observe similar issues with the system time regularly getting out of sync. I assume clock drift could also be a cause of the occasional networking issues?
Hint: You can enable the NTP service chrony or openntpd directly in Alpine lima VMs as follows:
colima ssh -- sudo /sbin/setup-ntp -c chrony
colima ssh -- sudo /sbin/setup-ntp -c openntpd
Running this after every colima delete / colima start seems to improve overall stability of the VM.
@nifr thanks for the suggestion, this can be included in Colima directly.
Thanks @nifr, seems to work quite well.
@abiosoft Was looking through the codebase, not exactly sure where to add it. It looks like the setup is done in https://github.com/abiosoft/colima/blob/c42a6735898fd68747682343861eb74ea683b643/environment/vm/lima/lima.go#L232-L235. Is it okay to add something like this before calling a.Exec()?
a.Add(func() error {
	return l.host.Run(...)
})
Just a note that clock drift, especially after the laptop has slept, was a classic problem in docker/docker desktop for years and years. I'm sure you know that. I think the classic issue was https://github.com/docker/for-mac/issues/17
@dirkdev98 yeah, you can add it on line 230 and it should be l.Run().
l.host.Run() will execute the command on the host machine while l.Run() will execute it in the Lima VM.
// time sync
a.Add(func() error {
	return l.Run("sudo", "/sbin/setup-ntp", "-c", "chrony")
})
a.Add(func() error {
	return l.Run("sudo", "/sbin/setup-ntp", "-c", "openntpd")
})
This was supposed to be fixed in Lima with https://github.com/lima-vm/lima/pull/490.
Could you please file an issue against Lima, with the Lima version? While the Lima-internal "fix" has a coarser resolution, it has the advantage that it should work even when not connected to the internet.
Created lima-vm/lima#850.
@abiosoft would you still accept a PR to run setup-ntp anyways?
@dirkdev98 I would like to wait a bit for the responses on the issue.
As a stopgap you can actually use Lima overrides to run those commands by creating a ~/.lima/_config/override.yaml with the following contents:
provision:
  - mode: system
    script: /sbin/setup-ntp -c chrony
  - mode: system
    script: /sbin/setup-ntp -c openntpd
Much appreciated!
@abiosoft Any news on this? I recently ran into a similar issue with vmtype vz.
The drift is smaller than what is reported in this issue, but still 0.1 to 1 seconds, which can be a problem for time-sensitive security operations, e.g. checking a JWT's not-before (nbf) or expiration (exp) time.
I have tried adding the override file, but I'm not sure it helps that much. I also tried setup-ntp with busybox etc. There still seems to be a time drift relative to the host system. I think the only real solution is for Lima to sync correctly with the host time. A comment in the Lima issue links a Docker article that explains the problem and their solution: https://github.com/lima-vm/lima/issues/850#issuecomment-1121980785 (Addressing Time Drift in Docker Desktop for Mac)
OK, I did some more tests, and using
provision:
  - mode: system
    script: /sbin/setup-ntp -c busybox
seems to indeed work best (currently). A few seconds after start, the drift is around 0.0x seconds. It does not stay there permanently and can still go up to 0.2 seconds, but that is by far the lowest I have gotten and way better than before, though still not perfect.
This problem appears to affect me during a make build in a host volume: "Clock skew detected" warnings start to appear, with files up to 0.97 s in the future.
make[5]: warning: Clock skew detected. Your build may be incomplete.
make[5]: Warning: File 'CMakeFiles/aom_decoder_app_util.dir/depend.make' has modification time 0.97 s in the future
make[5]: Warning: File 'CMakeFiles/aom_av1_encoder_avx2_intrinsics.dir/depend.make' has modification time 0.94 s in the future
Testing the same build running the container using Docker Desktop is successful with no clock skew warnings.
We use testcontainers to spawn a Postgres DB during integration tests of a Java application. We see a positive drift: we do something like INSERT now() INTO tabA and assume that a now() call in the JVM a little later would produce a later timestamp. But the timestamp in Postgres is between 100 ms and 500 ms ahead of the one from the JVM.