sentry.wait is ineffective when clock skew exists between local machine and remote Juju units

Open ryan-beisner opened this issue 8 years ago • 0 comments

Clock skew between the local machine and remote Juju units can cause amulet.sentry.wait to take far longer than intended when the remote machine's clock is ahead of the local one.

Conversely, when the remote machine's clock is behind, sentry.wait may return immediately.

Either way, waiting for an IDLE_THRESHOLD that is calculated from the difference between local machine time and remote machine time is only reliable when the two clocks agree. :timer_clock:

ex:

⟫ date && juju ssh 2 date
Thu Nov  3 21:41:33 UTC 2016
Warning: Permanently added '10.5.4.106' (ECDSA) to the list of known hosts.
Warning: Permanently added '10.5.4.108' (ECDSA) to the list of known hosts.
Thu Nov  3 21:43:13 UTC 2016
Connection to 10.5.4.108 closed.

In the code where (datetime.now() - since).total_seconds() is compared with IDLE_THRESHOLD (30 seconds), the initial total_seconds value is negative, -90 seconds in this example, because the remote timestamp lies in the local machine's future.

That means that Amulet will sit and spin for a full 2 minutes per Juju unit: 90 seconds for the difference to climb back to zero, plus the 30-second threshold. In deployments with a dozen or two dozen units, this ties up test resources for a long time.
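
The arithmetic can be reproduced with a small sketch. This is not Amulet's actual code: the helper names `idle_seconds` and `extra_wait` are hypothetical, and the only things taken from the report are the `(now - since).total_seconds()` comparison, the 30-second IDLE_THRESHOLD, and the 90-second skew. It assumes the unit went idle "just now" according to its own clock.

```python
from datetime import datetime, timedelta

IDLE_THRESHOLD = 30  # seconds, as stated in this issue


def idle_seconds(local_now, remote_since):
    """Idle time as the local side sees it: local clock minus a
    'since' timestamp that was stamped by the remote clock."""
    return (local_now - remote_since).total_seconds()


def extra_wait(local_now, remote_since, threshold=IDLE_THRESHOLD):
    """Seconds still to wait before the idle check passes (0 if it already does)."""
    return max(0.0, threshold - idle_seconds(local_now, remote_since))


# Fixed local time so the example is deterministic (hypothetical value).
local_now = datetime(2016, 11, 3, 21, 41, 33)

# Remote clock 90 s ahead: 'since' lies in the local machine's future.
ahead = local_now + timedelta(seconds=90)
# Remote clock 90 s behind: 'since' already looks 90 s old locally.
behind = local_now - timedelta(seconds=90)

print(idle_seconds(local_now, ahead), extra_wait(local_now, ahead))    # -90.0 120.0
print(idle_seconds(local_now, behind), extra_wait(local_now, behind))  # 90.0 0.0
```

With the remote clock ahead, the wait stretches to 120 seconds; with it behind, the check passes immediately even though the unit has only just gone idle.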

ryan-beisner avatar Nov 03 '16 22:11 ryan-beisner