zos Node is not attempting to wake up its friends

A farmer using the farmerbot reported that their nodes did not wake up automatically after the signal from the bot. Upon inspecting Zos logs, I don't see any evidence that the single online node in the farm was detecting the power target changes and sending WoL packets.

The farm in question is 2405 on mainnet. It's configured with node 4465 always remaining on. The farmer has reported that none of the other nodes are responding to the power target changes.

Here is one example:

At block 12103846 the power target for node 4466 was changed to 'Up'. The timestamp for this block is Fri Apr 19 2024 00:22:06 GMT. No responses to power target changes can be found in the node logs at this time, nor for any other target changes for nodes in this farm over the last couple of days.

Node 4465 is definitely working though and has active communication with tfchain:

I have asked the farmer to reboot 4465 to see if it helps, but this is of course a fairly serious concern due to the impact on minting if nodes don't respond promptly to power target changes.

Apr 19 '24 21:04 scottyeager

From the log analysis, I saw a few interesting things:

There is (was) a clock skew on this node for around 30 minutes!
There were also some network interruptions (but not for very long).

As a side effect, all RMB messages were invalidated because of the timestamp.

There was a downtime on the 20th (probably had no effect).

I am not sure whether any of that is related, but the time skew is definitely a problem.
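To illustrate how a skewed clock could invalidate every message, here is a minimal sketch of a timestamp freshness check. It is not the actual RMB validation code. The `is_fresh` function, the 60-second `MAX_AGE` tolerance, and reading the skew as roughly 30 minutes in size are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tolerance; the real RMB limit is not stated in this thread.
MAX_AGE = timedelta(seconds=60)


def is_fresh(sent_at: datetime, receiver_now: datetime,
             max_age: timedelta = MAX_AGE) -> bool:
    """Accept a message only if its timestamp is within max_age of the receiver's clock."""
    return abs(receiver_now - sent_at) <= max_age


# Block timestamp from the example above, used here as the "true" send time.
true_time = datetime(2024, 4, 19, 0, 22, 6, tzinfo=timezone.utc)
# Assumed: the node clock is 30 minutes off (the direction is illustrative).
skewed_clock = true_time + timedelta(minutes=30)

print(is_fresh(true_time, true_time))     # clocks agree: accepted
print(is_fresh(true_time, skewed_clock))  # 30-minute skew: rejected
```

With any tolerance much smaller than the skew, every message looks stale (or from the future) to the skewed node, which would match the "all messages invalidated" symptom.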

Apr 23 '24 13:04 muhamadazmy

There should be an error in the logs, but on failure to receive the event it seems we wait 10 seconds before retrying, and unfortunately we didn't log the failure.

We will have to fix that missing log, and wait until this happens again. Presumably the reboot fixed the time issue (note that ntpd gives up if the skew is too big).
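The missing-log fix could look roughly like the following. This is a sketch only, not the Zos code: `receive_with_retry`, its parameters, and the injectable `sleep` are hypothetical names. The only detail taken from the thread is the 10-second wait before retrying.

```python
import logging
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("events")


def receive_with_retry(receive: Callable[[], T],
                       retry_delay: float = 10.0,
                       max_attempts: Optional[int] = None,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call receive() until it succeeds, logging every failure before waiting."""
    attempt = 0
    while True:
        attempt += 1
        try:
            return receive()
        except Exception as err:
            # The point of the fix: the failure is recorded instead of silently retried.
            log.error("failed to receive event (attempt %d): %s; retrying in %.0fs",
                      attempt, err, retry_delay)
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(retry_delay)
```

Passing `sleep` as a parameter is just a convenience so the loop can be exercised without actually waiting 10 seconds.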

Side note: I am wondering if we can also have some code to monitor time skew, and if it's too big we just restart ntpd. Restarting ntpd forces it to resync even if the skew is huge.
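A rough sketch of that idea, under stated assumptions: `check_skew`, `measure_offset`, `restart_ntpd`, and the 60-second threshold are all hypothetical. How the offset is measured and how ntpd is restarted on a real node are deliberately left to the caller.

```python
from typing import Callable


def check_skew(measure_offset: Callable[[], float],
               restart_ntpd: Callable[[], None],
               max_skew_seconds: float = 60.0) -> bool:
    """Restart ntpd when the measured clock offset exceeds the threshold.

    measure_offset returns the local clock offset in seconds (positive or
    negative) against some trusted reference. Returns True if a restart
    was triggered.
    """
    offset = measure_offset()
    if abs(offset) > max_skew_seconds:
        restart_ntpd()
        return True
    return False


# Example with stand-in callables: a 30-minute offset triggers a restart.
restarts = []
print(check_skew(lambda: 1800.0, lambda: restarts.append("restart")))  # True
print(check_skew(lambda: 0.5, lambda: restarts.append("restart")))     # False
```

Run periodically, a check like this would cover the case where ntpd has given up because the skew grew too large.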

Apr 23 '24 13:04 muhamadazmy

failure logs: https://github.com/threefoldtech/zos/issues/2271
clock skew: https://github.com/threefoldtech/zos/issues/2272

Apr 23 '24 13:04 rawdaGastan

Thanks for the investigations here. So far the farmer did not report any further issue since rebooting the node. I'll keep an eye out for any other examples of this behavior.

Apr 23 '24 15:04 scottyeager

No further reports, and clock resync in Zos has been implemented.

Apr 28 '25 22:04 scottyeager
