Need a recovery program from a power cycle

Open CamDavidsonPilon opened this issue 4 years ago • 1 comments

A few times, an experiment has been prematurely halted due to a power cycle on a Pi. There are a few situations to consider:

Power cycle on a leader that is also a worker.
Power cycle on a worker only.
Power cycle on both the leader and worker.

Ideally, after a power cycle the machine comes back in the same, or nearly the same, state it was in before the cycle. This isn't always wanted though, so maybe it should be configurable (e.g. if I pull the plug to stop some bad action, I don't want that action to start again when I turn the machine back on).

Related to #107

Alternatively, there could be an alerting system that fires when the RPi cycles unexpectedly mid-experiment, so the user can manually intervene.

The current situation is not good: when the leader goes down, the broker loses all the LWT (last will and testament) messages (whyyyyy), so we need a way to correct this. Solutions:

[x] monitor does a correction: gets latest experiment, looks for jobs with state="ready", and compares against what is currently running on the machine. Differences get a "lost".
[ ] watchdog does some sort of ping to all MQTT-active jobs?
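The monitor correction above can be sketched as a small comparison. This is only an illustration, not the actual monitor code: `find_lost_jobs` and `correct_states` are hypothetical names, the job names in the usage note are examples, and how the reported states and the set of running jobs are collected (e.g. from MQTT and from the process table) is assumed and left out.

```python
from typing import Dict, Set


def find_lost_jobs(reported_states: Dict[str, str], running_jobs: Set[str]) -> Set[str]:
    """Return jobs that are reported as "ready" but are not running on this machine."""
    return {
        job
        for job, state in reported_states.items()
        if state == "ready" and job not in running_jobs
    }


def correct_states(reported_states: Dict[str, str], running_jobs: Set[str]) -> Dict[str, str]:
    """Return a copy of the reported states with stale "ready" entries set to "lost"."""
    lost = find_lost_jobs(reported_states, running_jobs)
    return {
        job: ("lost" if job in lost else state)
        for job, state in reported_states.items()
    }
```

For example, with `{"stirring": "ready", "od_reading": "ready"}` reported and only `stirring` actually running, `od_reading` would be corrected to "lost". Publishing the corrected states back to the broker is a separate step not shown here.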

A script like the one at the bottom of this page would let us know whether the shutdown was graceful or not: https://ubuntuforums.org/showthread.php?t=1621039
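One common way to detect an ungraceful shutdown is a marker file. The sketch below is not the script from the linked page; it is a generic version of the idea, and the function names and the marker path are assumptions. The marker is created while the system is up and removed by a graceful-shutdown hook, so a marker that is still present at the next boot means that hook never ran.

```python
from pathlib import Path


def previous_shutdown_was_graceful(marker: Path) -> bool:
    """Call once at boot, before mark_running().

    A leftover marker means the graceful-shutdown hook never ran.
    A missing marker (including on first boot) is treated as graceful.
    """
    return not marker.exists()


def mark_running(marker: Path) -> None:
    """Create the marker once the system is up."""
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()


def mark_graceful_shutdown(marker: Path) -> None:
    """Remove the marker from the graceful-shutdown hook (e.g. a systemd stop action)."""
    marker.unlink(missing_ok=True)  # missing_ok requires Python 3.8+
```

The marker has to live on storage that survives a reboot (so not a tmpfs such as `/run`). A boot that finds the marker could then trigger the alert or the recovery logic.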

Jun 28 '21 14:06 CamDavidsonPilon

monitor does a correction: gets latest experiment, looks for jobs with state="ready", and compares against what is currently running on the machine. Differences get a "lost".

We removed this, but I think we should add this functionality back?

When a leader/worker cycles

Here's an idea that would help users avoid the "UI state doesn't match system state" problem that is caused by MQTT: don't persist MQTT messages to disk. We'd first need to work out which topics / messages currently need to be persisted, and then move that data to another storage.

Not persisting messages means future developers need to find another solution (like local_persistant_storage), which is a good pattern.
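To make the "move that data to another storage" idea concrete, here is a minimal sketch of a key-value store on local disk. It uses the standard-library `sqlite3` module purely as a stand-in: `LocalStore` is a hypothetical name, the key format is an example, and none of this is the actual local_persistant_storage API.

```python
import sqlite3
from typing import Optional


class LocalStore:
    """Tiny key-value store on local disk (hypothetical stand-in, not Pioreactor's API)."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        with self._conn:  # commits on success
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
            )

    def set(self, key: str, value: str) -> None:
        """Insert or overwrite the value for a key."""
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
            )

    def get(self, key: str, default: Optional[str] = None) -> Optional[str]:
        """Return the stored value, or the default if the key is absent."""
        row = self._conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else default

    def close(self) -> None:
        self._conn.close()
```

A job would write the values it needs to survive a restart here (say, `store.set("stirring/target_rpm", "500")`) and read them back on startup, instead of relying on the broker to keep them.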

Apr 14 '24 20:04 CamDavidsonPilon