Need a recovery program from a power cycle

Open CamDavidsonPilon opened this issue 4 years ago • 1 comments

A few times, an experiment has been prematurely halted due to a power cycle on a Pi. There are a few situations to consider:

  1. Power cycle on a leader that is also a worker.
  2. Power cycle on a worker only.
  3. Power cycle on both the leader and worker.

Ideally, after a power cycle the machine returns to the same, or nearly the same, state it was in before the cycle. This isn't always wanted, though, so maybe it should be configurable (e.g. if I pull the plug to stop some bad action, I don't want that action to start again when I turn the machine back on).

Related to #107


Alternatively, there could be an alerting system that fires when the RPi cycles unexpectedly mid-experiment, so the user can manually intervene.


The current situation is not good: when the leader goes down, the broker loses all the LWTs (whyyyyy), so we need a way to correct this. Solutions:

  • [x] monitor does a correction: gets latest experiment, looks for jobs with state="ready", and compares against what is currently running on the machine. Differences get a "lost".
  • [ ] watchdog does some sort of ping to all MQTT-active jobs?
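
A minimal sketch of the monitor correction described in the first item, assuming the latest reported job states and the set of locally running jobs have already been collected. The function names and the example job names are hypothetical, not Pioreactor's actual API; how the states are fetched (MQTT, database) and how the corrected state is published are left out.

```python
from typing import Dict, Set


def find_lost_jobs(reported_states: Dict[str, str], running_jobs: Set[str]) -> Set[str]:
    """Return jobs reported as "ready" that are not actually running on this machine."""
    return {
        job
        for job, state in reported_states.items()
        if state == "ready" and job not in running_jobs
    }


def corrected_states(reported_states: Dict[str, str], running_jobs: Set[str]) -> Dict[str, str]:
    """Return a copy of the reported states with the mismatched jobs marked "lost"."""
    lost = find_lost_jobs(reported_states, running_jobs)
    return {
        job: ("lost" if job in lost else state)
        for job, state in reported_states.items()
    }


if __name__ == "__main__":
    # Hypothetical example: the broker still says od_reading is ready,
    # but only stirring survived the power cycle.
    reported = {"stirring": "ready", "od_reading": "ready", "dosing_control": "disconnected"}
    running = {"stirring"}
    print(corrected_states(reported, running))
```

Keeping the comparison as a pure function means the monitor could run it at boot and on a timer without it depending on where the inputs came from.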

A script like the one at the bottom of this page, https://ubuntuforums.org/showthread.php?t=1621039, would let us know whether the shutdown was graceful or not.
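
One common way to detect an ungraceful shutdown is a marker file, sketched below. This is an assumption about the general technique, not a copy of the linked script: a file is created at boot and removed during a clean shutdown, so finding it at the next boot means the previous session ended without the shutdown hook running. The function names and the marker path are hypothetical, and the marker must live on storage that survives a reboot (not a tmpfs).

```python
from pathlib import Path


def mark_boot(marker: Path) -> bool:
    """Call once at startup.

    Returns True if the previous shutdown was NOT graceful, i.e. the marker
    left by the previous boot was never removed. Re-creates the marker for
    the current session either way.
    """
    previous_shutdown_unclean = marker.exists()
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return previous_shutdown_unclean


def mark_clean_shutdown(marker: Path) -> None:
    """Call from the shutdown hook: removing the marker records a graceful exit."""
    try:
        marker.unlink()
    except FileNotFoundError:
        pass  # nothing to remove; treat as already clean
```

In practice `mark_boot` would be called from a boot-time service and `mark_clean_shutdown` from a shutdown hook (for example a systemd unit's stop action); a `True` result from `mark_boot` could then trigger the alert or the recovery logic.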

CamDavidsonPilon avatar Jun 28 '21 14:06 CamDavidsonPilon

monitor does a correction: gets latest experiment, looks for jobs with state="ready", and compares against what is currently running on the machine. Differences get a "lost".

We removed this, but I think we should add this functionality back?


When a leader/worker cycles

Here's an idea that would help users avoid the "UI state doesn't match system state" problem that is caused by MQTT: don't persist MQTT messages to disk. Which topics / messages currently need to be persisted? Move that data to another storage.

Not persisting messages means future developers need to find another solution (like local_persistant_storage), which is a good pattern.
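
To illustrate the pattern of keeping must-survive state outside the broker, here is a minimal sketch of a local key-value store backed by SQLite. It is not the `local_persistant_storage` implementation; `local_store`, `put`, `get`, and the `kv` table are hypothetical names chosen for this example.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def local_store(path: str):
    """Open (or create) a small on-disk key-value store; commit on clean exit."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")
    try:
        yield conn
        conn.commit()  # skipped if the body raised, so partial writes are not saved
    finally:
        conn.close()


def put(conn: sqlite3.Connection, key: str, value: str) -> None:
    """Insert the value, overwriting any existing value for the same key."""
    conn.execute("INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value))


def get(conn: sqlite3.Connection, key: str, default=None):
    """Return the stored value for key, or default if the key is absent."""
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row else default
```

A job could write the handful of values it needs after a restart with `put`, and read them back with `get` at startup, instead of relying on the broker to have kept them.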

CamDavidsonPilon avatar Apr 14 '24 20:04 CamDavidsonPilon