bluenet icon indicating copy to clipboard operation
bluenet copied to clipboard

Microapp management

Open mrquincle opened this issue 3 years ago • 3 comments

There are a couple of things that have to be done w.r.t. managing microapps.

No microapp on flash

This is also true under the condition of AUTO_ENABLE_MICROAPP_ON_BOOT=1. When there's no microapp on flash, bluenet should not try to run an app.

  • It is possible to read only the protocol version from flash. If it is 0xFF you know there's nothing there. No need to load the app.
  • We can calculate the checksum etc. from flash. If correct, we can run the app.
  • We can even write the "state data variables" if the above was correct. Let's call it AUTO_VALIDATE. To do this is a design choice.

Guarding against microapp bugs

When a microapp is run there's no guard in use that allows bluenet to flag it to be incorrect. We ideally would use such a flag before each new path into the microapp.

What we can do is register the n-th time the microapp code is resumed and we persist this counter in a part of memory that's valid across warm reboots. This we do with a queue, so we persist it over a couple of reboots. If we then see the same counter value for different reboots we know that the reboot is likely to stem from anomalies in the microapp.

This goes beyond just testing if setup and loop return, because that's easier to do. This will even help if there's an error within an interrupt service routine for example. However, no, it won't work if this is is not deterministic. We would then need an identifier rather than only a counter. That means we would probably have to store setup, loop, and callback counters. If something happens in one of these calls, the counter is likely to be still at 1. However, it also catches things like:

void loop() {
    static int evil_developer_var = 0;
    evil_developer_var++;
    if (evil_developer_var == 4) {
        while(1); // never return and crash bluenet only on the 4th execution of loop()
    }
}

Second line of defense

I would also propose a second-line of defense. If there's a reboot we don't want microapps to be disabled by default, but when a couple of reboots happen in quick succession, it is logical to disable them and get the user in the loop. Ideally, after an unexpected reboot I would propose to wait a bit before we enable a microapp. Then:

  • Wait even longer when there is yet another reboot.
  • Persist the timestamp when the microapp is run again.
  • Disable the microapp when the reboot is soon after that moment in time.
  • If there are X reboots within a time period, be conservative and disable the microapps.

In the last case the user has to enable the microapp again, but everything is better than keep trying to run a microapp if it is faulty.

mrquincle avatar Jan 10 '22 17:01 mrquincle

This should be ideally done in the watchdog. However, not sure how much time we have in e.g. nrfx_wdt_event_handler_t (currently not in use).

If we write the coroutine context each time we switch context towards a part of memory that's always accessible, we ideally would retrieve this and store it somewhere before the reset actually happens.

It doesn't seem we can prevent a reset when the watchdog is triggered.

mrquincle avatar Mar 16 '22 21:03 mrquincle

@vliedel disagrees that this has to be done in the watchdog, hence https://github.com/crownstone/bluenet/pull/163 is rejected

mrquincle avatar Jun 15 '22 10:06 mrquincle

Terminology

The function bool MicroappController::watchdogTriggered(uint8_t appIndex) should be named differently. We need a function that describes more general an accidental reboot while being in a microapp context, say bool MicroappController::didRebootWhileRunning(uint8_t appIndex). We actually don't care if the reboot in the end happens by triggering the watchdog.

The _states[index] object should have a field didReboot.

Struct

There should be a struct that is directly written and read from the ipc segment.

/**
 * State to be stored per microapp. A single byte denoting the microapp to be running (just before reboot) seems sufficient for now.
 */
enum class MicroappRuntimeState {
	CS_MICROAPP_NOT_RUNNING = 0,
	CS_MICROAPP_RUNNING = 1,
};

/**
 * State to be stored for all microapps (a small number). 
 */
struct MicroappRuntimeData {
	MicroappRuntimeState appState[MAX_MICROAPPS];
};

Add an IPC_INDEX_MICROAPP_RUNTIME_INFO index and write setRamData and getRamData on that index.

Compare: https://github.com/crownstone/bluenet/pull/163/files/3802ac7ca19c9b2a406f30355ae340dcea140cfe#diff-4461731b4a7aeb585cc962c85265fc65b5429d8c03f579512c6f99aadb16e62b

mrquincle avatar Jul 19 '22 12:07 mrquincle

This has been implemented in https://github.com/crownstone/bluenet/pull/178

mrquincle avatar Sep 05 '22 14:09 mrquincle