AI-on-the-edge-device Check for corrupted models

The Feature

We sporadically see reports of the system crashing right after loading a model, e.g. https://github.com/jomjol/AI-on-the-edge-device/discussions/3177#discussioncomment-10350419 Usually this is due to an SD card going bad.

I think it would be wise to check the models before we use them (and so prevent a crash). The best way would be to handle a corrupted model at load time, but I am not sure whether this is possible with the tflite library we use. The other way would be to provide a second file per model containing its CRC32 or MD5 sum. The firmware could then check the model against it and handle a mismatch.
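A minimal sketch of the second option, assuming the checksum is stored as hexadecimal text in a sidecar file named after the model with a `.crc32` suffix. The names `crc32_update`, `check_model`, `ModelCheck` and the sidecar naming are hypothetical and not part of the firmware. The CRC is the standard CRC-32 (the one used by zlib and PNG), computed bitwise to avoid a lookup table, and parsing uses `strtoul` so that no exception support is needed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

// Standard CRC-32 (reflected polynomial 0xEDB88320), bitwise variant.
// Pass 0 as the initial value; pass the previous result to continue a stream.
std::uint32_t crc32_update(std::uint32_t crc, const std::uint8_t* data, std::size_t len) {
    crc = ~crc;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right and XOR in the polynomial if the low bit was set.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Hypothetical result type for the check.
enum class ModelCheck { Ok, Unreadable, NoChecksum, Mismatch };

// Compares the CRC-32 of <model_path> with the hex value in <model_path>.crc32.
// The ".crc32" sidecar name and its text format are assumptions of this sketch.
ModelCheck check_model(const std::string& model_path) {
    std::ifstream model(model_path, std::ios::binary);
    if (!model) return ModelCheck::Unreadable;

    // Stream the file in small chunks so the whole model never sits in RAM twice.
    std::uint32_t actual = 0;
    std::vector<char> buf(4096);
    while (model.read(buf.data(), static_cast<std::streamsize>(buf.size())) ||
           model.gcount() > 0) {
        actual = crc32_update(actual,
                              reinterpret_cast<const std::uint8_t*>(buf.data()),
                              static_cast<std::size_t>(model.gcount()));
    }
    if (model.bad()) return ModelCheck::Unreadable;  // low-level read error

    std::ifstream sidecar(model_path + ".crc32");
    std::string hex;
    if (!(sidecar >> hex)) return ModelCheck::NoChecksum;

    char* end = nullptr;
    const unsigned long expected = std::strtoul(hex.c_str(), &end, 16);
    if (end == hex.c_str() || *end != '\0') return ModelCheck::NoChecksum;

    return static_cast<std::uint32_t>(expected) == actual ? ModelCheck::Ok
                                                          : ModelCheck::Mismatch;
}
```

The caller would run `check_model` before handing the file to the interpreter and skip the load on anything other than `ModelCheck::Ok`. Note that the sidecar file lives on the same SD card, so a damaged or missing checksum has to be treated as its own case rather than as proof that the model is bad.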

Aug 17 '24 21:08 caco3

It crashes here: this->interpreter = new tflite::MicroInterpreter(this->model, resolver, this->tensor_arena, this->kTensorArenaSize);

https://github.com/jomjol/AI-on-the-edge-device/blob/rolling/code/components/jomjol_tfliteclass/CTfLiteClass.cpp#L208

Sep 01 '24 21:09 caco3

@Slider0007 @SybexX @jomjol Do you have experience with enabling exception handling? IMO this is the only way to catch the crash, which happens inside the tflite library. However, I am unable to enable exception handling in the platformio.ini file. Whatever I do, I get error: exception handling disabled, use '-fexceptions' to enable, even though I already replaced -fno-exceptions with -fexceptions...

Sep 02 '24 06:09 caco3

@caco3: I've never used exception handling in the ESP IDF environment, so I cannot assist with this topic. If I understand this correctly, it could be tricky because every potential exception all over the software needs a catch, otherwise processing gets aborted in the error case. That would potentially be a lot of work...

Besides exception handling, at least some sort of version check could be added. Maybe this helps a bit, depending on how the file gets corrupted. The question is what the reaction should be, because the flow in the jomjol firmware cannot be aborted gracefully anyway...

https://github.com/Slider0007/AI-on-the-edge-device/blob/7f14d89bc013f6db145eac343d90b4b457ae11b3/code/components/jomjol_tfliteclass/CTfLiteClass.cpp#L325
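In the same spirit as a version check, a cheap header sanity test can reject files whose first bytes are damaged before the buffer reaches the interpreter. The sketch below relies on two facts about the format: a `.tflite` file is a FlatBuffer whose first four bytes are the little-endian offset of the root table, and the next four bytes are the file identifier `TFL3`. The function name `looks_like_tflite` is hypothetical and not part of the firmware or the tflite library. It cannot catch corruption further inside the file.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns true if the buffer has a plausible TFLite header.
// It only inspects the first 8 bytes, so it detects a damaged header,
// not damage elsewhere in the model.
bool looks_like_tflite(const std::uint8_t* buf, std::size_t size) {
    // Need at least the root offset (4 bytes) and the identifier (4 bytes).
    if (buf == nullptr || size < 8) return false;

    // FlatBuffers file identifier of the TFLite schema, stored at offset 4.
    if (std::memcmp(buf + 4, "TFL3", 4) != 0) return false;

    // Root table offset, little-endian, assembled bytewise to avoid
    // alignment and endianness assumptions.
    const std::uint32_t root = static_cast<std::uint32_t>(buf[0]) |
                               static_cast<std::uint32_t>(buf[1]) << 8 |
                               static_cast<std::uint32_t>(buf[2]) << 16 |
                               static_cast<std::uint32_t>(buf[3]) << 24;

    // The root table must start after the 8-byte header and leave room
    // for its own 4-byte vtable offset inside the buffer.
    return root >= 8 && root <= size - 4;
}
```

A call such as `looks_like_tflite(model_bytes, model_size)` right after reading the file would let the firmware report a broken model instead of passing it on. It complements rather than replaces a checksum, since a flipped bit in the weights leaves the header intact.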

Nevertheless, I raise the question: what will the user do if it's not crashing but still not working anymore because the model is corrupted anyway? This does not solve the root cause, SD cards wearing out quickly because of tons of read cycles on the same files...

Sep 02 '24 19:09 Slider0007

this could be tricky because every potential exception all over the software needs a catch, otherwise processing gets aborted in the error case. That would potentially be a lot of work

Well, without exceptions (as it is now), it goes directly into an abort!

The question is what the reaction should be

The crash happens within the tflite library. The right way would be to patch that one, but I fear it might be a lot of learning first. The other way is to add a try/catch around it. This way we could notify the user gracefully without crashing. As of now, I don't know how to do it, so the only thing we can do is add a debug log message just before that call. The current implementation is that after a crash, we stay in DEBUG log level and delay the first round by 5 minutes. This way we will see the log message indicating the issue. See the example in https://github.com/jomjol/AI-on-the-edge-device/pull/3220.

Your proposal with the version check sounds like a good start, but it will not be able to catch all corruptions.

Nevertheless, I raise the question: what will the user do if it's not crashing but still not working anymore because the model is corrupted anyway?

We can simply show an error in the UI/MQTT, ...

This does not solve the root cause, SD cards wearing out quickly because of tons of read cycles on the same files...

Yes, that's right, but I have the feeling we had quite a few bug reports because of corrupted models/filesystems. Because of this I investigated and saw that there is actually no validation.

Sep 02 '24 21:09 caco3

https://github.com/jomjol/AI-on-the-edge-device/pull/3220

Dec 03 '24 20:12 caco3
