rtl_433_ESP
rtl_433_ESP copied to clipboard
Exception (LoadProhibited) in run_ook_demods()
Analysis
I am running a vanilla OpenMQTTGateway on a LilyGo RTL 433 Module. Sometime the firmware crashes with the information shown below. I can reproduce the exception when I power on an OregonScientifc Thermo/Hygro Sensor
Most likely a THGN132N
I compiled with:
build_type = debug
monitor_filters = esp32_exception_decoder
and I also added 2 debug printf's to identify the element that may have caused the exception as:
int run_ook_demods(list_t* r_devs, pulse_data_t* pulse_data) {
int p_events = 0;
unsigned next_priority = 0; // next smallest on each loop through decoders
// run all decoders of each priority, stop if an event is produced
int debug_count = 0; // << Added for debugging
logprintfLn(LOG_DEBUG, "\nr_api.c - Decoder start");
for (unsigned priority = 0; !p_events && priority < UINT_MAX; priority = next_priority) {
next_priority = UINT_MAX;
for (void** iter = r_devs->elems; iter && *iter; ++iter) { // <== Trap is here
printf(" %d", debug_count++); // << Added for debugging
r_device* r_dev = *iter;
As the counter indicates, the loop is at either position 87 or 88 (Last last digit of 88 may have been dropped by the exception)
memcpy(&cfg->devices[87], &maverick_et73, sizeof(r_device));
memcpy(&cfg->devices[88], &maverick_et73x, sizeof(r_device));
memcpy(&cfg->devices[89], &mebus433, sizeof(r_device));
Most probably mebus433 ?
[.... ] 85 86 87 8Guru Meditation Error: Core 1 panic'ed (LoadProhibited). Exception was unhandled.
Core 1 register dump:
PC : 0x4013996b PS : 0x00060c30 A0 : 0x8010a714 A1 : 0x3ffe31f0
A2 : 0x00000001 A3 : 0x3ffe09cc A4 : 0xbaad5678 A5 : 0x00000000
A6 : 0x00000059 A7 : 0x0000000a A8 : 0x80139969 A9 : 0x3ffe31a0
A10 : 0x00000003 A11 : 0x3ffdeac8 A12 : 0x000003e8 A13 : 0x000000e5
A14 : 0x4009a1e0 A15 : 0x00000000 SAR : 0x00000004 EXCCAUSE: 0x0000001c
EXCVADDR: 0xbaad56a4 LBEG : 0x4008aba5 LEND : 0x4008abb5 LCOUNT : 0xffffffff
Backtrace: 0x40139968:0x3ffe31f0 0x4010a711:0x3ffe3220
#0 0x40139968:0x3ffe31f0 in run_ook_demods at .pio/libdeps/lilygo-rtl_433/rtl_433_ESP/src/rtl_433/r_api.c:523
#1 0x4010a711:0x3ffe3220 in rtl_433_DecoderTask(void*) at .pio/libdeps/lilygo-rtl_433/rtl_433_ESP/src/signalDecoder.cpp:482
I can reproduce the issue almost imediately when activating the THGN132N
Expected Behavior
run_ook_demods() should not cause an exception when scanning the list of protocols
Steps To Reproduce
Activating a THGN132N temparature sensor will cause the exception almost immediatley
Logs
See above
Configuration
Compiling OpenMQTTGateway with
`default_envs = lilygo-rtl_433`
set in platform.ini
Environment
- OS: Lilygo rl433
- Software: OpenMQTTGateway
- Node: N/A
- npm: N/A
Process Supervisor
not applicable
Additional Context
No response
Thanks for identifying this
To confirm which decoder is triggering this, can you add this line back in ( you may need to fiddle with it a bit, but want to see r_dev->name )
logprintfLn(LOG_DEBUG, "demod(%d) - %s", r_dev->modulation, r_dev->name);
It will give the decoder name
I don't think it is mebus433
as it is listed as disabled in the code base.
Once you identify which one, you can add a disabled flag to the device decoder similar to mebus433
and recompile, that will remove it from being accessed.
Is added the printf into lines 523 und 524 as:
for (unsigned priority = 0; !p_events && priority < UINT_MAX; priority = next_priority) {
next_priority = UINT_MAX;
for (void** iter = r_devs->elems; iter && *iter; ++iter) {
r_device* r_dev = *iter;
logprintfLn(LOG_DEBUG, "demod(%d) - %s", r_dev->modulation, r_dev->name);
fflush(stdout);
// Find next smallest priority
Result is shown below:
[ ... ]
rtl_433_ESP(7): demod(9) - Norgo NGE101
rtl_433_ESP(7): demod(4) - Oil Ultrasonic STANDARD ASK
rtl_433_ESP(7): demod(6) - Opus/Imagintronix XT300 Soil Moisture
rtl_433_ESP(7): demod(3) - Oregon Scientific Weather Sensor
Guru Meditation Error: Core 1 panic'ed (LoadProhibited). Exception was unhandled.
Core 1 register dump:
PC : 0x40139944 PS : 0x00060c30 A0 : 0x8010a714 A1 : 0x3ffe3170
A2 : 0x00000001 A3 : 0x4009a1e0 A4 : 0x00000478 A5 : 0x3ffe0938
A6 : 0x00000000 A7 : 0x0000000a A8 : 0x80139944 A9 : 0x3ffe3120
A10 : 0x00000010 A11 : 0x3ffdea58 A12 : 0x00000071 A13 : 0x00000001
A14 : 0x3ffde5f8 A15 : 0x000001b8 SAR : 0x00000019 EXCCAUSE: 0x0000001c
EXCVADDR: 0x0000047c LBEG : 0x4008aba5 LEND : 0x4008abb5 LCOUNT : 0xffffffff
Backtrace: 0x40139941:0x3ffe3170 0x4010a711:0x3ffe31a0
#0 0x40139941:0x3ffe3170 in run_ook_demods at .pio/libdeps/lilygo-rtl_433/rtl_433_ESP/src/rtl_433/r_api.c:523
#1 0x4010a711:0x3ffe31a0 in rtl_433_DecoderTask(void*) at .pio/libdeps/lilygo-rtl_433/rtl_433_ESP/src/signalDecoder.cpp:482
I have deactivated the corresponding device and the exception has disappeared. But of course the "Oregon Scientific" sensor is no longer recognized. Is there any way to fix this? I would definitely try to help.
Two thumbs up on having the back trace enabled
Looking at the backtrace I thought it would have pointed to the line within Oregon_scientific.c, but it didn't.
As the device decoders are sourced from rtl_433, my approach would be to first refresh that decoder to see if the issue is already resolved ( hopefully no dramatic API changes where made), then do a code review to see if I can spot the issue, then worst case put a lot of debug statements in to see where it is failing. And once I find the issue, resolve it, and send the fix upstream to rtl_433. Not sure how much time you want to put into this.
On further investigation of this problem, I found that the error always seems to occur in malloc() .... However, the stack trace is also always different. The case shown in the following trace occurs most frequently. The packet seems to be decoded properly, then an enqueue() occurs in the callback function;
In src/rtl_433/r_api.c:985
// callback to external function that receives message from device (
// rtl_433_ESPCallBack )
(cfg->callback)(cfg->messageBuffer);
then to enqueueJsonObject in main/ZgatewayRTL_433.ino:261 and jsonQueue.push(bundle); in main/main.ino:436
RFrtl_433_ESPdata["origin"] = (char*)topic.c_str();
enqueueJsonObject(RFrtl_433_ESPdata); // <===
storeSignalValue(MQTTvalue);
pubOled((char*)topic.c_str(), RFrtl_433_ESPdata);
----------
JsonBundle bundle;
bundle.doc = jsonDoc;
jsonQueue.push(bundle); // <===
Log.trace(F("Queue length: %d" CR), jsonQueue.size());
and down to components/heap/multi_heap.c:200 where it finally crashes. I don't understand the overall logic of the program so far, but since the decoding has obviously run without errors,
rtl_433_ESP(6): data_output {"model":"Oregon-THGR122N","id":112,"channel":1,"battery_ok":1,"temperature_C":22.7,"humidity":44,"protocol":"Oregon Scientific Weather Sensor","rssi":-41,"duration":186814}
Guru Meditation Error: Core 1 panic'ed (LoadProhibited). Exception was unhandled.
I would suspect a storage overlay either in the decoder or another memory problem. With some help I would have time to investigate further.
That section of code is from the openMQTTGateway package, and I believe it is queuing the message for sending via MQTT. I'm wondering if the object is larger than others or for some reason it is exceeding a buffer size. Visually it doesn't appear to be extremely large.
An annoying problem, a simple cause. After increasing the value of rtl_433_ReceiverTask_Stack
in rtl_433_ESP.cpp
by just 48 bytes, the error no longer occurred. To be on the safe side, I set the value to 3000. Decoding is now running smoothly.
@dilmash - If you watch the logs and status messages, the high water mark for that stack is logged so you can tune it further as needed. The log entry is RTL_HWM
alogprintf(LOG_INFO, ", RTL_HWM: %d", uxTaskGetStackHighWaterMark(rtl_433_ReceiverHandle));
I have added your 48 byte tweak to release 0.3.2