ESP32 Gateway crashes on onDio0Rise

Open girtgirt opened this issue 5 years ago • 15 comments

I'm receiving messages over LoRa and forwarding them to an HTTPS server over WiFi. Everything works, except that once every few hours (quite randomly) the board crashes. I tried with ESP32 v1.0.2 and v1.0.3.

I suspect the issue is that the interrupt handler does not have ICACHE_RAM_ATTR, which is needed on the ESP32. Simply adding it won't help, as per the ESP docs: "Not only that, but the entire function tree called from the ISR must also have the ICACHE_RAM_ATTR declared." So should all the work done in the interrupt be moved to the main loop?

As the gateway I'm using a Heltec ESP32 board, the Heltec WiFi Lora 32(v2). The top of the stack is almost always:

Decoding stack results
0x401624e6: spiGetClockDiv at /Users/aaa/Library/Arduino15/packages/esp32/hardware/esp32/1.0.3-rc1/cores/esp32/esp32-hal-spi.c line 291
0x400d543f: SPIClass::beginTransaction(SPISettings) at /Users/aaa/Library/Arduino15/packages/esp32/hardware/esp32/1.0.3-rc1/libraries/SPI/src/SPI.cpp line 130
0x400d4b30: LoRaClass::singleTransfer(unsigned char, unsigned char) at /Users/aaa/Documents/ArduinoESP32/libraries/LoRaNow/src/utility/LoRa.cpp line 755
0x400d4b64: LoRaClass::readRegister(unsigned char) at /Users/aaa/Documents/ArduinoESP32/libraries/LoRaNow/src/utility/LoRa.cpp line 741
0x400d4ba5: LoRaClass::available() at /Users/aaa/Documents/ArduinoESP32/libraries/LoRaNow/src/utility/LoRa.cpp line 356
0x400d4a02: LoRaNowClass::onReceive(int) at /Users/aaa/Documents/ArduinoESP32/libraries/LoRaNow/src/LoRaNow.cpp line 585
0x400d5282: LoRaClass::handleDio0Rise() at /Users/aaa/Documents/ArduinoESP32/libraries/LoRaNow/src/utility/LoRa.cpp line 724
0x400d52a6: LoRaClass::onDio0Rise() at /Users/aaa/Documents/ArduinoESP32/libraries/LoRaNow/src/utility/LoRa.cpp line 767
0x40080f81: __onPinInterrupt at /Users/aaa/Library/Arduino15/packages/esp32/hardware/esp32/1.0.3-rc1/cores/esp32/esp32-hal-gpio.c line 220
...

girtgirt avatar Aug 17 '19 15:08 girtgirt

The first version of LoRaNow has this problem: the ESP32 doesn't like the interrupt and crashes a lot if you put too much code in it. I probably need to remove both interrupts and move the state machine to the loop function. I'm working on this...

ricaun avatar Aug 17 '19 23:08 ricaun

Hello, I created this branch, https://github.com/ricaun/LoRaNow/tree/update, to try to fix the problem. I added ICACHE_RAM_ATTR to the main interrupt functions and I don't get the fatal error anymore. See ya!

ricaun avatar Aug 28 '19 13:08 ricaun

I tried this too, but I still get an error after a longer run. I even added this attribute to all LoRaNow and LoRa functions, but the crash persists because they call other I2C functions that don't have it.

girtgirt avatar Aug 28 '19 13:08 girtgirt

I still got the same error =( I moved all the decoding to the loop; I believe the SPI read in the interrupt makes the board crash. Try this branch and give me feedback, thanks! https://github.com/ricaun/LoRaNow/tree/update
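For readers who want the general shape of "move the work to the loop", here is a minimal sketch of the pattern, not the actual code in the update branch. The names `dio0Pending`, `onDio0RiseDeferred`, `serviceDio0` and `packetsHandled` are made up for illustration, and `IRAM_ATTR` is defined away when the ESP32 core does not provide it so the sketch also builds on a desktop compiler.

```cpp
#include <atomic>
#include <cassert>
#include <stdint.h>

// On ESP32 builds IRAM_ATTR comes from the core; define it as empty
// elsewhere so this sketch compiles on a desktop compiler too.
#ifndef IRAM_ATTR
#define IRAM_ATTR
#endif

// Hypothetical names for illustration; they are not part of LoRaNow.
static std::atomic<bool> dio0Pending{false};
static uint32_t packetsHandled = 0;

// Interrupt handler: no SPI access and no library calls, it only
// records that DIO0 went high.
void IRAM_ATTR onDio0RiseDeferred() {
    dio0Pending.store(true, std::memory_order_release);
}

// Called from loop(): the slow work happens here, outside interrupt
// context. Returns true if a pending event was consumed.
bool serviceDio0() {
    // Read and clear the flag in one atomic step.
    if (!dio0Pending.exchange(false, std::memory_order_acq_rel)) {
        return false;
    }
    // Real code would read the radio's IRQ flags and FIFO over SPI here.
    ++packetsHandled;
    return true;
}
```

On hardware, `onDio0RiseDeferred` would be the function passed to `attachInterrupt`, and `loop()` would call `serviceDio0()` on every pass. Several interrupts arriving before the next pass are coalesced into one event, so the loop-side code should drain everything the radio has.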

ricaun avatar Aug 29 '19 12:08 ricaun

I’ve tested the update branch. I couldn’t do a long-running test, but I tested it for half an hour and it kept working fine.

Did you guys have the chance to test it?

Is it working in the long term?

I’ll test a little bit more and come back here with results.

cristian-fernandes avatar Oct 08 '19 00:10 cristian-fernandes

I think I found the cause of this crash. I am using this very nice library to handle 20 LoRa sensors with an ESP32 gateway. I used the LoRaNow examples as a starting point and also fixed the defines so that the ISRs are placed in IRAM. But about once per hour the ESP32 was still crashing (with the same report as described above). So I loaded a LoRa-only receiver program onto the ESP32 board and checked the raw data. From time to time a very big message of more than 128 bytes was coming in (in the Netherlands a telecom firm is also using LoRa). This was the reason for the crash. The solution is simple:

In the LoRaNow class, add a range check to the write function:

size_t LoRaNowClass::write(uint8_t c)
{
    if (payload_len < sizeof(payload_buf)) {
        payload_buf[payload_len++] = c;
        return 1;
    } else {
        return 0;
    }
}

Now the gateway runs without any crashes! I can even set the buffer length to, say, 32 bytes (which is enough for the small LoRaNow messages).
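To see the effect of the range check in isolation, here is a small stand-alone sketch. `BoundedPayload` and `feedBytes` are hypothetical stand-ins and not part of LoRaNow; only the `write` logic mirrors the fix above.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for the library's payload buffer.
template <size_t N>
struct BoundedPayload {
    uint8_t buf[N];
    size_t len = 0;

    // Same range check as in the fix: store the byte only if it fits.
    size_t write(uint8_t c) {
        if (len < sizeof(buf)) {
            buf[len++] = c;
            return 1;
        }
        return 0;  // buffer full: drop the byte instead of writing past the end
    }
};

// Feeds `count` bytes into the buffer, the way an oversized foreign
// packet would, and returns how many bytes were actually stored.
template <size_t N>
size_t feedBytes(BoundedPayload<N>& p, size_t count) {
    size_t stored = 0;
    for (size_t i = 0; i < count; ++i) {
        stored += p.write(static_cast<uint8_t>(i));
    }
    return stored;
}
```

With a 32-byte buffer, feeding 200 bytes stores the first 32 and silently drops the rest, so memory after the buffer is never touched. Without the check, every byte past the end would overwrite whatever sits behind `payload_buf`.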

TRudolphi avatar May 05 '21 17:05 TRudolphi

@TRudolphi could you please share the fixes you made for the ISR?

khachikyannarek avatar May 06 '21 21:05 khachikyannarek

In LoRaNow.cpp and LoRa.cpp:

#if ESP8266
#define ISR_PREFIX ICACHE_RAM_ATTR
#else
#if ESP32
#define ISR_PREFIX IRAM_ATTR
#else
#define ISR_PREFIX
#endif
#endif

But even with this fix I still had crashes caused by overly long messages, so the final solution was the range check in the write routine (this fix is in my previous message).

TRudolphi avatar May 07 '21 06:05 TRudolphi

@TRudolphi which version of LoRaNow do you use: the update branch or master?

khachikyannarek avatar May 07 '21 12:05 khachikyannarek

I used the update branch.

TRudolphi avatar May 07 '21 15:05 TRudolphi

@TRudolphi one more question: did you update LoRa.cpp?

khachikyannarek avatar May 08 '21 20:05 khachikyannarek

I only changed the IRAM define in LoRa.cpp, as I mentioned earlier; the rest of the code works fine.

TRudolphi avatar May 09 '21 18:05 TRudolphi

@TRudolphi could you please help me with one question? I have built an irrigation automation application using this protocol, but my nodes fail after random periods: either the receiver or the sender part of the LoRa module stops working. Could it be related to txPower? Any ideas?

khachikyannarek avatar Oct 19 '23 12:10 khachikyannarek

I don't think it has anything to do with the TX power. Which processor do you use? With an ESP32 / ESP8266 there can be a problem with the handling of the DIO0 interrupt. So for my ESP32 gateway I changed the library to poll the state of this line instead (it does not change state very often). Since this change it has been working for several years. With an AVR controller the interrupt should work fine.
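As a rough idea of what polling the line can look like, here is a generic rising-edge detector. This is only a sketch and not the exact change in my library copy; `Dio0Poller` is a hypothetical name, and the pin level is passed in by the caller so the logic does not depend on any board.

```cpp
#include <cassert>

// Rising-edge detector for a polled pin. The caller passes in the
// current pin level; on hardware that would be the result of reading
// the DIO0 pin with digitalRead().
class Dio0Poller {
public:
    // Returns true exactly once for each low-to-high transition.
    bool poll(bool levelHigh) {
        bool rose = levelHigh && !last_;
        last_ = levelHigh;
        return rose;
    }

private:
    bool last_ = false;  // level seen on the previous poll
};
```

In `loop()` this would be used roughly as `if (poller.poll(digitalRead(dio0Pin) == HIGH)) { /* read the packet over SPI */ }`, where `dio0Pin` is whatever pin DIO0 is wired to. All SPI traffic then happens in normal task context. The loop must come around faster than packets arrive, which holds when the line changes state infrequently.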

TRudolphi avatar Oct 24 '23 10:10 TRudolphi

@TRudolphi thanks for your answer. I also made the change you suggested in the comments above. I use the Heltec WiFi ESP32 (v2). The problem does not seem to be related to interrupts, because in that case the ESP32 should start working again after a restart. In my case I'm facing an issue with the LoRa module itself: in some cases the LoRa receiver part isn't working (nothing is received even when the sender is close) and in other cases the LoRa sender part

khachikyannarek avatar Oct 24 '23 13:10 khachikyannarek