async-mqtt-client
async-mqtt-client copied to clipboard
Various crashes on publish
Hello. I have been experienced crashes on publish
. My sketch publishes on three topics every second, and the crashes always happen on the second or third publish
invocation for each round. Here are my decoded stack traces:
Third invocation
0x4008d5a4: invoke_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c line 155
0x4008d7d5: abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c line 170
0x4015964f: __cxxabiv1::__terminate(void (*)()) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/eh_terminate.cc line 47
0x40159696: std::terminate() at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/eh_terminate.cc line 57
0x401592f1: __cxxabiv1::__cxa_allocate_exception(std::size_t) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/eh_alloc.cc line 268
0x40159074: operator new(unsigned int) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/new_op.cc line 54
0x400eb6bf: AsyncMqttClient::publish(char const*, unsigned char, bool, char const*, unsigned int, bool, unsigned short) at C:\Users\Rosa\Documents\Arduino\libraries\async-mqtt-client\src\AsyncMqttClient.cpp line 737
Second invocation
0x4008d5a4: invoke_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c line 155
0x4008d7d5: abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c line 170
0x4015964f: __cxxabiv1::__terminate(void (*)()) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/eh_terminate.cc line 47
0x40159696: std::terminate() at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/eh_terminate.cc line 57
0x401592f1: __cxxabiv1::__cxa_allocate_exception(std::size_t) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/eh_alloc.cc line 268
0x40159074: operator new(unsigned int) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/new_op.cc line 54
0x400ed9ed: AsyncMqttClientInternals::PublishOutPacket::PublishOutPacket(char const*, unsigned char, bool, char const*, unsigned int) at c:\users\rosa\appdata\local\arduino15\packages\esp32\tools\xtensa-esp32-elf-gcc\1.22.0-80-g6c4433a-5.2.0\xtensa-esp32-elf\include\c++\5.2.0\ext/new_allocator.h line 104
0x400eb6d1: AsyncMqttClient::publish(char const*, unsigned char, bool, char const*, unsigned int, bool, unsigned short) at C:\Users\Rosa\Documents\Arduino\libraries\async-mqtt-client\src\AsyncMqttClient.cpp line 737
0x4008d5a4: invoke_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c line 155
0x4008d7d5: abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c line 170
0x4015964f: __cxxabiv1::__terminate(void (*)()) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/eh_terminate.cc line 47
0x40159696: std::terminate() at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/eh_terminate.cc line 57
0x401592f1: __cxxabiv1::__cxa_allocate_exception(std::size_t) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/eh_alloc.cc line 268
0x40159074: operator new(unsigned int) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/new_op.cc line 54
0x400eb6bf: AsyncMqttClient::publish(char const*, unsigned char, bool, char const*, unsigned int, bool, unsigned short) at C:\Users\Rosa\Documents\Arduino\libraries\async-mqtt-client\src\AsyncMqttClient.cpp line 737
If there is any other info that I could post to help diagnose the issue, please tell me and I'll try to provide it.
I also get a Guru Meditation Error: Core 0 panic'ed (StoreProhibited)
, but the stack trace points to AsyncTCP as a the source - should I post the issue here or there?
Thanks in advance.
Is it reproducable with generic code? Are you using the master or develop branch?
I can look into the issue next week.
I'm using the develop branch, latest commit.
To be honest, I haven't tried to reproduce it with generic code - I will try to put together a MWE.
Sorry for the delay, I have a MWE now: https://pastebin.com/ug5Hs8mS. Here are the logs: https://drive.google.com/file/d/16gM9VgLNqQPIPxVR_fOmROubKuvtcotx/view?usp=sharing
The logfile is pretty big, so here's the interesting part: https://pastebin.com/sFHesgC5
This example also reproduced another issue I had been having with reconnection to the broker, so I'll be opening a new issue for that.
The crash reason is not in the log... the "Guru meditation error..." contains why the code crashed. It could be the watchdog, a dangling pointer... Is there a way to get the error code?
https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-guides/fatal-errors.html
The Guru Meditation error I get does show up in the full log. I don't have access to that file at the moment but I'll paste it here as soon as I can.
It happens at a different point in time (IIRC it does happen at a publish
invocation but not inmediately after or before the specific error instances I pasted in the first post), so I was not sure it was related to this issue.
Most obvious reason is an invalid pointer dereference. Mqtt messages are stored on the heap and cleaned up at regular intervals. Maybe (despite I tested a lot) there is an issue here.
Could also be the watchdog but this is more a esp8266 thing.
Here's the Guru Meditation Error I get in the MWE:
loop: wifi connected
loop: mqtt connected
PUBLISH
new back #3
PUBLISH
new back #3
PUBLISH
new back #3
Guru Meditation Error: Core 0 panic'ed (StoreProhibited). Exception was unhandled.
Core 0 register dump:
PC : 0x400d70c7 PS : 0x00060530 A0 : 0x8011c01d A1 : 0x3ffb3e90
A2 : 0x3ffc117c A3 : 0x3ffd00c0 A4 : 0x00000005 A5 : 0x00000020
A6 : 0x00000020 A7 : 0x00000000 A8 : 0x00000004 A9 : 0x3ffb3e70
A10 : 0x00000000 A11 : 0x40085750 A12 : 0x3ffb99e8 A13 : 0x00000000
A14 : 0x00000000 A15 : 0x00000000 SAR : 0x00000019 EXCCAUSE: 0x0000001d
EXCVADDR: 0x00000000 LBEG : 0x4000c2e0 LEND : 0x4000c2f6 LCOUNT : 0xffffffff
Backtrace: 0x400d70c7:0x3ffb3e90 0x4011c01a:0x3ffb3ec0 0x4011c4e6:0x3ffb3ee0 0x4011d00f:0x3ffb3f00 0x4011cef1:0x3ffb3f20 0x4011d0ac:0x3ffb3f40 0x401173b0:0x3ffb3f60 0x400893c5:0x3ffb3f90
Rebooting...
ets Jun 8 2016 00:22:57
rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
Decoded backtrace is:
PC: 0x400d70c7: _tcp_poll(void*, tcp_pcb*) at C:\Users\Rosa\Documents\Arduino\libraries\AsyncTCP-master\src\AsyncTCP.cpp line 261
EXCVADDR: 0x00000000
Decoding stack results
0x400d70c7: _tcp_poll(void*, tcp_pcb*) at C:\Users\Rosa\Documents\Arduino\libraries\AsyncTCP-master\src\AsyncTCP.cpp line 261
0x4011c01a: tcp_slowtmr at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/tcp.c line 1213
0x4011c4e6: tcp_tmr at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/tcp.c line 164
0x4011d00f: tcpip_tcp_timer at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/timeouts.c line 142
0x4011cef1: sys_check_timeouts at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/timeouts.c line 342
0x4011d0ac: sys_timeouts_mbox_fetch at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/timeouts.c line 435
0x401173b0: tcpip_thread at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/api/tcpip.c line 111
0x400893c5: vPortTaskWrapper at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c line 143
Exactly like the Guru Meditation Error I got on my original code, it points to AsyncTCP; should I repost the issue there?
According to the documentation you linked, and since EXCVADDR
is zero, it does indeed look like a NULL dereference.
I'm on holiday now so I can't look into this in detail. I obviously also cant test it myself at this moment.
It is odd that the stack trace does not contain any functions from this codebase. Unless meanwhile solved, I'll look into this in two weeks.
I'm on holiday now so I can't look into this in detail. I obviously also cant test it myself at this moment.
Whenever you can and want. Have a nice holiday!
It is odd that the stack trace does not contain any functions from this codebase.
Yes, I thought so myself.
Just to touch on this, I have also narrowed down crashes on my system to also be related to calling publish.
There is a similarity too in that new(unsigned int) is lurking within the exception decode, but the stack trace is NOT in the async MQTT library, instead the async httpsserver, so maybe it is an AsyncTCP issue. It very much does smell of a corrupted heap somewhere, but the common aspect does appear to be the actions of publish that may be causing that....
Hi, I also had a lot of issues with this but when I plotted the free heap, it was getting empty over 10min time or so. So I figured the heap wasn't cleaned up in time. I added a delay between publishes of 100ms (could be less but did not test it) or so and problem seems to be solved.
same here, I put all publishes in a queue (size 300 messages) and use a delay of 100ms. Works fine.
May I find this a but odd. The mqtt lib already puts the messages in a queue so there should not be adifference.
If it really is a heap (fragmentation?) issue, you could try to raise the minimum free heap requirement in this lib. There is a #define somewhere.
https://github.com/marvinroger/async-mqtt-client/blob/bf341978ad506a78a3d56c560ba34c1d08bb4794/src/AsyncMqttClient.hpp#L9