WLED icon indicating copy to clipboard operation
WLED copied to clipboard

Resets with IDF 4.4/Arduino Core 2.x platform on network request

Open Aircoookie opened this issue 2 years ago • 42 comments

What happened?

WLED frequently resets upon network requests (e.g. to /json while loading main UI, applying presets...) while (SK6812) LEDs are on.

To Reproduce Bug

Compile WLED with the custom IDF 4.4 + arduino 2.0.2 platform courtesy of Tasmota (this branch). The LITTLEFS library by lorol is replaced by the built-in LittleFS in the 2.x.x arduino core.

The reset reason is the low level RMT interrupts failing during a network request, although I don't yet understand the exact cause. Exception decoder trace below.

This has been reproduced on ESP32 and ESP32-S2 and the platforms:

https://github.com/tasmota/platform-espressif32/releases/download/v2.0.2/platform-tasmota-espressif32-2.0.2.zip
https://github.com/tasmota/platform-espressif32/releases/download/v2.0.2.1/platform-tasmota-espressif32-2.0.2.1.zip
https://github.com/tasmota/platform-espressif32/releases/download/v2.0.2.2/platform-tasmota-espressif32-2.0.2.zip

Expected Behavior

No resets

Install Method

Self-Compiled

What version of WLED?

arduinocore2 branch

Which microcontroller/board are you seeing the problem on?

ESP32

Relevant log/trace output

Guru Meditation Error: Core  1 panic'ed (Cache disabled but cached memory region accessed). 

Core  1 register dump:
PC      : 0x4018be44  PS      : 0x00060035  A0      : 0x40086098  A1      : 0x3ffbf52c  
A2      : 0x3ffbe048  A3      : 0x00000000  A4      : 0x00000080  A5      : 0x00000001
A6      : 0x00000001  A7      : 0x00000000  A8      : 0x80081aff  A9      : 0x00000020  
A10     : 0x3ff56000  A11     : 0x3ffb34f0  A12     : 0xffffffff  A13     : 0x00000020
A14     : 0x3ffbe054  A15     : 0x3ffb3358  SAR     : 0x00000008  EXCCAUSE: 0x00000007  
EXCVADDR: 0x00000000  LBEG    : 0x400813e0  LEND    : 0x400813f4  LCOUNT  : 0x00000000


Backtrace:0x4018be41:0x3ffbf52c |<-CORRUPTED
  #0  0x4018be41:0x3ffbf52c in rmt_ll_enable_tx_err_interrupt at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/hal/esp32/include/hal/rmt_ll.h:253

Anything else?

@Jason2866 I suspect that this issue is due to a programming error on my end surfacing with the newer IDF/Arduino releases or possibly with the exchanged LittleFS library. Still, if you know what could be a likely cause, I'd be super happy to know as I see you did quite a few changes to sdkconfig (which I really like, as it drastically reduces binary size!)

My first guess was that the issue was due to CONFIG_DISABLE_HAL_LOCKS being set in 2.0.2.1, however the issue is also present in https://github.com/tasmota/platform-espressif32/releases/download/v2.0.2/platform-tasmota-espressif32-2.0.2.zip, where that option was unset as far as I can tell.

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

Aircoookie avatar Jan 30 '22 11:01 Aircoookie

@Aircoookie I am not seeing any resets no matter how fast I hit WLED with /json requests on my fork. I have tried as fast as 10 reloads/s while simultaneously running a playlist with 600 LEDs attached and playlist switching presets every few seconds. I did rewrite preset loading logic and optimized JSON buffer usage so that hardly anything serious happen in network callbacks (apart from ArduinoJson handling).

My build environment is:

[env:wemos_d1_mini32]
board = esp32dev
platform = [email protected]
board_build.partitions = tools/WLED_ESP32_4MB_1MB_FS.csv
board_build.filesystem = littlefs
build_unflags = ${common.build_unflags}
build_flags = ${common.build_flags_esp32} -D WLED_RELEASE_NAME=ESP32
  -D WLED_DISABLE_BLYNK
  -D WLED_DISABLE_CRONIXIE
  -D WLED_DISABLE_HUESYNC
  -D WLED_DISABLE_LOXONE
  -D USERMOD_DALLASTEMPERATURE
  -D TEMPERATURE_PIN=23
  -D LEDPIN=16
  -D RLYPIN=19
  -D BTNPIN=17
  -D IRPIN=18
  -D USERMOD_FOUR_LINE_DISPLAY
  -D USE_ALT_DISPlAY
  -D USERMOD_AUTO_SAVE
  ;-D USERMOD_ANIMATED_STAIRCASE
  -D USERMOD_PIRSWITCH
  -D PIR_SENSOR_PIN=26
  -D USERMOD_MULTI_RELAY  # 14, 15, 32
  -D MULTI_RELAY_MAX_RELAYS=3
  -UWLED_USE_MY_CONFIG
lib_deps = ${esp32.lib_deps}
  U8g2@~2.28.8
  paulstoffregen/OneWire@~2.3.6
  Wire

[env:wemos_d1_mini32_debug]
extends = env:wemos_d1_mini32
build_flags = ${common.build_flags_esp32} -D WLED_RELEASE_NAME=ESP32
  -D WLED_DEBUG
  -D LEDPIN=16
  -D RLYPIN=19
  -D BTNPIN=17
  -D IRPIN=18
  -D USERMOD_DALLASTEMPERATURE
  -D TEMPERATURE_PIN=23
  -D USERMOD_FOUR_LINE_DISPLAY
  -D USERMOD_ROTARY_ENCODER_UI
  -D USE_ALT_DISPlAY
  -D USERMOD_AUTO_SAVE
  -D USERMOD_PIRSWITCH
  -D PIR_SENSOR_PIN=36
  -D USERMOD_MULTI_RELAY  # 14, 15, 32
  -D MULTI_RELAY_MAX_RELAYS=3
  -D STATUSLED=-1
  -D WLED_DISABLE_BROWNOUT_DET
  -UWLED_USE_MY_CONFIG

blazoncek avatar Jan 30 '22 12:01 blazoncek

@blazoncek does not happen with [email protected] on my end either. This issue is exclusive to arduino core 2.0.2, or IDF 4.4, or the custom Tasmota library configuration

Aircoookie avatar Jan 30 '22 12:01 Aircoookie

@Aircoookie Looks like a bug in the Arduino core 2.0.x. Core 2.0.x is heavily refactored. I doubt that the problem is related to wled code. You can check if the issues are related to our customization with this setup.

platform = https://github.com/tasmota/platform-espressif32/releases/download/v2.0.2.2/platform-tasmota-espressif32-2.0.2.zip
platform_packages = framework-arduinoespressif32 @ https://github.com/espressif/arduino-esp32.git

Using this setup uses latest orig. Arduino ESP32 from espressif repo

EDIT: Just asked our dev for the "Lights" part in Tasmota. We do not have issues in Tasmota when using RMT

Jason2866 avatar Jan 30 '22 13:01 Jason2866

@Jason2866 Thank you for the info! I've tried that setup and the issue persists, so we can rule out your customizations as a possible cause. I agree that the RMT refactorings seems to be the likely cause, I'll try if I can come up with a MVCE for the core team.

Aircoookie avatar Jan 30 '22 13:01 Aircoookie

I had opened an issue with @makuna a while ago and it turns out the latest RMT code (also) has problems feeding data to RMT buffers. Especially when ISRs take more time than needed. I don't recall issue # but he had it open with ESP32 core regarding that.

blazoncek avatar Jan 30 '22 14:01 blazoncek

FYI I can't reproduce this on Tasmota using RMT and latest Core/idf, however hard I hammer it on the network side. Tested on Atom Matrix (Esp32 Pico D4) with an animation over 25 leds every 50ms. This does not confirm a bug in Arduino or IDF

s-hadinger avatar Jan 30 '22 22:01 s-hadinger

Found the thread... https://github.com/Makuna/NeoPixelBus/issues/524#issuecomment-942557735

blazoncek avatar Jan 31 '22 08:01 blazoncek

@Aircoookie Do a symbol dump on that method "rmt_ll_enable_tx_err_interrupt", if you actually find it listed what is the address? It shouldn't be listed (it is marked inline) but sometimes the compiler will ignore the suggestion of inline. There is a known GNU C++ bug that static inline functions may not honor attributes (inherited or specifically marked) and thus not be placed into IRAM memory location. Thus when run will require it to be loaded from cache (note your error). I ran into this myself and had to do some shenanigans to get the methods marked with IRAM (no longer inline, no longer listed in a header).

Makuna avatar Jan 31 '22 18:01 Makuna

@Makuna thank you for the helpful pointer! Not entirely positive this is what you are referring to, but as expected, nm /.pio/build/esp32dev/libFrameworkArduino.a does not list this function, and does not list any other rmt_ll functions either, however it does list various other RMT functions from file esp32-hal-rmt.c.o: nm_framework.txt

I am also confused about this line in the error: Backtrace:0x4018be41:0x3ffbf52c |<-CORRUPTED, other instances of the error Cache disabled but cached memory region accessed I researched never had a corrupted backtrace.

@s-hadinger thank you for testing! The difference might very well be that WLED uses ESPAsyncWebserver, which in itself operates on interrupts upon network requests while Tasmota uses (as far as I can tell) the synchronous ESP8266Webserver.

Aircoookie avatar Feb 01 '22 15:02 Aircoookie

Perhaps whatever function is inlining rmt_ll_enable_tx_err_interrupt is not placed in IRAM...

Aircoookie avatar Feb 01 '22 15:02 Aircoookie

Not have conclusive evidence, but it seems like this only happens if the network request accesses flash data in any way (filesystem operation, or progmem strings)

Aircoookie avatar Feb 01 '22 17:02 Aircoookie

Also strange that core 1 (handling network interrupts) panics. I would have assumed RMT interrupts to use core 0.

Aircoookie avatar Feb 01 '22 19:02 Aircoookie

I have the issue with 0.13.1 where it frequently crashes and reboots when being controlled via Home Assistant, but I am not sure if it is due to the same reason. It definitely has to do with API calls though.

The Environment string on the info pages says "esp32 V3.3.6-16-gcc5440f6a2".

Is there anything I can do to help debug the issue?

kroimon avatar Apr 24 '22 10:04 kroimon

Latest Tasmota framework (based on 2.0.3 rc1 and IDF 4.4.1) can be used with

platform = https://github.com/tasmota/platform-espressif32/releases/download/v2.0.3rc1/platform-espressif32-2.0.3new.zip

Jason2866 avatar Apr 24 '22 10:04 Jason2866

I have built latest master using the above "v2.0.3rc1" from Tasmota. It produces the same result on my M5Stamp (C3U) as described by @Aircoookie at the top. Constant restarts, seemingly on any request sent from UI.

Kyry11 avatar Apr 26 '22 06:04 Kyry11

I have now also tried a build with the just released 2.0.3 (not rc) from Tasmota. Same result as above, unsurprisingly.

Kyry11 avatar May 06 '22 04:05 Kyry11

Getting some trace output with the latest official (non-RC) 2.0.3 tasmota release (https://github.com/tasmota/platform-espressif32/releases/tag/v.2.0.3) when replicating this issue on esp32-c3 (previously with 2.0.2 I was only getting stack memory dump, mentioned here: https://github.com/Aircoookie/WLED/issues/2596#issuecomment-1120123902)

Though, unsure if the trace output is accurate, based on WIP PR: https://github.com/platformio/platform-espressif32/pull/612

Guru Meditation Error: Core  0 panic'ed (Load access fault). Exception was unhandled.

Core  0 register dump:
MEPC    : 0x42048a02  RA      : 0x42048a56  SP      : 0x3fcad5a0  GP      : 0x3fc8b400  
=> 0x42048a02: String::buffer() const at /Users/trav/.platformio/packages/framework-arduinoespressif32/cores/esp32/WString.h:328 (discriminator 2)
      (inlined by) String::c_str() const at /Users/trav/.platformio/packages/framework-arduinoespressif32/cores/esp32/WString.h:236 (discriminator 2)
      (inlined by) AsyncWebServerRequest::_removeNotInterestingHeaders() at .pio/libdeps/esp32c3/ESP Async WebServer/src/WebRequest.cpp:182 (discriminator 2)
=> 0x42048a56: LinkedList<AsyncWebHeader*, LinkedListNode>::Iterator::operator++() at .pio/libdeps/esp32c3/ESP Async WebServer/src/StringArray.h:53 (discriminator 3)
      (inlined by) AsyncWebServerRequest::_removeNotInterestingHeaders() at .pio/libdeps/esp32c3/ESP Async WebServer/src/WebRequest.cpp:181 (discriminator 3)
TP      : 0x3fc89348  T0      : 0x40388812  T1      : 0xffffffe0  T2      : 0xffffffff  
=> 0x40388812: vPortExitCritical at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port/riscv/port.c:281
S0/FP   : 0x3fcb151c  S1      : 0x3fcb19d4  A0      : 0x00000001  A1      : 0x6254be41  
A2      : 0x3fcb1870  A3      : 0x00000001  A4      : 0x3fc99954  A5      : 0x600c2000  
A6      : 0x47314252  A7      : 0x34392f76  S2      : 0x3fc9a1c4  S3      : 0x3fcb19e4  
=> 0x47314252: ?? ??:0
S4      : 0x3fcb1a80  S5      : 0x00000001  S6      : 0x0000000a  S7      : 0x00000004  
S8      : 0x00000000  S9      : 0x00000000  S10     : 0x00000000  S11     : 0x00000000  
T3      : 0x4f4e6f4d  T4      : 0x45203a79  T5      : 0x654b2d74  T6      : 0x656b636f  
=> 0x4f4e6f4d: ?? ??:0
=> 0x45203a79: ?? ??:0
MSTATUS : 0x00001881  MTVEC   : 0x40380001  MCAUSE  : 0x00000005  MTVAL   : 0x6254be50  
=> 0x40380001: _vector_table at ??:?
MHARTID : 0x00000000  

Stack memory:
3fcad5a0: 0x00000000 0x00000000 0x00000000 0x0404b1be 0x0000000a 0x00000001 0x3fcb1a08 0x00000002
3fcad5c0: 0x3fcb0bf7 0x00000002 0x3fcb19d4 0x42048ad8 0x3fcb0bf7 0x00000002 0x3fcb1a08 0x4204f1be
=> 0x42048ad8: AsyncWebServerRequest::_parseLine() at .pio/libdeps/esp32c3/ESP Async WebServer/src/WebRequest.cpp:567
      (inlined by) AsyncWebServerRequest::_parseLine() at .pio/libdeps/esp32c3/ESP Async WebServer/src/WebRequest.cpp:549
=> 0x4204f1be: String::concat(char const*) at /Users/trav/.platformio/packages/framework-arduinoespressif32/cores/esp32/WString.cpp:365
3fcad5e0: 0x0000000a 0x00000001 0x3fcb1a08 0x00000002 0x3fcb0bf7 0x00000002 0x3fcb19d4 0x42048bd6
=> 0x42048bd6: AsyncWebServerRequest::_onData(void*, unsigned int) at .pio/libdeps/esp32c3/ESP Async WebServer/src/WebRequest.cpp:122
3fcad600: 0x3fcb1900 0x3c0d2000 0x3fcb19d4 0x4204899c 0x00000001 0x3fc92e18 0x42046a68 0x42046b8e
=> 0x4204899c: AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*) at .pio/libdeps/esp32c3/ESP Async WebServer/src/WebRequest.cpp:77
=> 0x42046a68: std::_Function_base::_Base_manager<AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*)#8}>::_M_manager(std::_Any_data&, std::_Function_base::_Base_manager<AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*)#8}> const&, std::_Manager_operation) at /Users/trav/.platformio/packages/[email protected]+2021r2-patch3/riscv32-esp-elf/include/c++/8.4.0/bits/std_function.h:196
=> 0x42046b8e: std::_Function_handler<void (void*, AsyncClient*), AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*)#8}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&) at /Users/trav/.platformio/packages/[email protected]+2021r2-patch3/riscv32-esp-elf/include/c++/8.4.0/bits/std_function.h:295
3fcad620: 0x00000002 0x3fcb196c 0x3fcb1958 0x00000001 0x00000000 0x3fcb09c4 0x3fcb1900 0x42048da6
=> 0x42048da6: std::_Function_handler<void (void*, AsyncClient*, void*, unsigned int), AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*, void*, unsigned int)#7}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&, std::_Any_data const&, unsigned int&&) at /Users/trav/.platformio/packages/[email protected]+2021r2-patch3/riscv32-esp-elf/include/c++/8.4.0/bits/std_function.h:299
3fcad640: 0x00000000 0x3fcb09c4 0x3fcb1900 0x42040db0 0x3fcb19d4 0x3fcb1900 0x3fcb0a12 0x000001e7
=> 0x42040db0: AsyncClient::_recv(tcp_pcb*, pbuf*, signed char) at .pio/libdeps/esp32c3/AsyncTCP/src/AsyncTCP.cpp:970
3fcad660: 0x00000002 0x00000005 0x3fc99000 0x3fc99664 0x00000001 0x3fcb1900 0x3fcab7a8 0x42040e94
=> 0x42040e94: AsyncClient::_s_recv(void*, tcp_pcb*, pbuf*, signed char) at .pio/libdeps/esp32c3/AsyncTCP/src/AsyncTCP.cpp:1248
3fcad680: 0x00000001 0x3fcb1900 0x3fcab7a8 0x42041538 0x00000000 0x3fcab7a8 0x00000000 0x00000000
=> 0x42041538: _async_service_task(void*) at .pio/libdeps/esp32c3/AsyncTCP/src/AsyncTCP.cpp:166
      (inlined by) _async_service_task at .pio/libdeps/esp32c3/AsyncTCP/src/AsyncTCP.cpp:201
3fcad6a0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcad6c0: 0x00000000 0x00000000 0x00000000 0x403887ba 0x00000000 0x00000000 0x00000000 0x00000000
=> 0x403887ba: prvTaskExitError at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port/riscv/port.c:157
3fcad6e0: 0x00000000 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xffffff02 0x000000ff 0x27146414 0x01010107
3fcad700: 0x00000001 0x00000000 0x35d1cfdc 0x00000000 0x3f15c000 0x3f800000 0x3fc00000 0x4b000000
=> 0x4b000000: ?? ??:0
3fcad720: 0xcb000000 0x000000ff 0x8800759c 0x7e37e43c 0x7149f2ca 0x00000000 0x00000000 0xfff00000
3fcad740: 0x00000000 0x7ff00000 0x60000000 0x3ff71547 0xf85ddf44 0x3e54ae0b 0x00000000 0x3fd00000
3fcad760: 0x55555555 0x3fd55555 0x652b82fe 0x3ff71547 0x00000000 0x43400000 0x4a454eef 0x3fca7e28
=> 0x43400000: ?? ??:0
=> 0x4a454eef: ?? ??:0
3fcad780: 0x93c9db65 0x3fcd864a 0xa91d4101 0x3fd17460 0x518f264d 0x3fd55555 0xdb6fabff 0x3fdb6db6
3fcad7a0: 0x33333303 0x3fe33333 0xe0000000 0x3feec709 0xdc3a03fd 0x3feec709 0x145b01f5 0xbe3e2fe0
3fcad7c0: 0x652b82fe 0x3c971547 0xc2f8f359 0x01a56e1f 0x00000000 0x3fe62e43 0xfefa39ef 0x3fe62e42
3fcad7e0: 0x0ca86c39 0xbe205c61 0x72bea4d0 0x3e663769 0xc5d26bf1 0x3ebbbd41 0xaf25de2c 0x3f11566a
3fcad800: 0x16bebd93 0x3f66c16c 0x5555553e 0x3fc55555 0xcc000000 0x4c000000 0x3f317180 0x3717f7d1
=> 0x4c000000: ?? ??:0
3fcad820: 0x3eaaaaab 0x3e178897 0x3e3a3325 0x3e924925 0x3f2aaaab 0x3e1cd04f 0x3e638e29 0x3ecccccd
3fcad840: 0x3fb8aa00 0x36eca570 0x3fb8aa3b 0x4b800000 0x3e53f142 0x3e6c3255 0x3e8ba305 0x3edb6db7
=> 0x4b800000: ?? ??:0
3fcad860: 0x3f19999a 0x3f763800 0x3f76384f 0x369dc3a0 0x3338aa3c 0x0da24260 0x3f317200 0x3f317218
3fcad880: 0x35bfbe8c 0x3331bb4c 0x35ddea0e 0x388ab355 0x3b360b61 0x3e2aaaab 0x00000000 0x7ff80000
3fcad8a0: 0x00000000 0x43500000 0x8800759c 0xfe37e43c 0xc2f8f359 0x81a56e1f 0x00000000 0x3c900000
=> 0x43500000: ?? ??:0
3fcad8c0: 0x7fc00000 0xf149f2ca 0x8da24260 0x33000000 0x7f7fffff 0x00000000 0xffc00000 0x41dfffff
=> 0x41dfffff: ?? ??:0
3fcad8e0: 0x94a03595 0x3fdfffff 0x35afe535 0x3fe00000 0x94a03595 0x3fcfffff 0x00000000 0x3ff80000
3fcad900: 0x636f4361 0x3fd287a7 0x8b60c8b3 0x3fc68a28 0x509f79fb 0x3fd34413 0x00000000 0x401c0000
=> 0x401c0000: ?? ??:0
3fcad920: 0x00000000 0x40140000 0x00000010 0x00000000 0xa5a5a5a5 0x00000154 0x3fcad3b0 0x3fc9a1c4
=> 0x40140000: ?? ??:0
3fcad940: 0x3fc95f44 0x3fc95f44 0x3fcad938 0x3fc95f3c 0x00000016 0x3fcab888 0x3fcab888 0x3fcad938
3fcad960: 0x00000000 0x00000003 0x3fcab934 0x6e797361 0x63745f63 0xf97a0070 0x0005d81a 0x00000000
3fcad980: 0x3fcad930 0x00000003 0x00000000 0x3fcb1e00 0x4205b1b8 0x00000000 0x3fc9aac0 0x3fc9ab28
=> 0x4205b1b8: pthread_local_storage_thread_deleted_callback at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/pthread/pthread_local_storage.c:129

fallspectrum avatar May 08 '22 17:05 fallspectrum

Hey! This issue has been open for quite some time without any new comments now. It will be closed automatically in a week if no further activity occurs. Thank you for using WLED!

stale[bot] avatar Sep 21 '22 05:09 stale[bot]

 (inlined by) String::c_str() const at /Users/trav/.platformio/packages/framework-arduinoespressif32/cores/esp32/WString.h:236 (discriminator 2)
 (inlined by) AsyncWebServerRequest::_removeNotInterestingHeaders() at .pio/libdeps/esp32c3/ESP Async WebServer/src/WebRequest.cpp:182 (discriminator 2)

reminds me of the pending use-after-free bug in Async Webserver : https://github.com/me-no-dev/ESPAsyncWebServer/issues/951 and proposed fix in https://github.com/me-no-dev/ESPAsyncWebServer/pull/952

softhack007 avatar Sep 24 '22 21:09 softhack007

@Aircoookie perhaps you could pull this into your fork?

blazoncek avatar Sep 25 '22 09:09 blazoncek

@softhack007 Is there a easy (without connecting any leds) way to reproduce the resets? I would like to test with a recent builded framework (using a newer idf commit than espressif uses in release arduino core 2.0.5). Many bug fixes in IDF Maybe the actual Tasmota version does work?

platform = https://github.com/tasmota/platform-espressif32/releases/download/v2.0.5.1/platform-espressif32-2.0.5.1.zip

Probably fixes wifi stability issues for esp32 and esp32c3

Jason2866 avatar Sep 25 '22 11:09 Jason2866

Hi @Jason2866, I think the problem in async webserver was found by code analysis and inspection. So it might be hard to reliably reproduce the crash. Sorry I don't have a clear scenario for you, maybe someone else here can help with that.

We have several users reporting such crashes, and the stacktrace provided by @fallspectrum clearly points into the same direction. It seems to happen with idf v4.4.x and arduino-esp32 2.0.x on new MCUs like s2, c3, but also seen on "classic" esp32. Maybe memory management in idf was changed or it is just pure luck that we don't see such crashes with older frameworks.

The root cause seems to be a use-after-free coding error in async webserver, so I see very little chance that it can be solved with updates in other components.


Edit: it could be that we are talking about two different bugs here, as the original report from @Aircoookie was having a crash in a different area:

#0 0x4018be41:0x3ffbf52c in rmt_ll_enable_tx_err_interrupt at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/hal/esp32/include/hal/rmt_ll.h:253

I've seen this one on 'classic ESP32' and on -s3 when using new framework. My solution was to use latest NeoPixelBus (master).

softhack007 avatar Sep 25 '22 11:09 softhack007

There are changes in IDF regarding rmt. The changes are in latest Tasmota core.

Jason2866 avatar Sep 25 '22 11:09 Jason2866

I have reviewed AsyncWebServer's way of handling unneeded headers and although not apparent there is a possibility of deleting a pointer while still using it. I would have rewritten AsyncWebServerRequest::_removeNotInterestingHeaders() to use more explicit iterator for loop. There is something not right within AsyncWebServer as it works very odd on ESP32-S2 as I was testing it. Sometimes it would stall for a second or more other times it would just not return anything to a web request. Especially if LittleFS access was also involved.

I did not debug thoroughly yet to see if error lies in WLED code (callbacks) or AsyncWebServer itself.

blazoncek avatar Sep 25 '22 14:09 blazoncek

@Aircoookie perhaps you could pull this into your fork?

Done. https://github.com/Aircoookie/ESPAsyncWebServer.git @ ~2.0.6 Let's see if it helps!

Aircoookie avatar Sep 25 '22 23:09 Aircoookie

Tested, it does no harm on ESP32 but it also does not help with web server on ESP32-S2. I am beginning to suspect that the problem lies lower in the WiFi stack. Possibly in WiFi itself as I can occasionally see inability to connect to a hardcoded SSID.

blazoncek avatar Sep 26 '22 20:09 blazoncek

We have zero issues with the S2 and wifi. It does connect fine with DHCP and fixed ip addresses. Connection is stable too. If AP is switched off reconnecting does work too. The issue you encounter is coming from "somewhere". We do not use EasyAsyncWebServer. Imho it is the reason.

Jason2866 avatar Sep 27 '22 06:09 Jason2866

Do not get me wrong, my ESP32-S2 (Lolin Wemos S2 mini) always connects to my WiFi after a cold boot, but after warm boot not always. I am also seeing it drop off of network while it is still reacting to button presses and plays WLED effects. In such cases I can also sometimes observe there is no more debug data dumped to serial interface. Since debug is dumped in the same loop() as WLED effects are run or buttons are registered this becomes weirder still. Lolin S2 mini uses CDC USB which is inoperative until firmware boots or fails for some reason.

blazoncek avatar Sep 27 '22 07:09 blazoncek

Yes, and this is related to WLED. Zero issues cold or warmboot or whatever the S2 is running without a hickup with Tasmota. The CDC is working too, no problem.

Jason2866 avatar Sep 27 '22 10:09 Jason2866

Yes, and this is related to WLED.

As you only talk about Tasmota, I think what it means is "related to a scenario we don't have in Tasmota". This could be many things, I guess. Some aspects that we should think about:

  • is Tasmota running ESP32-S2 at full 240Mhz, or is it operated at 80Mhz?
  • is Tasmota using 80Mhz flash speed, or 40Mhz which seems to be the PIO default?
  • is Tasmota using NeoPixelBus? If yes, which version? Of course, it could still be any other functions that WLED uses a lot, but Tasmota will typically not need.

Just a few ideas that might help us dig down to the root cause of crashes.

softhack007 avatar Sep 27 '22 12:09 softhack007