esp-idf icon indicating copy to clipboard operation
esp-idf copied to clipboard

[ESP-WIFI-MESH] Guru Meditation Error found when Fixed root was node not available. (IDFGH-13169)

Open andy-danieal opened this issue 1 year ago • 22 comments

Answers checklist.

  • [X] I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • [X] I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • [X] I have searched the issue tracker for a similar issue and not found a similar issue.

General issue report

I encountered an issue that was not resolved after RTC_SW_CPU_RST. The only thing that worked was a hard reset, which meant powering off the device.

ESP-Cache-Issue.txt

I (11013294) mesh: <WIFI_EVENT_SCAN_DONE>status:fail, num:3, id:199
I (11013295) mesh: [IO]enable self-organizing, search parent<adaptive>
Guru Meditation Error: Core  0 panic'ed (Memory protection fault). 
Unknown operation at address 0xffffffff not permitted (INVALID_MEM_TYPE)

Core  0 register dump:
PC      : 0x4003229f  PS      : 0x00060a34  A0      : 0x800320ff  A1      : 0x3fff50a0  
A2      : 0x3ffd5d0c  A3      : 0x3fff58d0  A4      : 0x00060a20  A5      : 0x00000000  
A6      : 0x00000000  A7      : 0x3fff58c8  A8      : 0x7ffeb19c  A9      : 0x00000000  
A10     : 0x00000001  A11     : 0xffffffff  A12     : 0x00000000  A13     : 0x00000000  
A14     : 0x00000001  A15     : 0x3ffdffd8  SAR     : 0x0000001d  EXCCAUSE: 0x00000007  
EXCVADDR: 0x00000000  LBEG    : 0x00000000  LEND    : 0x00000000  LCOUNT  : 0x40025314  


Backtrace: 0x4003229c:0x3fff50a0 0x400320fc:0x3fff50c0 0x40025b7e:0x3fff50e0 0x400346a1:0x3fff5100 0x400faf5f:0x3fff5120 0x400e9ef9:0x3fff5140 0x400ce221:0x3fff5170 0x400d480a:0x3fff52e0 0x4002d8b6:0x00000000 |<-CORRUPTED




ELF file SHA256: 8765ae40b9c74068

Rebooting...
ESP-ROM:esp32s2-rc4-20191025
Build:Oct 25 2019
rst:0xc (RTC_SW_CPU_RST),boot:0x8 (SPI_FAST_FLASH_BOOT)
Saved PC:0x400254b2
SPIWP:0xee
mode:DIO, clock div:1
load:0x3ffe6110,len:0x186c
load:0x4004b000,len:0x4
load:0x4004b004,len:0xb0c
load:0x4004f000,len:0x3268
entry 0x4004b1cc
[0;32mI (26) boot: ESP-IDF v5.2.1-dirty 2nd stage bootloader[0m
[0;32mI (26) boot: compile time Jun 14 2024 11:58:06[0m
[0;32mI (26) boot: chip revision: v1.0[0m
[0;32mI (30) boot.esp32s2: SPI Speed      : 80MHz[0m
[0;32mI (35) boot.esp32s2: SPI Mode       : DIO[0m
[0;32mI (40) boot.esp32s2: SPI Flash Size : 4MB[0m
[0;32mI (44) boot: Enabling RNG early entropy source...[0m
[0;32mI (50) boot: Partition Table:[0m
[0;32mI (53) boot: ## Label            Usage          Type ST Offset   Length[0m
[0;32mI (61) boot:  0 nvs              WiFi data        01 02 0000d000 00004000[0m
[0;32mI (68) boot:  1 otadata          OTA data         01 00 00011000 00002000[0m
[0;32mI (76) boot:  2 phy_init         RF data          01 01 00013000 00001000[0m
[0;32mI (83) boot:  3 edf              WiFi data        01 02 00014000 00004000[0m
[0;32mI (90) boot:  4 storage          Unknown data     01 81 00018000 00020000[0m
[0;32mI (98) boot:  5 ota_0            OTA app          00 10 00040000 00170000[0m
[0;32mI (105) boot:  6 ota_1            OTA app          00 11 001b0000 00170000[0m
[0;32mI (113) boot: End of partition table[0m
[0;32mI (117) esp_image: segment 0: paddr=00040020 vaddr=3f000020 size=4a5dch (304604) map[0m
[0;32mI (187) esp_image: segment 1: paddr=0008a604 vaddr=3ffc9bc0 size=03e34h ( 15924) load[0m
[0;32mI (190) esp_image: segment 2: paddr=0008e440 vaddr=40024000 size=01bd8h (  7128) load[0m
[0;32mI (194) esp_image: segment 3: paddr=00090020 vaddr=40080020 size=e1c78h (924792) map[0m
[0;32mI (385) esp_image: segment 4: paddr=00171ca0 vaddr=40025bd8 size=13fe8h ( 81896) load[0m
[0;32mI (417) boot: Loaded app from partition at offset 0x40000[0m
[0;32mI (417) boot: Disabling RNG early entropy source...[0m
[0;32mI (428) cpu_start: Unicore app[0m
[0;32mI (428) cache: Instruction cache 	: size 8KB, 4Ways, cache line size 32Byte[0m
[0;32mI (431) cache: Data cache 		: size 8KB, 4Ways, cache line size 32Byte[0m
Guru Meditation Error: Core  0 panic'ed (Cache disabled but cached memory region accessed). 

Core  0 register dump:
PC      : 0x40039b8d  PS      : 0x00060d34  A0      : 0x80025ea2  A1      : 0x3fffe260  
A2      : 0x02000000  A3      : 0x00000000  A4      : 0x02000000  A5      : 0x00000003  
A6      : 0x00000000  A7      : 0x00000000  A8      : 0x00000014  A9      : 0x00000001  
A10     : 0x00000028  A11     : 0x00000028  A12     : 0x00000800  A13     : 0x00000003  
A14     : 0x00000000  A15     : 0x00000004  SAR     : 0x00000007  EXCCAUSE: 0x00000007  
EXCVADDR: 0x00000000  LBEG    : 0x00000800  LEND    : 0x00000003  LCOUNT  : 0x40025314  


Backtrace: 0x40039b8a:0x3fffe260 0x40025e9f:0x3fffe270 0x40085de1:0x3fffe290 0x400864a8:0x3fffe2b0 0x40025293:0x3fffe2f0 0x40050439:0x3fffe330 0x40050879:0x3fffe370 0x4004b235:0x3fffe4a0 0x40011371:0x3fffe560 0x4000f788:0x3fffe6f0 0x400072a5:0x3fffe710

andy-danieal avatar Jun 29 '24 09:06 andy-danieal

@andy-danieal Can you provide the .elf file when the issue happen?

zhangyanjiaoesp avatar Jul 01 '24 06:07 zhangyanjiaoesp

I have sent you an email. Please check email and let me know if you have any questions.

andy-danieal avatar Jul 03 '24 08:07 andy-danieal

@andy-danieal I didn't receive any email that include a .elf file.

zhangyanjiaoesp avatar Jul 04 '24 08:07 zhangyanjiaoesp

We sent you an email Ma li at Jul 1, 2024. Please check that email.

Also, We having the same issue with the Root Mesh node. After being active for two days, it stops working. Each 10 seconds, 30 ESP devices send a command and receive a response.

andy-danieal avatar Jul 09 '24 05:07 andy-danieal

I am Ma Li. I didn't receive your email. If you prefer not to upload the elf file to GitHub, you can send it directly to [email protected].

ceerqingtingml avatar Jul 09 '24 06:07 ceerqingtingml

@zhangyanjiaoesp, I have sent on email. please check it.

andy-danieal avatar Jul 09 '24 08:07 andy-danieal

@andy-danieal That's weird. I didn't get your email either. You can upload your elf file here and share the link here.

And you can set the deletion conditions: 图像

zhangyanjiaoesp avatar Jul 09 '24 09:07 zhangyanjiaoesp

@zhangyanjiaoesp,

Download Link

andy-danieal avatar Jul 09 '24 09:07 andy-danieal

@zhangyanjiaoesp,

Download Link

Have download it.

zhangyanjiaoesp avatar Jul 09 '24 09:07 zhangyanjiaoesp

@andy-danieal Is the elf file the elf when the crash occurred? root or child ?

zhangyanjiaoesp avatar Jul 09 '24 11:07 zhangyanjiaoesp

ESP-Cache-Issue.txt

We have already shared Child log where is crash occurred.

andy-danieal avatar Jul 09 '24 11:07 andy-danieal

@andy-danieal Please use this wifi lib to test, thanks. wifi_lib_s2_0710.zip

wifi firmware version: fabad8c

zhangyanjiaoesp avatar Jul 10 '24 09:07 zhangyanjiaoesp

@zhangyanjiaoesp, How do I add an existing project to this wifi lib.? And Can you please share the step for us?

andy-danieal avatar Jul 10 '24 10:07 andy-danieal

@andy-danieal replace the wifi libs in idf/components/esp_wifi/lib/esp32s2

zhangyanjiaoesp avatar Jul 10 '24 11:07 zhangyanjiaoesp

@zhangyanjiaoesp, We have tested the wifi lib, but we are still getting the same issue. and We noticed that the issue exists after the device has been running for 6 to 7 hours.

Also, We have attached the logs we captured using Putty. Unfortunately, we were unable to capture the entire log due to disconnections. We will attempt to recreate the issue and capture the complete log.

andy-danieal avatar Jul 11 '24 09:07 andy-danieal

@zhangyanjiaoesp, We have tested the wifi lib, but we are still getting the same issue. and We noticed that the issue exists after the device has been running for 6 to 7 hours.

Also, We have attached the logs we captured using Putty. Unfortunately, we were unable to capture the entire log due to disconnections. We will attempt to recreate the issue and capture the complete log.

Ok, waiting for your logs, I have added some debug logs in the wifi lib, maybe it can help us find the root cause.

zhangyanjiaoesp avatar Jul 11 '24 09:07 zhangyanjiaoesp

@zhangyanjiaoesp,

We attached a log file. and this issue was found in Root Node. gw-19-7.txt

andy-danieal avatar Jul 19 '24 05:07 andy-danieal

@andy-danieal Are you using the wifi lib I provided? Can you open the wifi information log? I can't get any useful information (including the debug information I added) from the log. And you can enable the following option, thus we can get some back trace information when the crash happen. image

zhangyanjiaoesp avatar Jul 19 '24 06:07 zhangyanjiaoesp

@zhangyanjiaoesp, Thank you for your support. We have added a root node for a long-term test, and unfortunately, the original issue has not reappeared. However, we did encounter another issue.


Guru Meditation Error: Core  0 panic'ed (StoreProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x40032a62  PS      : 0x00060d33  A0      : 0x80032400  A1      : 0x3ffe14e0  
A2      : 0x3fff7ab4  A3      : 0x00000030  A4      : 0x3f03cac2  A5      : 0x00000006  
A6      : 0x00000000  A7      : 0x3ffd5a1c  A8      : 0x00000001  A9      : 0x3ffd5a1c  
A10     : 0x00000002  A11     : 0x01000217  A12     : 0x00000024  A13     : 0x3ffd5a40  
A14     : 0x00000001  A15     : 0x4002c0e4  SAR     : 0x0000001f  EXCCAUSE: 0x0000001d  
EXCVADDR: 0x01000223  LBEG    : 0x00000024  LEND    : 0x3ffd5a40  LCOUNT  : 0x40026b5c  


Backtrace: 0x40032a5f:0x3ffe14e0 0x400323fd:0x3ffe1500 0x40025a05:0x3ffe1520 0x40025a60:0x3ffe1540 0x40025a95:0x3ffe1560 0x40034905:0x3ffe1580 0x400285c1:0x3ffe15a0 0x400e2d67:0x3ffe15c0 0x400e2dac:0x3ffe15e0 0x400e3872:0x3ffe1600 0x400e8259:0x3ffe17c0 0x400ea0c6:0x3ffe17e0 0x400e8467:0x3ffe1840 0x40037fa1:0x3ffe1860 0x400360a4:0x3ffe1880 0x4002e1c6:0x3ffe18b0

As per Core dump core-dump-decode.txt

==================== CURRENT THREAD STACK =====================
#0  remove_free_block (sl=1, fl=2, block=0x3fff7ab4, control=0x3ffd5a1c) at C:/Espressif/frameworks/esp-idf-v5.2.1/components/heap/tlsf/tlsf.c:332
#1  block_locate_free (size=<optimized out>, control=<optimized out>) at C:/Espressif/frameworks/esp-idf-v5.2.1/components/heap/tlsf/tlsf.c:567
#2  tlsf_malloc (tlsf=0x3ffd5a1c, size=<optimized out>) at C:/Espressif/frameworks/esp-idf-v5.2.1/components/heap/tlsf/tlsf.c:1005
#3  0x40032400 in multi_heap_malloc_impl (heap=0x3ffd5a08, size=48) at C:/Espressif/frameworks/esp-idf-v5.2.1/components/heap/multi_heap.c:210
#4  0x40025a08 in heap_caps_malloc_base (size=48, caps=6144) at C:/Espressif/frameworks/esp-idf-v5.2.1/components/heap/heap_caps.c:179
#5  0x40025a63 in heap_caps_malloc (size=48, caps=6144) at C:/Espressif/frameworks/esp-idf-v5.2.1/components/heap/heap_caps.c:202
#6  0x40025a98 in heap_caps_malloc_default (size=48) at C:/Espressif/frameworks/esp-idf-v5.2.1/components/heap/heap_caps.c:228
#7  0x40034908 in malloc (size=48) at C:/Espressif/frameworks/esp-idf-v5.2.1/components/newlib/heap.c:24
#8  0x400285c4 in wifi_malloc (size=48) at C:/Espressif/frameworks/esp-idf-v5.2.1/components/esp_wifi/esp32s2/esp_adapter.c:65
#9  0x400e2d6a in mesh_malloc ()
#10 0x400e2daf in esp_mesh_create_context ()
#11 0x400e3875 in esp_mesh_wifi_recv_cb ()
#12 0x400e825c in hostap_deliver_data ()
#13 0x400ea0c9 in hostap_input ()
#14 0x400e846a in ap_rx_cb ()
#15 0x40037fa4 in ppRxPkt ()
#16 0x400360a7 in ppTask ()
#17 0x4002e1c9 in vPortTaskWrapper (pxCode=0x40035fb4 <ppTask>, pvParameters=0x0) at C:/Espressif/frameworks/esp-idf-v5.2.1/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134

andy-danieal avatar Jul 24 '24 11:07 andy-danieal

@andy-danieal please set the Heap memory debugging like the following and test again, thanks.

image

zhangyanjiaoesp avatar Jul 24 '24 11:07 zhangyanjiaoesp

Thanks for reporting, will close due to short of feedback, feel free to reopen with more updates. Thanks for using our Espressif product!

Sherry616 avatar Sep 05 '24 08:09 Sherry616

@zhangyanjiaoesp, Having tested a fixed root node that hung after a 10-20 day interval, it has worked fine after being powered off. The same issue occurred on a child device when the root node hung; some devices were affected, but not all.

Attached root UART dump and log file. log.txt log-uart-dump.txt

andy-danieal avatar Sep 23 '24 06:09 andy-danieal

Any Update?

andy-danieal avatar Oct 09 '24 05:10 andy-danieal

@andy-danieal The current crash issue appears to be different from the previous one. Are you using the version where the debug logs were added last time?

@andy-danieal Please use this wifi lib to test, thanks. wifi_lib_s2_0710.zip

wifi firmware version: fabad8c

zhangyanjiaoesp avatar Oct 09 '24 07:10 zhangyanjiaoesp

@zhangyanjiaoesp, Both cases threw an exception on a Memory protection fault. After that, the device continuously reset due to a Cached memory region exception. We couldn't recover the device without a power supply, which was a terrible situation for the client side.

We can't update that library on the client side, but we will demonstrate the setup and wait to reproduce the issue.

Also, We need recovery method cached memory region exception without power supply. and root cause of that error?

andy-danieal avatar Oct 10 '24 04:10 andy-danieal

@andy-danieal Although each crash is caused by accessing an illegal address, the disassembly paths differ each time. This is likely due to memory corruption, which makes it difficult to identify the root cause without a reliable reproduction method. The Cached memory region exception Issue arises from the first crash, so we need to pinpoint the root cause of the first crash. Could you provide a stable reproduction method?

zhangyanjiaoesp avatar Oct 10 '24 07:10 zhangyanjiaoesp

Dear @zhangyanjiaoesp,

We have alloted max heap memory after that throw INVALID_MEM_TYPE exception but every time device is going to Cached memory region exception. Why isn’t it going to throw exception and Reboot Device?

Guru Meditation Error: Core  0 panic'ed (Memory protection fault). 
Unknown operation at address 0xffffffff not permitted (INVALID_MEM_TYPE)

Core  0 register dump:
PC      : 0x4003307b  PS      : 0x00060f34  A0      : 0x80032edb  A1      : 0x3ffed270  
--- 0x4003307b: block_set_prev_free at /home/swaroop/esp/esp-idf/components/heap/tlsf/tlsf_block_functions.h:56
 (inlined by) block_mark_as_free at /home/swaroop/esp/esp-idf/components/heap/tlsf/tlsf_block_functions.h:110
 (inlined by) tlsf_free at /home/swaroop/esp/esp-idf/components/heap/tlsf/tlsf.c:1202

A2      : 0x3ffd5f7c  A3      : 0x3fffbc98  A4      : 0x00000000  A5      : 0x3f002900  
A6      : 0x00000000  A7      : 0x3fffbc90  A8      : 0x44065d08  A9      : 0x00000000  
A10     : 0x00000001  A11     : 0xffffffff  A12     : 0x00000000  A13     : 0x00000000  
A14     : 0x00000000  A15     : 0x3ffd6b04  SAR     : 0x00000004  EXCCAUSE: 0x00000007  
EXCVADDR: 0x00000000  LBEG    : 0x00000000  LEND    : 0x00000000  LCOUNT  : 0x40025468  
--- 0x40025468: xt_highint4 at /home/swaroop/esp/esp-idf/components/esp_system/port/soc/esp32s2/highint_hdl.S:52



Backtrace: 0x40033078:0x3ffed270 0x40032ed8:0x3ffed290 0x40025d7a:0x3ffed2b0 0x400979f2:0x3ffed2d0 0x40094ef5:0x3ffed2f0 0x4002e6e2:0x3ffed4a0
--- 0x40033078: block_link_next at /home/swaroop/esp/esp-idf/components/heap/tlsf/tlsf_block_functions.h:102
 (inlined by) block_mark_as_free at /home/swaroop/esp/esp-idf/components/heap/tlsf/tlsf_block_functions.h:109
 (inlined by) tlsf_free at /home/swaroop/esp/esp-idf/components/heap/tlsf/tlsf.c:1202
0x40032ed8: multi_heap_free_impl at /home/swaroop/esp/esp-idf/components/heap/multi_heap.c:235
0x40025d7a: heap_caps_free at /home/swaroop/esp/esp-idf/components/heap/heap_caps_base.c:70
0x400979f2: zh_vector_delete_item at /home/swaroop/Workspace/ESP32/DSPVWH_Repo/DSPVWH_Controller/components/zh_vector/zh_vector.c:201
0x40094ef5: _processing at /home/swaroop/Workspace/ESP32/DSPVWH_Repo/DSPVWH_Controller/components/zh_network/zh_network.c:335
0x4002e6e2: vPortTaskWrapper at /home/swaroop/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:139





ELF file SHA256: 560d896f78ab0575

Rebooting...
ESP-ROM:esp32s2-rc4-20191025
Build:Oct 25 2019
rst:0xc (RTC_SW_CPU_RST),boot:0x8 (SPI_FAST_FLASH_BOOT)
Saved PC:0x4002563a
--- 0x4002563a: esp_restart_noos at /home/swaroop/esp/esp-idf/components/esp_system/port/soc/esp32s2/system_internal.c:117

SPIWP:0xee
mode:DIO, clock div:1
load:0x3ffe5110,len:0x1950
load:0x4004a000,len:0x4
load:0x4004a004,len:0xb38
load:0x4004e000,len:0x333c
entry 0x4004a1d0
I (26) boot: ESP-IDF v5.3.2-1-g9a6aba7401 2nd stage bootloader
I (26) boot: compile time Dec 15 2024 15:11:43
I (27) boot: chip revision: v1.0
I (31) boot: efuse block revision: v0.2
I (35) boot.esp32s2: SPI Speed      : 80MHz
I (40) boot.esp32s2: SPI Mode       : DIO
I (45) boot.esp32s2: SPI Flash Size : 4MB
I (50) boot: Enabling RNG early entropy source...
I (55) boot: Partition Table:
I (58) boot: ## Label            Usage          Type ST Offset   Length
I (66) boot:  0 nvs              WiFi data        01 02 0000d000 00004000
I (73) boot:  1 otadata          OTA data         01 00 00011000 00002000
I (81) boot:  2 phy_init         RF data          01 01 00013000 00001000
I (88) boot:  3 edf              WiFi data        01 02 00014000 00004000
I (96) boot:  4 storage          Unknown data     01 81 00018000 00020000
I (103) boot:  5 ota_0            OTA app          00 10 00040000 00170000
I (111) boot:  6 ota_1            OTA app          00 11 001b0000 00170000
I (118) boot: End of partition table
I (123) esp_image: segment 0: paddr=00040020 vaddr=3f000020 size=49088h (299144) map
I (190) esp_image: segment 1: paddr=000890b0 vaddr=3ffcb640 size=04244h ( 16964) load
I (194) esp_image: segment 2: paddr=0008d2fc vaddr=40024000 size=02d1ch ( 11548) load
I (198) esp_image: segment 3: paddr=00090020 vaddr=40080020 size=dfa70h (916080) map
I (384) esp_image: segment 4: paddr=0016fa98 vaddr=40026d1c size=14920h ( 84256) load
I (416) boot: Loaded app from partition at offset 0x40000
I (416) boot: Disabling RNG early entropy source...
I (428) cache: Instruction cache 	: size 8KB, 4Ways, cache line size 32Byte
I (428) cache: Data cache 		: size 8KB, 4Ways, cache line size 32Byte
I (434) cpu_start: Unicore app
Guru Meditation Error: Core  0 panic'ed (Cache disabled but cached memory region accessed). 

andy-danieal avatar Dec 18 '24 11:12 andy-danieal

@andy-danieal Did you enable the CONFIG_HEAP_POISONING_COMPREHENSIVE option during testing?

zhangyanjiaoesp avatar Dec 20 '24 07:12 zhangyanjiaoesp

@andy-danieal Do you still have this problem?

zhangyanjiaoesp avatar Feb 13 '25 08:02 zhangyanjiaoesp

Thanks for reporting, will close due to inactivity, feel free to reopen if the issue still happens, thanks.

Alvin1Zhang avatar Feb 25 '25 03:02 Alvin1Zhang