nuttx icon indicating copy to clipboard operation
nuttx copied to clipboard

esp32: fix a crash with PSRAM + SMP

Open yamt opened this issue 1 year ago • 18 comments

Summary

this function is called via esp_spiram_init_cache early in the boot.

Impact

Testing

esp32-devkitc:psram + smp enabled

yamt avatar Aug 28 '24 12:08 yamt

Thanks for submitting it, @yamt !

I will test it on our internal CI. Please don't merge it yet.

tmedicci avatar Aug 28 '24 13:08 tmedicci

Hi @yamt

If I set CONFIG_DEBUG_ASSERTIONS and CONFIG_DEBUG_FEATURES it fails to boot. I guess it's somehow debug-asserting. Can you take a look at it?

The defconfig I used for it:

#
# This file is autogenerated: PLEASE DO NOT EDIT IT.
#
# You can use "make menuconfig" to make any modifications to the installed .config file.
# You can then do "make savedefconfig" to generate a new defconfig file that includes your
# modifications.
#
# CONFIG_ARCH_LEDS is not set
# CONFIG_NSH_ARGCAT is not set
# CONFIG_NSH_CMDOPT_HEXDUMP is not set
CONFIG_ARCH="xtensa"
CONFIG_ARCH_BOARD="esp32-devkitc"
CONFIG_ARCH_BOARD_COMMON=y
CONFIG_ARCH_BOARD_ESP32_DEVKITC=y
CONFIG_ARCH_CHIP="esp32"
CONFIG_ARCH_CHIP_ESP32=y
CONFIG_ARCH_CHIP_ESP32WROVER=y
CONFIG_ARCH_INTERRUPTSTACK=2048
CONFIG_ARCH_STACKDUMP=y
CONFIG_ARCH_XTENSA=y
CONFIG_BOARD_LOOPSPERMSEC=16717
CONFIG_BUILTIN=y
CONFIG_DEBUG_ASSERTIONS=y
CONFIG_DEBUG_ASSERTIONS_EXPRESSION=y
CONFIG_DEBUG_FEATURES=y
CONFIG_DEBUG_FULLOPT=y
CONFIG_DEBUG_SYMBOLS=y
CONFIG_DEV_ZERO=y
CONFIG_ESP32_IMM_HEAP=y
CONFIG_ESP32_SPIRAM=y
CONFIG_ESP32_UART0=y
CONFIG_FS_PROCFS=y
CONFIG_HAVE_CXX=y
CONFIG_HAVE_CXXINITIALIZE=y
CONFIG_HEAP2_BASE=0x3f800000
CONFIG_HEAP2_SIZE=4194304
CONFIG_IDLETHREAD_STACKSIZE=3072
CONFIG_INIT_ENTRYPOINT="nsh_main"
CONFIG_INIT_STACKSIZE=3072
CONFIG_INTELHEX_BINARY=y
CONFIG_IOB_NBUFFERS=36
CONFIG_IOB_NCHAINS=36
CONFIG_IOB_THROTTLE=8
CONFIG_MM_IOB=y
CONFIG_MM_REGIONS=4
CONFIG_NDEBUG=y
CONFIG_NSH_ARCHINIT=y
CONFIG_NSH_BUILTIN_APPS=y
CONFIG_NSH_FILEIOSIZE=512
CONFIG_NSH_LINELEN=64
CONFIG_NSH_READLINE=y
CONFIG_PREALLOC_TIMERS=4
CONFIG_RAM_SIZE=114688
CONFIG_RAM_START=0x20000000
CONFIG_RR_INTERVAL=200
CONFIG_SCHED_HPWORK=y
CONFIG_SCHED_WAITPID=y
CONFIG_SMP=y
CONFIG_SMP_NCPUS=2
CONFIG_START_DAY=6
CONFIG_START_MONTH=12
CONFIG_START_YEAR=2011
CONFIG_SYSLOG_BUFFER=y
CONFIG_SYSTEM_NSH=y
CONFIG_TESTING_MM=y
CONFIG_UART0_SERIAL_CONSOLE=y

tmedicci avatar Aug 28 '24 15:08 tmedicci

Can you also enable SMP for the psram defconfig`, please?

why? how about psram_usrheap?

yamt avatar Aug 29 '24 06:08 yamt

Can you also enable SMP for the psram defconfig`, please?

why? how about psram_usrheap?

It's up to you ;) I would enable it too if it works smoothly...

tmedicci avatar Aug 29 '24 12:08 tmedicci

Hi @yamt

If I set CONFIG_DEBUG_ASSERTIONS and CONFIG_DEBUG_FEATURES it fails to boot. I guess it's somehow debug-asserting. Can you take a look at it?

The defconfig I used for it:

there seems to be a few issues.

  • you should bump CONFIG_MM_REGIONS.
  • https://github.com/apache/nuttx/pull/13233
  • for some reason, the last page is corrupted.

the following is with CONFIG_MM_FILL_ALLOCATIONS=y. thus i expect it's all 0xcc. but as you can see there are some 32byte patterns. (32byte because of EVENODD mode i guess) with NORMAL mode, the symptom disappears. but i guess NORMAL mode doen't meet the general coherency expectations for nuttx smp. do you have any ideas?

nsh> xd 3fbf7000 256
Hex dump:
0000: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
0010: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
0020: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
0030: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
0040: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
0050: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
0060: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
0070: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
0080: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
0090: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
00a0: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
00b0: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
00c0: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
00d0: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
00e0: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
00f0: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
nsh> 
nsh> xd 3fbf8000 256
Hex dump:
0000: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
0010: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
0020: 55 55 04 15 55 55 45 15 55 55 55 15 55 54 14 44 UU..UUE.UUU.UT.D
0030: 55 55 45 15 55 50 14 51 05 54 45 55 55 45 51 55 UUE.UP.Q.TEUUEQU
0040: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
0050: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
0060: 15 55 55 55 55 55 45 55 45 55 05 45 55 54 55 50 .UUUUUEUEU.EUTUP
0070: 45 55 54 55 55 55 55 15 55 40 55 55 55 04 45 55 [email protected]
0080: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
0090: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
00a0: 54 54 55 04 54 51 55 55 54 55 11 45 45 05 00 55 TTU.TQUUTU.EE..U
00b0: 50 54 45 55 45 55 51 04 14 45 55 01 55 41 55 41 PTEUEUQ..EU.UAUA
00c0: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
00d0: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
00e0: 45 15 54 54 45 55 51 15 55 55 55 11 45 51 55 55 E.TTEUQ.UUU.EQUU
00f0: 54 55 55 11 55 51 55 55 55 45 55 55 54 44 50 54 TUU.UQUUUEUUTDPT
nsh> 

yamt avatar Aug 29 '24 12:08 yamt

Hi @yamt ,

there seems to be a few issues.

  • you should bump CONFIG_MM_REGIONS.

Yes, this is the cause for failing the boot.

Although this change makes sense, I didn't find any issues related direct to it.

  • for some reason, the last page is corrupted.

the following is with CONFIG_MM_FILL_ALLOCATIONS=y. thus i expect it's all 0xcc. but as you can see there are some 32byte patterns. (32byte because of EVENODD mode i guess) with NORMAL mode, the symptom disappears. but i guess NORMAL mode doen't meet the general coherency expectations for nuttx smp. do you have any ideas?

I have no idea. After increasing the CONFIG_MM_REGIONS, it was able to boot, but free command fails:

[CPU0] _assert: Assertion failed info.uordblks + info.fordblks == info.arena: at file: mm_heap/mm_mallinfo.c:157

In fact, if we disable CONFIG_DEBUG_ASSERTIONS, free returns:

nsh> free
                 total       used       free    maxused    maxfree  nused  nfree
esp32-imem:      98300       8628      89672       9024      89672      5      1
      Umem:    4402076    4263516    4398464       4008    4194288     27      5

It isn't evaluating the used/free size correctly.

tmedicci avatar Aug 29 '24 13:08 tmedicci

Although this change makes sense, I didn't find any issues related direct to it.

it prevents DIAGASSERT on CONFIG_MM_REGIONS from showing the panic messages.

  • for some reason, the last page is corrupted.

the following is with CONFIG_MM_FILL_ALLOCATIONS=y. thus i expect it's all 0xcc. but as you can see there are some 32byte patterns. (32byte because of EVENODD mode i guess) with NORMAL mode, the symptom disappears. but i guess NORMAL mode doen't meet the general coherency expectations for nuttx smp. do you have any ideas?

I have no idea. After increasing the CONFIG_MM_REGIONS, it was able to boot, but free command fails:

[CPU0] _assert: Assertion failed info.uordblks + info.fordblks == info.arena: at file: mm_heap/mm_mallinfo.c:157

In fact, if we disable CONFIG_DEBUG_ASSERTIONS, free returns:

nsh> free
                 total       used       free    maxused    maxfree  nused  nfree
esp32-imem:      98300       8628      89672       9024      89672      5      1
      Umem:    4402076    4263516    4398464       4008    4194288     27      5

It isn't evaluating the used/free size correctly.

it's because the last page corruption.

yamt avatar Aug 29 '24 14:08 yamt

it's because the last page corruption.

I see... I will create an issue to keep track of this, but we have no schedule for this issue. If you find the root cause, we can help testing it on our CI too. For now, I don't think the PSRAM support is fully functional for ESP32 with SMP enabled.

tmedicci avatar Aug 29 '24 14:08 tmedicci

it's because the last page corruption.

I see... I will create an issue to keep track of this, but we have no schedule for this issue. If you find the root cause, we can help testing it on our CI too. For now, I don't think the PSRAM support is fully functional for ESP32 with SMP enabled.

does it work for esp-idf well?

yamt avatar Aug 29 '24 14:08 yamt

it's because the last page corruption.

I see... I will create an issue to keep track of this, but we have no schedule for this issue. If you find the root cause, we can help testing it on our CI too. For now, I don't think the PSRAM support is fully functional for ESP32 with SMP enabled.

does it work for esp-idf well?

Yes. We have it supported if I'm not mistaken.

tmedicci avatar Aug 29 '24 15:08 tmedicci

it's because the last page corruption.

I see... I will create an issue to keep track of this, but we have no schedule for this issue. If you find the root cause, we can help testing it on our CI too. For now, I don't think the PSRAM support is fully functional for ESP32 with SMP enabled.

does it work for esp-idf well?

Yes. We have it supported if I'm not mistaken.

Ok. So I guess something wrong in nuttx.

Btw, as you know, nuttx esp32 code and esp-idf have a lot of similarlity. Do you know what's the situation about the copyright here? I'm wondering what I can copy and what I can't.

yamt avatar Aug 29 '24 16:08 yamt

Btw, as you know, nuttx esp32 code and esp-idf have a lot of similarlity. Do you know what's the situation about the copyright here? I'm wondering what I can copy and what I can't.

IDF is licensed under Apache 2.0 License, so there is no problem "copying" code to NuttX. Please note, however, the third party repositories. You can check more details here: https://docs.espressif.com/projects/esp-idf/en/stable/esp32/COPYRIGHT.html

tmedicci avatar Aug 29 '24 19:08 tmedicci

Btw, as you know, nuttx esp32 code and esp-idf have a lot of similarlity. Do you know what's the situation about the copyright here? I'm wondering what I can copy and what I can't.

IDF is licensed under Apache 2.0 License, so there is no problem "copying" code to NuttX. Please note, however, the third party repositories. You can check more details here: https://docs.espressif.com/projects/esp-idf/en/stable/esp32/COPYRIGHT.html

don't we need to inherit SPDX-FileCopyrightText: 2018-2024 Espressif Systems (Shanghai) CO LTD?

yamt avatar Aug 30 '24 03:08 yamt

it's because the last page corruption.

I see... I will create an issue to keep track of this, but we have no schedule for this issue. If you find the root cause, we can help testing it on our CI too. For now, I don't think the PSRAM support is fully functional for ESP32 with SMP enabled.

i guess https://github.com/apache/nuttx/pull/13249 is the root cause.

yamt avatar Aug 30 '24 08:08 yamt

Btw, as you know, nuttx esp32 code and esp-idf have a lot of similarlity. Do you know what's the situation about the copyright here? I'm wondering what I can copy and what I can't.

IDF is licensed under Apache 2.0 License, so there is no problem "copying" code to NuttX. Please note, however, the third party repositories. You can check more details here: https://docs.espressif.com/projects/esp-idf/en/stable/esp32/COPYRIGHT.html

don't we need to inherit SPDX-FileCopyrightText: 2018-2024 Espressif Systems (Shanghai) CO LTD?

As far as I understand, it's enough to add a notice in the NOTICE file (according to https://www.apache.org/licenses/LICENSE-2.0.html). @jerpelea , can you help us here?

tmedicci avatar Aug 30 '24 11:08 tmedicci

it's because the last page corruption.

I see... I will create an issue to keep track of this, but we have no schedule for this issue. If you find the root cause, we can help testing it on our CI too. For now, I don't think the PSRAM support is fully functional for ESP32 with SMP enabled.

i guess #13249 is the root cause.

right, I will test asap...

tmedicci avatar Aug 30 '24 11:08 tmedicci

Hi @yamt

CI tests ran successfully. Let's wait for https://github.com/apache/nuttx/pull/13249 to be merged before merging this.

I tested i with the following configs (for psram and psram_usrheap). Can you please add them to this PR?

Usually, whenever we check that some defconfig is working as expected with SMP enabled, we let it be enabled in order to provide the best experience for the users. So, if you can enable them in this PR, I'll be glad. Thank you for submitting these PRs.

diff --git a/boards/xtensa/esp32/esp32-devkitc/configs/psram/defconfig b/boards/xtensa/esp32/esp32-devkitc/configs/psram/defconfig
index 9450b5475e..51de22702f 100644
--- a/boards/xtensa/esp32/esp32-devkitc/configs/psram/defconfig
+++ b/boards/xtensa/esp32/esp32-devkitc/configs/psram/defconfig
@@ -6,6 +6,7 @@
 # modifications.
 #
 # CONFIG_ARCH_LEDS is not set
+# CONFIG_NDEBUG is not set
 # CONFIG_NSH_ARGCAT is not set
 # CONFIG_NSH_CMDOPT_HEXDUMP is not set
 CONFIG_ARCH="xtensa"
@@ -15,6 +16,7 @@ CONFIG_ARCH_BOARD_ESP32_DEVKITC=y
 CONFIG_ARCH_CHIP="esp32"
 CONFIG_ARCH_CHIP_ESP32=y
 CONFIG_ARCH_CHIP_ESP32WROVER=y
+CONFIG_ARCH_INTERRUPTSTACK=2048
 CONFIG_ARCH_STACKDUMP=y
 CONFIG_ARCH_XTENSA=y
 CONFIG_BOARD_LOOPSPERMSEC=16717
@@ -36,7 +38,7 @@ CONFIG_IOB_NBUFFERS=36
 CONFIG_IOB_NCHAINS=36
 CONFIG_IOB_THROTTLE=8
 CONFIG_MM_IOB=y
-CONFIG_MM_REGIONS=4
+CONFIG_MM_REGIONS=5
 CONFIG_NSH_ARCHINIT=y
 CONFIG_NSH_BUILTIN_APPS=y
 CONFIG_NSH_FILEIOSIZE=512
@@ -48,6 +50,8 @@ CONFIG_RAM_START=0x20000000
 CONFIG_RR_INTERVAL=200
 CONFIG_SCHED_HPWORK=y
 CONFIG_SCHED_WAITPID=y
+CONFIG_SMP=y
+CONFIG_SMP_NCPUS=2
 CONFIG_START_DAY=6
 CONFIG_START_MONTH=12
 CONFIG_START_YEAR=2011
diff --git a/boards/xtensa/esp32/esp32-devkitc/configs/psram_usrheap/defconfig b/boards/xtensa/esp32/esp32-devkitc/configs/psram_usrheap/defconfig
index 171374f6b0..ea09820207 100644
--- a/boards/xtensa/esp32/esp32-devkitc/configs/psram_usrheap/defconfig
+++ b/boards/xtensa/esp32/esp32-devkitc/configs/psram_usrheap/defconfig
@@ -6,6 +6,7 @@
 # modifications.
 #
 # CONFIG_ARCH_LEDS is not set
+# CONFIG_NDEBUG is not set
 # CONFIG_NSH_ARGCAT is not set
 # CONFIG_NSH_CMDOPT_HEXDUMP is not set
 CONFIG_ARCH="xtensa"
@@ -15,6 +16,7 @@ CONFIG_ARCH_BOARD_ESP32_DEVKITC=y
 CONFIG_ARCH_CHIP="esp32"
 CONFIG_ARCH_CHIP_ESP32=y
 CONFIG_ARCH_CHIP_ESP32WROVER=y
+CONFIG_ARCH_INTERRUPTSTACK=2048
 CONFIG_ARCH_STACKDUMP=y
 CONFIG_ARCH_XTENSA=y
 CONFIG_BOARD_LOOPSPERMSEC=16717
@@ -38,7 +40,7 @@ CONFIG_IOB_NBUFFERS=36
 CONFIG_IOB_NCHAINS=36
 CONFIG_IOB_THROTTLE=8
 CONFIG_MM_IOB=y
-CONFIG_MM_REGIONS=3
+CONFIG_MM_REGIONS=4
 CONFIG_NSH_ARCHINIT=y
 CONFIG_NSH_BUILTIN_APPS=y
 CONFIG_NSH_FILEIOSIZE=512
@@ -50,6 +52,8 @@ CONFIG_RAM_START=0x20000000
 CONFIG_RR_INTERVAL=200
 CONFIG_SCHED_HPWORK=y
 CONFIG_SCHED_WAITPID=y
+CONFIG_SMP=y
+CONFIG_SMP_NCPUS=2
 CONFIG_START_DAY=6
 CONFIG_START_MONTH=12
 CONFIG_START_YEAR=2011

tmedicci avatar Aug 30 '24 16:08 tmedicci

I may have found a bug here. Let me debug it properly.

Confirmed. I'm sorry for the confusion. The tests regarding ESP32 + SMP without PSRAM failed because the change in https://github.com/apache/nuttx/pull/13249 is valid only when PSRAM is enabled, so we need to make it part of the general case. Can you commit the following to revert it and place it in the correct position:

Author: Tiago Medicci Serrano <[email protected]>
Date:   Fri Aug 30 15:11:59 2024 -0300

    esp32: fix initialization with PSRAM + SMP
    
    Cache flush must be done prior to the APP cpu initalization. This,
    however, must be true for the case where PSRAM is not available or
    not selected. To do that, this commit flushs the cache during the
    device initialization.

diff --git a/arch/xtensa/src/common/espressif/esp_loader.c b/arch/xtensa/src/common/espressif/esp_loader.c
index bf21c5f096..ebbf128044 100644
--- a/arch/xtensa/src/common/espressif/esp_loader.c
+++ b/arch/xtensa/src/common/espressif/esp_loader.c
@@ -273,8 +273,12 @@ int map_rom_segments(uint32_t app_drom_start, uint32_t app_drom_vaddr,
 #endif
 
 #ifdef CONFIG_ARCH_CHIP_ESP32
-  cache_read_disable(0);
-  cache_flush(0);
+  cache_read_disable(PRO_CPU_NUM);
+  cache_flush(PRO_CPU_NUM);
+#  ifdef CONFIG_SMP
+  cache_flush(APP_CPU_NUM);
+  cache_read_enable(APP_CPU_NUM);
+#  endif
 #else
   cache_hal_disable(CACHE_TYPE_ALL);
 #endif
diff --git a/arch/xtensa/src/esp32/esp32_spiram.c b/arch/xtensa/src/esp32/esp32_spiram.c
index eed50e1fff..7eaa5295ec 100644
--- a/arch/xtensa/src/esp32/esp32_spiram.c
+++ b/arch/xtensa/src/esp32/esp32_spiram.c
@@ -68,13 +68,6 @@
 #  error "FLASH speed can only be equal to or higher than SRAM speed while SRAM is enabled!"
 #endif
 
-/****************************************************************************
- * ROM Function Prototypes
- ****************************************************************************/
-
-extern void cache_flush(int cpu);
-extern void cache_read_enable(int cpu);
-
 /****************************************************************************
  * Private Data
  ****************************************************************************/
@@ -253,8 +246,6 @@ void IRAM_ATTR esp_spiram_init_cache(void)
   /* Flush and enable icache for APP CPU */
 
 #ifdef CONFIG_SMP
-  cache_flush(APP_CPU_NUM);
-  cache_read_enable(APP_CPU_NUM);
   regval  = getreg32(DPORT_APP_CACHE_CTRL1_REG);
   regval &= ~(1 << DPORT_APP_CACHE_MASK_DRAM1);
   putreg32(regval, DPORT_APP_CACHE_CTRL1_REG);

tmedicci avatar Aug 30 '24 20:08 tmedicci

Can you also enable SMP for the psram defconfig`, please?

why? how about psram_usrheap?

It's up to you ;) I would enable it too if it works smoothly...

done

yamt avatar Aug 31 '24 03:08 yamt

I may have found a bug here. Let me debug it properly.

Confirmed. I'm sorry for the confusion. The tests regarding ESP32 + SMP without PSRAM failed because the change in #13249 is valid only when PSRAM is enabled, so we need to make it part of the general case. Can you commit the following to revert it and place it in the correct position:

i created the commit in this PR. (is this what you meant? i'm not entirely sure why you didn't submit a separate PR by yourself.) when creating the commit, i tried to preserve metadata. but there might be some botches. it would be simpler to use git-format-patch next time.

btw, are these rom functions documented somewhere? i was just making wild guesses from their names.

yamt avatar Aug 31 '24 04:08 yamt

@tmedicci I confirmed that this PR fixes https://github.com/apache/nuttx/pull/13249#issuecomment-2322325239.

masayuki2009 avatar Sep 02 '24 08:09 masayuki2009