esp-idf icon indicating copy to clipboard operation
esp-idf copied to clipboard

[v5.1.2] esp_core_dump_get_summary leads to TG1WDT_SYS_RST (IDFGH-12595)

Open chipweinberger opened this issue 10 months ago • 3 comments

esp32s3 ESP-IDF v5.1-rc2-2-ga2c9879e34-dirty (from a ~month ago)

Problem:

My device gets stuck in esp_core_dump_get_summary, which then leads to a TG1WDT_SYS_RST

Debug Logs:

I added these debug logs to determine the problem.

it gets stuck at if (strncmp(nm, "EXTRA_INFO", note->n_namesz) == 0 ) {

esp_err_t esp_core_dump_get_summary(esp_core_dump_summary_t *summary)
{


    ESP_LOGI("TTTTT", "2A");

    int i;
    elf_phdr *ph;
    elf_note *note;
    const void *map_addr;
    size_t consumed_note_sz;
    esp_partition_mmap_handle_t core_data_handle;

    if (!summary) {
        return ESP_ERR_INVALID_ARG;
    }
    ESP_LOGI("TTTTT", "2B");
    esp_err_t err = elf_core_dump_image_mmap(&core_data_handle, &map_addr);
    if (err != ESP_OK) {
        return err;
    }
    uint8_t *ptr = (uint8_t *) map_addr + sizeof(core_dump_header_t);
    elfhdr *eh = (elfhdr *)ptr;
    ESP_LOGI("TTTTT", "2C");
    ESP_COREDUMP_LOGD("ELF ident %02x %c %c %c", eh->e_ident[0], eh->e_ident[1], eh->e_ident[2], eh->e_ident[3]);
    ESP_COREDUMP_LOGD("Ph_num %d offset %x", eh->e_phnum, eh->e_phoff);
    ESP_LOGI("TTTTT", "2D");
    for (i = 0; i < eh->e_phnum; i++) {

        ESP_LOGI("TTTTT", "2E");

        ph = (elf_phdr *)((ptr + i * sizeof(*ph)) + eh->e_phoff);
        ESP_COREDUMP_LOGD("PHDR type %d off %x vaddr %x paddr %x filesz %x memsz %x flags %x align %x",
                          ph->p_type, ph->p_offset, ph->p_vaddr, ph->p_paddr, ph->p_filesz, ph->p_memsz,
                          ph->p_flags, ph->p_align);

        if (ph->p_type == PT_NOTE) {

            ESP_LOGI("TTTTT", "2F");

            consumed_note_sz = 0;

            while(consumed_note_sz < ph->p_memsz) {

                ESP_LOGI("TTTTT", "2G");

                note = (elf_note *)(ptr + ph->p_offset + consumed_note_sz);

                char *nm = (char *)(ptr + ph->p_offset + consumed_note_sz + sizeof(elf_note));

                ESP_COREDUMP_LOGD("Note NameSZ %x DescSZ %x Type %x name %s", note->n_namesz,
                                  note->n_descsz, note->n_type, nm);

                ESP_LOGI("TTTTT", "2GGGG0");
                ESP_LOGI("TTTTT", "2GGGG0");

                ESP_LOGI("TTTTT", "ptr %p", ptr);
                ESP_LOGI("TTTTT", "ph->p_offset %lu", ph->p_offset);
                ESP_LOGI("TTTTT", "consumed_note_sz %u", consumed_note_sz);
                ESP_LOGI("TTTTT", "note %p", note);

               // THE ESP32S3 ALWAYS FREEZES HERE!!!!!!
               // EVEN IF I REMOVE THE LOGS
                
                ESP_LOGI("TTTTT", "note->n_namesz %lu", note->n_namesz);    
                ESP_LOGI("TTTTT", "nm %s", nm); 
                
                if (strncmp(nm, "EXTRA_INFO", note->n_namesz) == 0 ) {

                    ESP_LOGI("TTTTT", "2H");

                    esp_core_dump_summary_parse_extra_info(summary, (void *)(nm + note->n_namesz));
                }

                ESP_LOGI("TTTTT", "2G1");

Logs:

.........
I (8273) TTTTT: 2GGGG0
I (8273) TTTTT: 2GGGG0
I (8273) TTTTT: ptr 0x3c482014
I (8283) TTTTT: ph->p_offset 2100736
I (8283) TTTTT: consumed_note_sz 0
I (8283) TTTTT: note 0x3c682e14

ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0x8 (TG1WDT_SYS_RST),boot:0xa (SPI_FAST_FLASH_BOOT)
Saved PC:0x42006733
0x42006733: panic_handler at /Volumes/User/MBP-Google-Drive/jamcorder/firmware/esp-idf/components/esp_system/port/panic_handler.c:145 (discriminator 3)

SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce3af0,len:0x238c
load:0x403c9700,len:0x4
load:0x403c9704,len:0x1ba4
load:0x403cc700,len:0x2f0c
entry 0x403c9a8c
I (26) boot: ESP-IDF v5.1-rc2-2-ga2c9879e34-dirty 2nd stage bootloader
I (26) boot: compile time Mar 11 2024 16:00:55
I (27) boot: Multicore bootloader
I (27) boot: chip revision: v0.1

chipweinberger avatar Apr 11 '24 23:04 chipweinberger

attempting to call any of these lines causes the device to freeze

            * ESP_LOGI("TTTTT", "note->n_namesz %lu", note->n_namesz);    
            * ESP_LOGI("TTTTT", "nm %s", nm); 
            * if (strncmp(nm, "EXTRA_INFO", note->n_namesz) == 0 )
            

but the values seem reasonable. note is 0x3c682e14

I (8273) TTTTT: ptr 0x3c482014
I (8283) TTTTT: ph->p_offset 2100736
I (8283) TTTTT: consumed_note_sz 0
I (8283) TTTTT: note 0x3c682e14

here are logs when it works, for comparison. Not much different.

I (8590) TTTTT: 2GGGG0
I (8590) TTTTT: 2GGGG0
I (8600) TTTTT: ptr 0x3c482014
I (8600) TTTTT: ph->p_offset 16676
I (8600) TTTTT: consumed_note_sz  0
I (8600) TTTTT: note 0x3c486138
I (8600) TTTTT: note->n_namesz 20
I (8600) TTTTT: nm ESP_CORE_DUMP_INFO

chipweinberger avatar Apr 11 '24 23:04 chipweinberger

Ah, there are also these logs which happen earlier, which I missed

I (4808) pd coredump extra: pd_has_new_coredump() called
E (4808) esp_core_dump_flash: Incorrect size of core dump image: 0
I (4808) pd coredump extra: esp_core_dump_image_check() - no valid coredump found. ESP_ERR_INVALID_SIZE

It seems I should honor the result of esp_core_dump_image_check()

chipweinberger avatar Apr 11 '24 23:04 chipweinberger

imo, we should validate during esp_core_dump_get_summary

i.e. call esp_core_dump_image_check within esp_core_dump_get_summary, & maybe cache the result of the check

chipweinberger avatar Apr 23 '24 07:04 chipweinberger

thanks for addressing it!

chipweinberger avatar May 16 '24 03:05 chipweinberger

imo, we should validate during esp_core_dump_get_summary

i.e. call esp_core_dump_image_check within esp_core_dump_get_summary, & maybe cache the result of the check

calling esp_core_dump_image_check manually and erasing the corrupted dump with esp_core_dump_image_erase in v5.3 is doing the job for me, thanks!

libreo-mwebert avatar Aug 07 '24 07:08 libreo-mwebert