
"Task watchdog" Reboot loop at startup with 0.8.140 and 0.8.141 / INT Pin status "unknown" for nRF24L01?

Open juepi opened this issue 1 year ago • 28 comments

Platform

ESP32

Assembly

I did the assembly by myself

nRF24L01+ Module

nRF24L01+ plus

Antenna

circuit board

Power Stabilization

Elko (~100uF)

Connection picture

  • [ ] I will attach/upload an image of my wiring

Version

0.8.140

Github Hash

f1f4481

Build & Flash Method

AhoyDTU Webinstaller

Setup

MQTT configured, Inverter interval 5 seconds, 2 inverters configured, one activated

Debug Serial Log output

No response

Error description

Really not sure if this is a problem with my infrastructure, but reporting it anyways in case it sounds familiar to someone: Hardware: ESP32-S3, using image *_opendtufusion.bin

It was running fine on 0.8.130 for several weeks. I decided to update to the new release 0.8.140, but directly after the upgrade (via the AhoyDTU /update page) I experienced the following issues:

  • The WebUI landing page seems to lose system / NTP time every few seconds
  • The enabled inverter on the landing page jumps from "producing" to "not available" every few seconds
  • In the Live view, the inverter data keeps switching between "greyed out" and "producing"
  • MQTT communication did not seem to work (configured power limit changes were not updated in AhoyDTU)

I would assume the corrupted system time to be the root cause for all other problems.

I've also tried the current dev build 0.8.141, same issue. I've downgraded my ESP to the 0.8.130 devbuild again, seems to work fine now.

yours, Juergen

juepi avatar Aug 20 '24 17:08 juepi

Which NTP server or IP is configured? Your local router?

rmayergfx avatar Aug 20 '24 17:08 rmayergfx

I didn't change the default "pool.ntp.org" setting.

juepi avatar Aug 20 '24 19:08 juepi

Give it a try: enter the IP of your local router, if that device provides a time service. That is also more robust if the ISP is down.

rmayergfx avatar Aug 21 '24 07:08 rmayergfx

Give it a try: enter the IP of your local router, if that device provides a time service. That is also more robust if the ISP is down.

Did so, using IP address of my WiFi router. Works perfectly well with 0.8.130. After upgrading once again to 0.8.140, same issue again.

The web landing page keeps switching between the two states shown in the screenshots (image and image) at irregular intervals (a few seconds). The uptime counter is also not working; I even saw it count backwards once.

After some minutes, I've downgraded to 0.8.130 again. I saw the same behavior for a short time (maybe less than a minute) after booting the ESP with the old version, but it stopped and now ahoyDTU is running again without any issues as it seems.

yours, Juergen

juepi avatar Aug 21 '24 14:08 juepi

Save your settings, then remove inverter #1 completely and try again with 0.8.141. Be sure to use the right image for your board!

rmayergfx avatar Aug 22 '24 07:08 rmayergfx

Save your settings, then remove inverter #1 completely and try again with 0.8.141. Be sure to use the right image for your board!

Ok, will give it a try when i'm home, however everything works fine with 0.8.130? 🤔

P.s.: will fire up inverter #1 and enable it first, then upgrade to 0.8.141. Or do you expect an issue if more than one inverter is used?

juepi avatar Aug 22 '24 10:08 juepi

Update:

Starting after enabling Inverter #1 and rebooting with 0.8.130, everything is fine: image

After upgrading to 0.8.141, same issues start again. NTP time is fetched image

then after a few seconds, system time is lost again, which also breaks inverter communication: image

What i found interesting: in the error case, the Uptime counter never got higher than 10 seconds. After reaching 9-10 seconds, system time was lost and the uptime counter reset to a lower value of 4-5 seconds. I observed this buggy behavior once again for some minutes then downgraded to 0.8.130 again.
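The observation above (uptime never exceeding about 10 seconds and then jumping back to a lower value) is a typical sign of a reboot loop. As a rough sketch of how one could check this from outside, the following assumes you have collected a list of uptime readings in seconds, for example by polling the WebUI by hand. The function names, thresholds and sample values are made up for illustration and are not part of AhoyDTU.

```python
def count_reboots(uptime_samples):
    """Count how often the uptime counter went backwards between two samples."""
    reboots = 0
    for previous, current in zip(uptime_samples, uptime_samples[1:]):
        if current < previous:
            reboots += 1
    return reboots


def looks_like_reboot_loop(uptime_samples, max_uptime_s=15, min_reboots=2):
    """Heuristic: several resets, and the counter never gets past max_uptime_s.

    Both thresholds are arbitrary assumptions, not values taken from the firmware.
    """
    if not uptime_samples:
        return False
    return (count_reboots(uptime_samples) >= min_reboots
            and max(uptime_samples) <= max_uptime_s)


# Hypothetical readings resembling the behavior described in this post.
samples = [2, 5, 9, 4, 7, 10, 5, 8]
print(count_reboots(samples), looks_like_reboot_loop(samples))
```

A steadily increasing series such as `[10, 20, 30]` yields zero detected reboots, so it is not flagged.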

As i need both inverters to be working, i decided not to delete inverter #1 (if there's a plausible reason/suspicion to try to delete inv#1 please explain). I'm currently fine with the old version, if there's anything else i can do to help track down the problem let me know.

For the sake of completeness, i am using this ESP32-S3 board: ESP32-S3 WROOM-1-N16R8 ESP32-S3-DevKitC-1

NOTE: "System" page of Ahoy-DTU at version 0.8.130 reports WiFi RSSI of -71.

yours, Juergen

juepi avatar Aug 23 '24 07:08 juepi

Hi, look under "System" for the reason your DTU restarted. To me it looks like a reboot loop.

Gubi2023 avatar Aug 23 '24 08:08 Gubi2023

Sounds plausible to me, will check this out.

juepi avatar Aug 23 '24 09:08 juepi

You were perfectly right! "System" reports this: image

I would assume the "unknown" status of the INT pin to be the reason for the problems, but as i'm writing this comment, the status suddenly changed: image

As you can see in the new screenshot, ESP seems to have stopped rebooting, everything seems to work fine now.

This is the pinout which i've configured for the NRF: "nrf":{"cs":37,"ce":38,"irq":47,"sclk":36,"mosi":35,"miso":48,"en":true}
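When posting a pinout like this, a quick sanity check is to make sure no two signals are assigned to the same GPIO. The sketch below parses the fragment quoted above (wrapped in braces so it becomes valid JSON) and lists any shared pins. The helper names are made up for illustration, and nothing in this thread suggests that a pin conflict was the cause here.

```python
import json
from collections import defaultdict

# The fragment exactly as posted above.
FRAGMENT = '"nrf":{"cs":37,"ce":38,"irq":47,"sclk":36,"mosi":35,"miso":48,"en":true}'


def parse_nrf_pins(fragment):
    """Return only the numeric pin assignments from the "nrf" config fragment."""
    config = json.loads("{" + fragment + "}")["nrf"]
    # bool is a subclass of int in Python, so exclude the "en" flag explicitly.
    return {name: value for name, value in config.items()
            if isinstance(value, int) and not isinstance(value, bool)}


def find_shared_pins(pins):
    """Map each GPIO that is used more than once to the signals sharing it."""
    by_gpio = defaultdict(list)
    for name, gpio in pins.items():
        by_gpio[gpio].append(name)
    return {gpio: names for gpio, names in by_gpio.items() if len(names) > 1}


pins = parse_nrf_pins(FRAGMENT)
print(pins)
print(find_shared_pins(pins))
```

For the configuration above every signal has its own GPIO, so the second result is empty.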

When i manually issue a reboot on the ESP, the reboot loop once again starts and calms down after some time. I will stay on 0.8.141 for a while and see if it runs stable.

I have also attached a coredump file which was taken while the "reboot loops" were in progress, maybe it helps. 2024-08-23_11-17-09_v0.8.141_opendtufusion_coredump.bin.zip

juepi avatar Aug 23 '24 09:08 juepi

Updated issue title according to the new findings.

juepi avatar Aug 23 '24 09:08 juepi

Can you try to download a coredump from the system page? It would be really helpful to better understand what happens. Ideally do that with the .140 version, but as long as the last crash happened on .140 you can also read it out using the .130 version.

lumapu avatar Aug 28 '24 19:08 lumapu

Can you try to download a coredump from the system page? It would be really helpful to better understand what happens. Ideally do that with the .140 version, but as long as the last crash happened on .140 you can also read it out using the .130 version.

Hi Lukas, Already added a coredump 2 posts ago from 0.8.141, can you work with this one? My ESP32 has an uptime of nearly 6 days now with 0.8.141 and is running without any issue handling zero-export for 2 inverters (changing power-limits every 5 seconds through MQTT). As the system is live, i'd like to keep outages as low as possible 😉

Let me know if you still need the requested coredump from 0.8.140 and i'll create one.

yours, Juergen

juepi avatar Aug 29 '24 05:08 juepi

A small update from my side: I just had to reboot my AhoyDTU after a configuration change. It came up instantly without the reboot loop and with the INT pin reported as working ("true").

I made the following changes because Inverter1 was replaced (HM-800 swapped for an HM-400):

  • Disabled Inverter1
  • changed S/N of Inverter1
  • Saved changes, reboot

EDIT: after another change (deleted Inverter1, reboot) the problem occurred again, so I created another coredump: 2024-08-29_10-24-02_v0.8.141_opendtufusion_coredump.zip

So it seems that the reboot loop does not occur on every reboot. Also, every reboot-loop occurrence seems to end after a while (minutes), and after that AhoyDTU runs perfectly well.

yours, Juergen

juepi avatar Aug 29 '24 07:08 juepi

Hey Juergen,

I translated your Coredumps:

2024-08-23_11-17-09_v0.8.141_opendtufusion_coredump.bin
===============================================================
==================== ESP32 CORE DUMP START ====================

Crashed task handle: 0x3fcf69a4, name: '', GDB name: 'process 1070557604'

================== CURRENT THREAD REGISTERS ===================
exccause       0x1d (StoreProhibitedCause)
excvaddr       0x0
epc1           0x42079715
epc2           0x0
epc3           0x0
epc4           0x0
epc5           0x0
epc6           0x0
eps2           0x0
eps3           0x0
eps4           0x0
eps5           0x0
eps6           0x0


==================== CURRENT THREAD STACK =====================
pc             0x40377da5          0x40377da5 <panic_abort+21>
lbeg           0x40056f5c          1074098012
lend           0x40056f72          1074098034
lcount         0x0                 0
sar            0x4                 4
ps             0x60821             395297
threadptr      <unavailable>
br             <unavailable>
scompare1      <unavailable>
acclo          <unavailable>
acchi          <unavailable>
m0             <unavailable>
m1             <unavailable>
m2             <unavailable>
m3             <unavailable>
expstate       <unavailable>
f64r_lo        <unavailable>
f64r_hi        <unavailable>
f64s           <unavailable>
fcr            <unavailable>
fsr            <unavailable>
a0             0x8037d104          -2143825660
a1             0x3fc96a90          1070164624
a2             0x3fc96afa          1070164730
a3             0x3fc96b27          1070164775
a4             0xa                 10
a5             0x35                53
a6             0x0                 0
a7             0x3fc96a35          1070164533
a8             0x0                 0
a9             0x1                 1
a10            0x3fc96ade          1070164702
a11            0x3fc96ade          1070164702
a12            0xa                 10
a13            0x0                 0
a14            0x2c973d0           46756816
a15            0xffffff            16777215

======================== THREADS INFO =========================
#0  0x40377da5 in panic_abort (details=0x3fc96afa "abort() was called at PC 0x4204a300 on core 0") at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/panic.c:408
#1  0x4037d104 in esp_system_abort (details=0x3fc96afa "abort() was called at PC 0x4204a300 on core 0") at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/esp_system.c:137
#2  0x40383c10 in abort () at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/abort.c:46
#3  0x4204a303 in task_wdt_isr (arg=<optimized out>) at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/task_wdt.c:176
#4  0x40379478 in _xt_lowint1 () at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port/xtensa/xtensa_vectors.S:1118
#5  0x420cb9c2 in cpu_ll_waiti () at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/hal/esp32s3/include/hal/cpu_ll.h:182
#6  esp_pm_impl_waiti () at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_pm/pm_impl.c:853
#7  0x4204ab74 in esp_vApplicationIdleHook () at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/freertos_hooks.c:63
#8  0x4037e70b in prvIdleTask (pvParameters=<optimized out>) at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/tasks.c:4099
Retrying reading threads information...


       TCB             NAME PRIO C/B  STACK USED/FREE
---------- ---------------- -------- ----------------
0x3fcf69a4                 1070556564/0           76/608
0x3fcf6f1c                 1070557964/0           84/608
0x3fceea00                 1070521840/18           88/672
0x3fcf359c                 1070539148/18          84/1168
0x3fcf7d4c                 1070560060/20           84/688
0x3fcf1750                 1070535488/24           88/608
0x3fcf622c                 1070551580/19           84/624
0x3fcf10ac                 1070533788/24           84/608
0x3fcb7b28                 1070343200/3        47280/624
0x3fcb788c                 1070336064/1        40812/672
0x3fcbfc08                 1070316536/10           80/640
0x3fcec150                 1070506304/1          88/1056
0x3fcb11ac                 1070266268/23           84/656
0x3fcf4d88                 1070545784/22           80/640

==================== THREAD 1 (TCB: 0x3fcf69a4, name: '') =====================


==================== THREAD 2 (TCB: 0x3fcf6f1c, name: '') =====================


==================== THREAD 3 (TCB: 0x3fceea00, name: '') =====================


==================== THREAD 4 (TCB: 0x3fcf359c, name: '') =====================


==================== THREAD 5 (TCB: 0x3fcf7d4c, name: '') =====================


==================== THREAD 6 (TCB: 0x3fcf1750, name: '') =====================


==================== THREAD 7 (TCB: 0x3fcf622c, name: '') =====================


==================== THREAD 8 (TCB: 0x3fcf10ac, name: '') =====================


==================== THREAD 9 (TCB: 0x3fcb7b28, name: '') =====================


==================== THREAD 10 (TCB: 0x3fcb788c, name: '') =====================


==================== THREAD 11 (TCB: 0x3fcbfc08, name: '') =====================


==================== THREAD 12 (TCB: 0x3fcec150, name: '') =====================


==================== THREAD 13 (TCB: 0x3fcb11ac, name: '') =====================


==================== THREAD 14 (TCB: 0x3fcf4d88, name: '') =====================



======================= ALL MEMORY REGIONS ========================
Name   Address   Size   Attrs
.rtc.text 0x600fe000 0x0 RW
.rtc.dummy 0x600fe000 0x0 RW
.rtc.force_fast 0x600fe000 0x0 RW
.rtc.force_slow 0x50000010 0x0 RW
.iram0.vectors 0x40374000 0x403 R XA
.iram0.text 0x40374404 0x1138f R XA
.dram0.data 0x3fc957a0 0x57d0 RW A
.noinit 0x3fc9af70 0x0 RW
.flash.text 0x42000020 0xd10c7 R XA
.flash.appdesc 0x3c0e0020 0x100 R  A
.flash.rodata 0x3c0e0120 0x47a4c RW A
.iram0.text_end 0x40385793 0x0 RW
.iram0.bss 0x40385794 0x0 RW
.dram0.heap_start 0x3fcae2a0 0x0 RW
.coredump.tasks.data 0x3fcf69a4 0x158 RW
.coredump.tasks.data 0x3fcf6730 0x260 RW
.coredump.tasks.data 0x3fcf6f1c 0x158 RW
.coredump.tasks.data 0x3fcf6ca0 0x260 RW
.coredump.tasks.data 0x3fceea00 0x158 RW
.coredump.tasks.data 0x3fcee740 0x2a0 RW
.coredump.tasks.data 0x3fcf359c 0x158 RW
.coredump.tasks.data 0x3fcf30f0 0x490 RW
.coredump.tasks.data 0x3fcf7d4c 0x158 RW
.coredump.tasks.data 0x3fcf7a80 0x2b0 RW
.coredump.tasks.data 0x3fcf1750 0x158 RW
.coredump.tasks.data 0x3fcf14d0 0x260 RW
.coredump.tasks.data 0x3fcf622c 0x158 RW
.coredump.tasks.data 0x3fcf5fa0 0x270 RW
.coredump.tasks.data 0x3fcf10ac 0x158 RW
.coredump.tasks.data 0x3fcf0e30 0x260 RW
.coredump.tasks.data 0x3fcb7b28 0x158 RW
.coredump.tasks.data 0x3fcc31a0 0x270 RW
.coredump.tasks.data 0x3fcb788c 0x158 RW
.coredump.tasks.data 0x3fcc1590 0x2a0 RW
.coredump.tasks.data 0x3fcbfc08 0x158 RW
.coredump.tasks.data 0x3fcbf970 0x280 RW
.coredump.tasks.data 0x3fcec150 0x158 RW
.coredump.tasks.data 0x3fcebd10 0x420 RW
.coredump.tasks.data 0x3fcb11ac 0x158 RW
.coredump.tasks.data 0x3fcb0f00 0x290 RW
.coredump.tasks.data 0x3fcf4d88 0x158 RW
.coredump.tasks.data 0x3fcf4af0 0x280 RW

===================== ESP32 CORE DUMP END =====================
===============================================================
2024-08-29_10-24-02_v0.8.141_opendtufusion_coredump.bin
===============================================================
==================== ESP32 CORE DUMP START ====================

Crashed task handle: 0x3fcf69a4, name: '', GDB name: 'process 1070557604'

================== CURRENT THREAD REGISTERS ===================
exccause       0x1d (StoreProhibitedCause)
excvaddr       0x0
epc1           0x42079715
epc2           0x0
epc3           0x0
epc4           0x0
epc5           0x0
epc6           0x0
eps2           0x0
eps3           0x0
eps4           0x0
eps5           0x0
eps6           0x0


==================== CURRENT THREAD STACK =====================
pc             0x40377da5          0x40377da5 <panic_abort+21>
lbeg           0x40056f5c          1074098012
lend           0x40056f72          1074098034
lcount         0x0                 0
sar            0x4                 4
ps             0x60e21             396833
threadptr      <unavailable>
br             <unavailable>
scompare1      <unavailable>
acclo          <unavailable>
acchi          <unavailable>
m0             <unavailable>
m1             <unavailable>
m2             <unavailable>
m3             <unavailable>
expstate       <unavailable>
f64r_lo        <unavailable>
f64r_hi        <unavailable>
f64s           <unavailable>
fcr            <unavailable>
fsr            <unavailable>
a0             0x8037d104          -2143825660
a1             0x3fc96a90          1070164624
a2             0x3fc96afa          1070164730
a3             0x3fc96b27          1070164775
a4             0xa                 10
a5             0x32                50
a6             0x0                 0
a7             0x3fc96a35          1070164533
a8             0x0                 0
a9             0x1                 1
a10            0x3fc96ade          1070164702
a11            0x3fc96ade          1070164702
a12            0xa                 10
a13            0x0                 0
a14            0x2c973d0           46756816
a15            0xffffff            16777215

======================== THREADS INFO =========================
#0  0x40377da5 in panic_abort (details=0x3fc96afa "abort() was called at PC 0x4204a300 on core 0") at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/panic.c:408
#1  0x4037d104 in esp_system_abort (details=0x3fc96afa "abort() was called at PC 0x4204a300 on core 0") at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/esp_system.c:137
#2  0x40383c10 in abort () at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/abort.c:46
#3  0x4204a303 in task_wdt_isr (arg=<optimized out>) at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/task_wdt.c:176
#4  0x40379478 in _xt_lowint1 () at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port/xtensa/xtensa_vectors.S:1118
#5  0x420cb9c2 in cpu_ll_waiti () at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/hal/esp32s3/include/hal/cpu_ll.h:182
#6  esp_pm_impl_waiti () at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_pm/pm_impl.c:853
#7  0x4204ab74 in esp_vApplicationIdleHook () at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/freertos_hooks.c:63
#8  0x4037e70b in prvIdleTask (pvParameters=<optimized out>) at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/tasks.c:4099
Retrying reading threads information...


       TCB             NAME PRIO C/B  STACK USED/FREE
---------- ---------------- -------- ----------------
0x3fcf69a4                 1070556564/0           76/608
0x3fcf6f1c                 1070557964/0           84/608
0x3fcf359c                 1070539148/18          84/1168
0x3fceea00                 1070521840/18           88/672
0x3fcf7d4c                 1070560060/20           84/688
0x3fcf1750                 1070535488/24           88/608
0x3fcf622c                 1070551580/19           84/624
0x3fcc1834                 1070342776/3         6660/624
0x3fcb788c                 1070336036/1        40796/672
0x3fcf10ac                 1070533788/24           84/608
0x3fcbfbec                 1070316508/10           84/640
0x3fcec150                 1070506304/1          88/1056
0x3fcb11ac                 1070266268/23           84/656
0x3fcf4d88                 1070545784/22           80/640

==================== THREAD 1 (TCB: 0x3fcf69a4, name: '') =====================


==================== THREAD 2 (TCB: 0x3fcf6f1c, name: '') =====================


==================== THREAD 3 (TCB: 0x3fcf359c, name: '') =====================


==================== THREAD 4 (TCB: 0x3fceea00, name: '') =====================


==================== THREAD 5 (TCB: 0x3fcf7d4c, name: '') =====================


==================== THREAD 6 (TCB: 0x3fcf1750, name: '') =====================


==================== THREAD 7 (TCB: 0x3fcf622c, name: '') =====================


==================== THREAD 8 (TCB: 0x3fcc1834, name: '') =====================


==================== THREAD 9 (TCB: 0x3fcb788c, name: '') =====================


==================== THREAD 10 (TCB: 0x3fcf10ac, name: '') =====================


==================== THREAD 11 (TCB: 0x3fcbfbec, name: '') =====================


==================== THREAD 12 (TCB: 0x3fcec150, name: '') =====================


==================== THREAD 13 (TCB: 0x3fcb11ac, name: '') =====================


==================== THREAD 14 (TCB: 0x3fcf4d88, name: '') =====================



======================= ALL MEMORY REGIONS ========================
Name   Address   Size   Attrs
.rtc.text 0x600fe000 0x0 RW
.rtc.dummy 0x600fe000 0x0 RW
.rtc.force_fast 0x600fe000 0x0 RW
.rtc.force_slow 0x50000010 0x0 RW
.iram0.vectors 0x40374000 0x403 R XA
.iram0.text 0x40374404 0x1138f R XA
.dram0.data 0x3fc957a0 0x57d0 RW A
.noinit 0x3fc9af70 0x0 RW
.flash.text 0x42000020 0xd10c7 R XA
.flash.appdesc 0x3c0e0020 0x100 R  A
.flash.rodata 0x3c0e0120 0x47a4c RW A
.iram0.text_end 0x40385793 0x0 RW
.iram0.bss 0x40385794 0x0 RW
.dram0.heap_start 0x3fcae2a0 0x0 RW
.coredump.tasks.data 0x3fcf69a4 0x158 RW
.coredump.tasks.data 0x3fcf6730 0x260 RW
.coredump.tasks.data 0x3fcf6f1c 0x158 RW
.coredump.tasks.data 0x3fcf6ca0 0x260 RW
.coredump.tasks.data 0x3fcf359c 0x158 RW
.coredump.tasks.data 0x3fcf30f0 0x490 RW
.coredump.tasks.data 0x3fceea00 0x158 RW
.coredump.tasks.data 0x3fcee740 0x2a0 RW
.coredump.tasks.data 0x3fcf7d4c 0x158 RW
.coredump.tasks.data 0x3fcf7a80 0x2b0 RW
.coredump.tasks.data 0x3fcf1750 0x158 RW
.coredump.tasks.data 0x3fcf14d0 0x260 RW
.coredump.tasks.data 0x3fcf622c 0x158 RW
.coredump.tasks.data 0x3fcf5fa0 0x270 RW
.coredump.tasks.data 0x3fcc1834 0x158 RW
.coredump.tasks.data 0x3fcc3000 0x270 RW
.coredump.tasks.data 0x3fcb788c 0x158 RW
.coredump.tasks.data 0x3fcc1580 0x2a0 RW
.coredump.tasks.data 0x3fcf10ac 0x158 RW
.coredump.tasks.data 0x3fcf0e30 0x260 RW
.coredump.tasks.data 0x3fcbfbec 0x158 RW
.coredump.tasks.data 0x3fcbf950 0x280 RW
.coredump.tasks.data 0x3fcec150 0x158 RW
.coredump.tasks.data 0x3fcebd10 0x420 RW
.coredump.tasks.data 0x3fcb11ac 0x158 RW
.coredump.tasks.data 0x3fcb0f00 0x290 RW
.coredump.tasks.data 0x3fcf4d88 0x158 RW
.coredump.tasks.data 0x3fcf4af0 0x280 RW

===================== ESP32 CORE DUMP END =====================
===============================================================

Both show the same behavior: the 🐶 (task watchdog) is not fed. But I still don't know where it needs to be fed more often. As far as I can see, you only have the NRF configured. There is no CMT and no MqTT.

What about the display, is it configured?

Does your ESP crash by itself or only if the WebUI is open?
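For readers unfamiliar with the "🐶 is not fed" wording: a task watchdog expects to be reset ("fed") regularly, and if some piece of work blocks for longer than the timeout, the watchdog fires and the firmware aborts, which is what the `task_wdt_isr` frame in both backtraces shows. The following is only a conceptual simulation of that idea. It does not use or mirror the ESP-IDF API, and the class name, timeout and durations are hypothetical.

```python
class SimulatedWatchdog:
    """Toy model of a watchdog: it expires if not fed within timeout_s."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_fed_s = 0.0

    def feed(self, now_s):
        self.last_fed_s = now_s

    def expired(self, now_s):
        return now_s - self.last_fed_s > self.timeout_s


def run_loop(work_durations_s, timeout_s):
    """Simulate a main loop that feeds the watchdog once per iteration.

    Returns the index of the iteration whose work took too long,
    or None if the watchdog never expired.
    """
    wdt = SimulatedWatchdog(timeout_s)
    now = 0.0
    for index, duration in enumerate(work_durations_s):
        now += duration  # the loop is busy and cannot feed during this time
        if wdt.expired(now):
            return index
        wdt.feed(now)
    return None


# Hypothetical numbers: a 5 s timeout and one iteration that blocks for 6 s.
print(run_loop([0.1, 0.2, 0.1], timeout_s=5.0))
print(run_loop([0.1, 6.0, 0.1], timeout_s=5.0))
```

The second call reports iteration 1 as the one that starved the watchdog. Finding the equivalent long-running step in the real firmware is the open question here.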

lumapu avatar Aug 29 '24 21:08 lumapu

Hi Lukas,

Both show the same behavior: the 🐶 (task watchdog) is not fed. But I still don't know where it needs to be fed more often. As far as I can see, you only have the NRF configured. There is no CMT and no MqTT.

Yes, MQTT is used, but at the stage where the coredumps have been downloaded, it did not yet work. EDIT: see first screenshot here

What about the display, is it configured?

No display connected, only the NRF radio.

Does your ESP crash by itself or only if the WebUI is open?

Uh, hard to tell. What i can tell for sure is that it only happens on startup/reboot. As soon as it switches into a "stable mode", it seems to run perfectly well (at least for a week as far as i can tell by now).

yours, Juergen

P.s.: coredump added in working state - maybe it helps? 2024-08-29_23-17-10_v0.8.141_opendtufusion_coredump.zip Note: Inverter0 is offline, Inverter1 is powered down in this coredump (battery drained).

juepi avatar Aug 29 '24 21:08 juepi

One more thing concerning your question about accessing the WebUI: if you're thinking of the new AsyncWebserver of 0.8.141 causing the issue, i also had the same problem with 0.8.140.

yours, Juergen

juepi avatar Aug 30 '24 06:08 juepi

According to your screenshot you have very high MqTT traffic: almost 3000 Tx in 4 min! Is that normal, or could the DTU be choking on it?

Gubi2023 avatar Aug 30 '24 07:08 Gubi2023

According to your screenshot you have very high MqTT traffic: almost 3000 Tx in 4 min! Is that normal, or could the DTU be choking on it?

Well, that is a bit above my average: over the last 24 h I get about 400 TX/min, although the inverters were disabled during the night, so about 600 TX/min would be plausible. I have configured an MQTT and inverter interval of 5 seconds, which matches the data interval of my smart meter. Zero-export control runs in FHEM (Perl script) and is delivered to AhoyDTU via MQTT (limits), which works very well in this setup.
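To put the numbers from this exchange side by side, here is a small back-of-the-envelope calculation. It only uses figures quoted in the thread ("almost 3000 Tx in 4 min", treated as an upper bound, and the 5 second interval). The function names are made up for illustration.

```python
def per_minute(count, minutes):
    """Average number of messages per minute."""
    return count / minutes


def per_interval(rate_per_minute, interval_s):
    """Average number of messages per polling interval of interval_s seconds."""
    return rate_per_minute * interval_s / 60


# "almost 3000 Tx in 4 min" from the screenshot, taken as an upper bound.
burst_rate = per_minute(3000, 4)
print(burst_rate)                    # messages per minute
print(per_interval(burst_rate, 5))   # messages per 5 s interval
```

So the screenshot corresponds to at most 750 messages per minute, or roughly 60 messages per 5 second interval.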

As I said: once the thing is running after the "startup reboot loop", it runs very stably. It even worked with an ESP8266, although with stability problems towards the end, hence the switch to the ESP32-S3.

regards, Jürgen

juepi avatar Aug 30 '24 07:08 juepi

Unfortunately the third coredump shows the same picture too. Could you set the MqTT interval to 0 as a test? That does not mean that no data is delivered, but that data is sent whenever new data is available.

lumapu avatar Aug 30 '24 20:08 lumapu

I was just about to change it, and it is already set to 0! Shame on me, sorry for the misinformation!

juepi avatar Aug 31 '24 05:08 juepi

Morning,

I just noticed an interesting behavior: after rebooting my server (including the MQTT broker; WiFi including NTP and name resolution stayed online), the same behavior occurs! AhoyDTU goes into the reboot loop and "recovers" after some time.

Here is another dump: 2024-09-01_09-52-34_v0.8.141_opendtufusion_coredump.zip

This one was taken after an "MQTT broker offline reboot loop" (by that time the broker was already back online and AhoyDTU had recovered).

Possibly a buffer overflow after boot, because MQTT messages are queued for sending but the broker has not connected yet?

regards, Jürgen

juepi avatar Sep 01 '24 07:09 juepi

This looks very much like what we are all struggling with right now: MqTT fills up and cannot get its packets sent, and Ahoy then collapses because no memory is left.

Seems to be the same problem...
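The hypothesis discussed here is that messages pile up while the broker is unreachable until memory runs out. One common way to guard against that kind of growth is a bounded outgoing queue that discards the oldest entries when full. The sketch below illustrates the idea only. It is not how AhoyDTU or its MQTT library handle this, and the class name, queue size and topic names are hypothetical.

```python
from collections import deque


class BoundedOutbox:
    """Outgoing message queue with a hard upper limit on stored messages."""

    def __init__(self, max_messages):
        self._queue = deque(maxlen=max_messages)
        self.dropped = 0

    def publish(self, topic, payload):
        # A full deque with maxlen silently evicts its oldest entry on append,
        # so count the loss before appending.
        if len(self._queue) == self._queue.maxlen:
            self.dropped += 1
        self._queue.append((topic, payload))

    def flush(self, send):
        """Hand all queued messages to send(topic, payload), oldest first."""
        sent = 0
        while self._queue:
            topic, payload = self._queue.popleft()
            send(topic, payload)
            sent += 1
        return sent

    def __len__(self):
        return len(self._queue)


# Broker offline: five readings arrive, but only three fit in the queue.
outbox = BoundedOutbox(max_messages=3)
for watts in (100, 110, 120, 130, 140):
    outbox.publish("inverter/example/power", watts)  # hypothetical topic

delivered = []
outbox.flush(lambda topic, payload: delivered.append(payload))
print(delivered, outbox.dropped)
```

Memory use stays constant no matter how long the broker is away, at the price of losing the oldest readings. For live power values that trade-off is usually acceptable, since only the most recent value matters.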

Gubi2023 avatar Sep 01 '24 08:09 Gubi2023

I see. What I find interesting is that for me it apparently only becomes a problem at boot. As I said, once in operation it runs like clockwork.

image Even with heavy control activity there are no problems at all.

regards, Jürgen

juepi avatar Sep 01 '24 17:09 juepi

Well, for me it is over after 24 h at the latest; then the DTU restarts with a Task Watchdog. Unfortunately my ESP8266 with the -min config also keeps dying with "Exception".

Gubi2023 avatar Sep 01 '24 17:09 Gubi2023

Well, for me it is over after 24 h at the latest; then the DTU restarts with a Task Watchdog.

No, 0.8.141 has already run for almost a week here without problems. It really is only the boot that misbehaves (for 1-2 minutes), or when the MqTT broker goes down.

regards, Jürgen

juepi avatar Sep 01 '24 17:09 juepi

Averaged over the last 3 days I have about 560 MqTT TX per minute. No resets of AhoyDTU during that time.

regards, Jürgen

juepi avatar Sep 04 '24 05:09 juepi

Update: I've just upgraded to 0.8.150. Both described problems (the reboot loop at firmware startup and the unknown "INT pin status") no longer occur, thanks Lukas! 👍

regards, Jürgen

P.S.: 0.8.151 also works without the mentioned problems 😉

juepi avatar Oct 03 '24 15:10 juepi