diyBMSv4ESP32
WIFI Issue - random reboot when STA is lost (investigation)
This ticket is to investigate seemingly random reboots of the controller (often coinciding with loss of the WIFI STA connection) with the latest firmware version, Release-2023-12-27-12-02.
May be related to #239
Test on my development rig:
- no sd card
- no TFT display
- V4.2 PCB controller + current shunt addon
- 6 cells being monitored (v4.90 board)
- ESP32 reported at boot up:
ESP32 Chip model = 1, Rev 1, Cores=2, Features=50
- Powered from USB cable into ESP32 (from desktop PC)
- MQTT enabled, INFLUX disabled, Home Assistant API not used
- MQTT broker configured for mqtt://test.mosquitto.org:1884, port 1884, username: rw, password: readwrite
The ESP32 is connected to a WIFI hot spot on a mobile phone (Android). On boot up, the controller reports the following (filtered for wifi events only, logging for MQTT increased to DEBUG level):
D (6469) diybms: starting wifi_init_sta
I (6492) diybms: WIFI SSID: XXXXXXXXXXXXXXXX
I (6569) diybms: Hostname: DIYBMS-005CED90
D (6570) diybms: wifi_init_sta finished
I (13606) diybms: WIFI_EVENT_STA_START
D (13707) diybms: total_free_byte=156976 total_allocated_byte=132724 largest_free_blk=110580 min_free_byte=154632 alloc_blk=360 free_blk=5 total_blk=365
I (15392) diybms: WIFI_EVENT_STA_DISCONNECTED
I (15395) diybms: WIFI connect quick retry 1
I (17809) diybms: WIFI_EVENT_STA_DISCONNECTED
I (17812) diybms: WIFI connect quick retry 2
I (17941) diybms: WIFI_EVENT_STA_CONNECTED channel=11, rssi=-41
I (17966) diybms: IP ADDRESS HAS CHANGED
I (17969) diybms: Request time from time.google.com
I (17970) diybms: Timezone=UTC0DST
I (17971) diybms: The current date/time is: Thu Jan 1 00:00:10 1970
I (17996) diybms: You can access DIYBMS interface at http://DIYBMS-005CED90.local or http://192.168.1.87
W (18512) diybms-mqtt: MQTT enabled, but not yet init
W (19608) diybms-mqtt: MQTT enabled, but not yet init
I (43745) diybms-mqtt: MQTT counters: Err_Con=0,Err_Trans=0,Conn=0,Disc=0
I (43746) diybms-mqtt: esp_mqtt_client_init
I (43750) diybms-mqtt: esp_mqtt_client_start
I (44254) diybms-mqtt: MQTT_EVENT_CONNECTED
I (46619) diybms-mqtt: Rule status payload
D (46627) diybms-mqtt: Topic:emon/diybms2/rule, ID:0, Length:103
I (46628) diybms-mqtt: Outputs status payload
D (46634) diybms-mqtt: Topic:emon/diybms2/output, ID:0, Length:25
I (48542) diybms-mqtt: MQTT Payload for cell data
Data is successfully transmitted to the MQTT server and the web interface is working as expected.
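For anyone filtering their own serial captures in the same way, here is a minimal Python sketch (not part of the firmware). It assumes every line of interest has the shape "<level> (<milliseconds since boot>) <tag>: <message>" seen in the log above; the names parse_line and filter_tag are made up for illustration.

```python
import re

# Assumed line shape: "<level> (<ms since boot>) <tag>: <message>"
LOG_LINE = re.compile(r"^([EWIDV]) \((\d+)\) ([^:]+): (.*)$")


def parse_line(line):
    """Return (level, millis, tag, message), or None if the line does not match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    level, millis, tag, message = m.groups()
    return level, int(millis), tag, message


def filter_tag(lines, tag):
    """Keep only the parsed entries whose tag equals `tag`."""
    parsed = (parse_line(line) for line in lines)
    return [p for p in parsed if p is not None and p[2] == tag]


# Lines copied from the boot log above.
sample = [
    "D (6469) diybms: starting wifi_init_sta",
    "I (17941) diybms: WIFI_EVENT_STA_CONNECTED channel=11, rssi=-41",
    "W (18512) diybms-mqtt: MQTT enabled, but not yet init",
    "I (44254) diybms-mqtt: MQTT_EVENT_CONNECTED",
]

mqtt = filter_tag(sample, "diybms-mqtt")
for level, millis, tag, message in mqtt:
    print(level, millis, message)
```

Lines that do not follow this shape (for example the register dump of a crash) are simply skipped by returning None.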
Upon terminating the WIFI hot spot on the Android phone:
I (284130) diybms: WIFI_EVENT_STA_DISCONNECTED
E (284132) TRANSPORT_BASE: poll_read select error 113, errno = Software caused connection abort, fd = 51
E (284133) MQTT_CLIENT: Poll read error: 119, aborting connection
I (284140) diybms-mqtt: MQTT_EVENT_DISCONNECTED
I (284207) diybms-mqtt: MQTT counters: Err_Con=0,Err_Trans=0,Conn=1,Disc=1
I (284233) diybms-mqtt: Stopping MQTT client
W (286282) diybms-mqtt: MQTT enabled, but not connected
W (289710) diybms-mqtt: MQTT enabled, but not connected
W (291285) diybms-mqtt: MQTT enabled, but not connected
W (291285) diybms-mqtt: MQTT enabled, but not connected
W (291286) diybms-mqtt: MQTT enabled, but not connected
I (299155) diybms: WIFI connect quick retry 1
W (301288) diybms-mqtt: MQTT enabled, but not yet init
I (301569) diybms: WIFI_EVENT_STA_DISCONNECTED
I (301571) diybms: WIFI connect quick retry 2
I (301709) diybms-rules: Set error 2:ModuleCountMismatch
I (301710) diybms: Active errors=1
W (301711) diybms-mqtt: MQTT enabled, but not yet init
I (303985) diybms: WIFI_EVENT_STA_DISCONNECTED
I (303988) diybms: WIFI connect quick retry 3
** removed similar messages **
I (313650) diybms: WIFI_EVENT_STA_DISCONNECTED
I (313653) diybms: WIFI connect quick retry 7
I (313713) diybms-rules: Set error 2:ModuleCountMismatch
I (313714) diybms: Active errors=1
W (313715) diybms-mqtt: MQTT enabled, but not yet init
I (314266) diybms: Trying to connect WIFI
E (314267) wifi:sta is connecting, return error
ESP_ERROR_CHECK_WITHOUT_ABORT failed: esp_err_t 0x3007 (ESP_ERR_WIFI_CONN) at 0x4008ea0b
file: "src/main.cpp" line 4187
func: void loop()
expression: esp_wifi_connect()
I (316066) diybms: WIFI_EVENT_STA_DISCONNECTED
** removed similar messages **
I (554684) diybms: Trying to connect WIFI
I (436909) diybms: WIFI_EVENT_STA_DISCONNECTED
E (436910) diybms: Connect to WIFI AP failed, tried 28 times
Upon re-enabling the WIFI hot spot on the Android phone:
I (765022) diybms: Trying to connect WIFI
I (765122) diybms: WIFI_EVENT_STA_CONNECTED channel=11, rssi=-48
I (765150) diybms: IP ADDRESS HAS CHANGED
I (765150) diybms: Request time from time.google.com
I (765151) diybms: Timezone=UTC0DST
I (765152) diybms: The current date/time is: Tue Feb 6 11:56:59 2024
I (765174) diybms: You can access DIYBMS interface at http://DIYBMS-005CED90.local or http://192.168.1.87
I (795081) diybms-mqtt: MQTT counters: Err_Con=0,Err_Trans=0,Conn=0,Disc=0
I (795081) diybms-mqtt: esp_mqtt_client_init
I (795086) diybms-mqtt: esp_mqtt_client_start
I (795276) diybms-mqtt: MQTT_EVENT_CONNECTED
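Assuming the bracketed numbers are the default ESP-IDF log timestamps (milliseconds since boot), the length of the outage can be read straight from the logs above. A small Python sketch, with the helper name event_time_ms being hypothetical and the lines copied from the excerpts:

```python
import re

TIMESTAMP = re.compile(r"\((\d+)\)")


def event_time_ms(lines, event):
    """Timestamp (ms) of the first line that mentions `event`, else None."""
    for line in lines:
        if event in line:
            m = TIMESTAMP.search(line)
            if m:
                return int(m.group(1))
    return None


lines = [
    "I (284130) diybms: WIFI_EVENT_STA_DISCONNECTED",
    "E (436910) diybms: Connect to WIFI AP failed, tried 28 times",
    "I (765122) diybms: WIFI_EVENT_STA_CONNECTED channel=11, rssi=-48",
    "I (795276) diybms-mqtt: MQTT_EVENT_CONNECTED",
]

wifi_lost = event_time_ms(lines, "WIFI_EVENT_STA_DISCONNECTED")
wifi_back = event_time_ms(lines, "WIFI_EVENT_STA_CONNECTED")
mqtt_back = event_time_ms(lines, "MQTT_EVENT_CONNECTED")

outage_ms = wifi_back - wifi_lost       # time without WIFI
mqtt_delay_ms = mqtt_back - wifi_back   # time from WIFI up to MQTT up
print(outage_ms / 1000, "s without WIFI;", mqtt_delay_ms / 1000, "s more until MQTT")
```

For this test run that works out to roughly 481 seconds without WIFI, and about 30 seconds between WIFI returning and MQTT reconnecting.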
The code in the controller is designed for the following action when a loss of WIFI is detected (event WIFI_EVENT_STA_DISCONNECTED):
- Calls ShutdownAllNetworkServices() (stop_webserver / stopMqtt / stopMDNS)
- Sets wifi_isconnected = false
- Attempts to call esp_wifi_connect() up to 25 times - log messages reported as "connect quick retry"
After 25 times, the message reported is "Connect to WIFI AP failed, tried XXX times".
Once the 25 attempts have failed, esp_wifi_connect() is called inside the main loop, approx. every 30 seconds, reported as "Trying to connect WIFI"
As can be seen from the above logs, the development rig environment as described appears to work correctly and recovers from WIFI disconnection and errors successfully.
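To make the intended sequence easier to follow, here is a rough Python model of the behaviour described above. It is only a sketch of the description, not the firmware code: the class WifiReconnectModel, its method names and the connect callback (standing in for esp_wifi_connect()) are all invented for illustration, and the real handler may count attempts differently.

```python
QUICK_RETRY_LIMIT = 25       # "up to 25 times" in the description
SLOW_RETRY_INTERVAL_S = 30   # "approx. every 30 seconds"


class WifiReconnectModel:
    def __init__(self, connect):
        self.connect = connect            # stand-in for esp_wifi_connect()
        self.connected = False
        self.quick_retries = 0
        self.last_slow_attempt_s = None
        self.log = []

    def on_connected(self):
        """Model of WIFI_EVENT_STA_CONNECTED: reset the retry state."""
        self.connected = True
        self.quick_retries = 0
        self.last_slow_attempt_s = None

    def on_disconnected(self):
        """Model of WIFI_EVENT_STA_DISCONNECTED."""
        if self.connected:
            self.log.append("ShutdownAllNetworkServices")
            self.connected = False
        if self.quick_retries < QUICK_RETRY_LIMIT:
            self.quick_retries += 1
            self.log.append(f"WIFI connect quick retry {self.quick_retries}")
            self.connect()
        else:
            self.log.append("Connect to WIFI AP failed")

    def loop_tick(self, now_s):
        """Model of the main loop: slow retry once the quick retries are used up."""
        if self.connected or self.quick_retries < QUICK_RETRY_LIMIT:
            return
        due = (self.last_slow_attempt_s is None
               or now_s - self.last_slow_attempt_s >= SLOW_RETRY_INTERVAL_S)
        if due:
            self.last_slow_attempt_s = now_s
            self.log.append("Trying to connect WIFI")
            self.connect()


calls = []
model = WifiReconnectModel(connect=lambda: calls.append("connect"))
model.on_connected()
for _ in range(QUICK_RETRY_LIMIT + 1):
    model.on_disconnected()   # every failed attempt raises another disconnect event
print(len(calls), model.log[-1])
```

In this model the services are shut down once, 25 quick attempts follow, and only then does the main loop take over with the slower retry.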
Related to #220
OK, managed to get a Guru Meditation error if I repeatedly disable the wifi hotspot and quickly re-enable it.
I (1759137) diybms: WIFI connect quick retry 1
Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled.
Core 0 register dump:
PC : 0x401b5f3e PS : 0x00060a30 A0 : 0x801b6023 A1 : 0x3ffd8f00
A2 : 0x3ffb62d4 A3 : 0xffffffff A4 : 0x00000000 A5 : 0xffffffff
A6 : 0x00000000 A7 : 0x3ffe3458 A8 : 0x3ffdae70 A9 : 0x3ffd8e70
A10 : 0x00000000 A11 : 0x00000001 A12 : 0x3ffe2928 A13 : 0x3ffe2928
A14 : 0x3ffe3428 A15 : 0x3ffe3462 SAR : 0x00000004 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000000 LBEG : 0x4008c0e1 LEND : 0x4008c0f1 LCOUNT : 0xfffffffe
Backtrace: 0x401b5f3b:0x3ffd8f00 0x401b6020:0x3ffd8f50
#0 0x401b5f3b:0x3ffd8f00 in handler_execute at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_event/esp_event.c:139
(inlined by) esp_event_loop_run at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_event/esp_event.c:590
#1 0x401b6020:0x3ffd8f50 in esp_event_loop_run_task at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_event/esp_event.c:115 (discriminator 15)
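The register dump itself already narrows the fault down. As a rough Python sketch (not firmware code; parse_registers and the deliberately partial EXCCAUSE table are illustrative), EXCCAUSE 0x1c is 28, which is LoadProhibited on Xtensa, and EXCVADDR is the address that was being accessed:

```python
import re

REGISTER = re.compile(r"([A-Z][A-Z0-9]*)\s*:\s*0x([0-9a-fA-F]+)")

# Partial table: only the two causes relevant to this kind of crash.
EXCCAUSE_NAMES = {28: "LoadProhibited", 29: "StoreProhibited"}


def parse_registers(dump):
    """Turn 'NAME : 0xVALUE' pairs from a register dump into a dict of ints."""
    return {name: int(value, 16) for name, value in REGISTER.findall(dump)}


# Register dump copied from the crash above.
dump = """
PC : 0x401b5f3e PS : 0x00060a30 A0 : 0x801b6023 A1 : 0x3ffd8f00
A2 : 0x3ffb62d4 A3 : 0xffffffff A4 : 0x00000000 A5 : 0xffffffff
A6 : 0x00000000 A7 : 0x3ffe3458 A8 : 0x3ffdae70 A9 : 0x3ffd8e70
A10 : 0x00000000 A11 : 0x00000001 A12 : 0x3ffe2928 A13 : 0x3ffe2928
A14 : 0x3ffe3428 A15 : 0x3ffe3462 SAR : 0x00000004 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000000 LBEG : 0x4008c0e1 LEND : 0x4008c0f1 LCOUNT : 0xfffffffe
"""

regs = parse_registers(dump)
cause = EXCCAUSE_NAMES.get(regs["EXCCAUSE"], "unknown")
print(cause, "at address", hex(regs["EXCVADDR"]))
```

A load from address 0 is what a null pointer dereference typically looks like; according to the backtrace it happens inside handler_execute in esp_event.c.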
Possible fix (experimental firmware): diybms_controller_firmware_experimental_bug276.zip
Hello Stuart, I also noticed that the DIYBMS (Firmware 2023-11-28) was restarting. It seems to have restarted 3 times in a very short time. Unfortunately, I cannot yet say whether this is related to the WLAN. I will try to do tests with WLAN until the end of the week.
I could see from the uptime of the controller that it has really restarted.
It seems to have restarted 3 times in a very short time.
It seems to trigger a reboot if the WIFI connection is lost and restored within a second or two, but it looks like a bug in the controller code (as expected!) so I'm hoping this version works as expected.
A few days ago I also observed an internal BMS error, which is really strange. I have never seen such errors before. I have been using the system for over a year without ever seeing anything like this. It may be important for the analysis
the experimental firmware does not start on my esp, black screen. tried with two different esp32s and two different computers
This isn't a complete flash image. Re-flash the "release" version, then use the over-the-air upgrade feature to apply this experimental one.
ok now it is running. disconnected wifi several times, no reboot. now i need to wait some days and watch how my inverter behaves
it has now been running for two days, no issues so far. but i noticed that the controller refuses to connect to a network with a hidden ssid. this was possible with the december firmware, but the reconnect problem was there even if the wifi ssid was not hidden.
if this is the trade off for a stable running controller i can live with it, but maybe not all users can?
I've not made any changes to the wifi stack - so a hidden SSID shouldn't be a problem.
I've a log file from another user who has tested this firmware and unfortunately it didn't solve his reboot. He uses a Fritzbox which does appear to be a common problem with ESP32 hardware.
CONTROLLER - ver:cbe2f3314cf6ac9e3db3e1cdb27aa386e6facbcc compiled 2024-02-06T12:40:00.542Z
ESP32 Chip model = 1, Rev 1, Cores=2, Features=50
I (245621) diybms: WIFI_EVENT_STA_DISCONNECTED
I (245621) diybms: ShutdownAllNetworkServices
I (245621) diybms-web: httpd_stop
I (245722) diybms: stop mdns
I (245734) diybms: WIFI connect quick retry 1
Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled.
Core 0 register dump:
PC : 0x401b5f92 PS : 0x00060030 A0 : 0x801b6077 A1 : 0x3ffd8da0
A2 : 0x3ffb6328 A3 : 0xffffffff A4 : 0x00000000 A5 : 0xffffffff
A6 : 0x00000000 A7 : 0x3ffe2fc8 A8 : 0x3ffdad40 A9 : 0x3ffd8d10
A10 : 0x00000000 A11 : 0x00000001 A12 : 0x3ffe2438 A13 : 0x3ffe2438
A14 : 0x3ffe2f98 A15 : 0x3ffe2fd2 SAR : 0x00000004 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000000 LBEG : 0x4008c0e1 LEND : 0x4008c0f1 LCOUNT : 0xfffffffe
Backtrace: 0x401b5f8f:0x3ffd8da0 0x401b6074:0x3ffd8df0
which decodes as
0x401b5f92: handler_execute at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_event/esp_event.c:145
0x401b5f92: esp_event_loop_run at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_event/esp_event.c:590
0x401b5f8f: handler_execute at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_event/esp_event.c:139
0x401b5f8f: esp_event_loop_run at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_event/esp_event.c:590
0x401b6074: esp_event_loop_run_task at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_event/esp_event.c:115
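The two backtraces in this thread come from different builds, so the raw addresses differ. A quick way to compare them is to look at the distance between frames rather than the absolute values. The following Python sketch does that with the two "Backtrace:" lines quoted in this thread; parse_backtrace and frame_gap are hypothetical helper names.

```python
import re

FRAME = re.compile(r"0x([0-9a-fA-F]+):0x([0-9a-fA-F]+)")


def parse_backtrace(line):
    """Split a 'Backtrace:' line into (pc, sp) integer pairs."""
    return [(int(pc, 16), int(sp, 16)) for pc, sp in FRAME.findall(line)]


def frame_gap(frames):
    """Distance in bytes between the code addresses of frame #1 and frame #0."""
    return frames[1][0] - frames[0][0]


# Release firmware crash, then the experimental firmware crash.
first = parse_backtrace("Backtrace: 0x401b5f3b:0x3ffd8f00 0x401b6020:0x3ffd8f50")
second = parse_backtrace("Backtrace: 0x401b5f8f:0x3ffd8da0 0x401b6074:0x3ffd8df0")

print(hex(frame_gap(first)), hex(frame_gap(second)))
```

Both gaps come out the same, which fits with both crashes decoding to handler_execute called from esp_event_loop_run_task, just at slightly shifted addresses. The code addresses in each pair are the values a tool such as addr2line needs, together with the ELF file that matches the exact build.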
Hi Stuart, do you think an ESP32 with an Ethernet port could solve the problem?
Do you think an ESP32 with an Ethernet port could solve the problem?
No idea, I don't have one and it would also need significant code changes to make it work
YES! LAN is the solution!!! ;-)
I've a log file from another user who has tested this firmware and unfortunately it didn't solve his reboot. He uses a Fritzbox which does appear to be a common problem with ESP32 hardware.
has he tested this with another esp32?
well i dont have a fritzbox, but i also had to replace my wifi router because some esp32 would not connect to my previous one...
Sadly i have no logs, but also a Fritz!Box and the same issues.
try to make a wifi hotspot on your phone and connect to that. if it does not reboot, then the fritzbox is the issue
@red0909
well i dont have a fritzbox, but i had also to replace my wifi router because some esp32 have not connected to my previous one...
Which other router did you buy?
@red0909
try to make a wifi hotspot on your phone and connect to that. if it does not reboot, then the fritzbox is the issue
The hotspot on an iPhone is not the right way to test. It only shares the Internet with a connected WiFi subscriber. It does not create an internal network that can be accessed. Calling the web app seems to be a possible source of the problem - possibly in connection with MQTT. I tried it.
Yesterday I switched off the Fritz!Box WiFi and tested a TP-Link access point (TL-WR841N). The ESP32 still crashed. Interestingly, with the Fritz!WLAN Repeater the crashes usually occurred after the WiFi was switched off. With the TP-Link, the crashes now happen when you turn on the WiFi, and only after you open the web app.
It only shares the Internet with a connected WiFi subscriber. It does not create an internal network that can be accessed.
You can access the DIYBMS web interface directly from the phone web browser, when testing in this fashion.
Which other router did you buy?
dlink dsr-250n
it needed some tricky fw updates, 5 times, to get to the new fw, but this router is no longer supported and should not be used for internet access.
i use it offline: my network for my inverters and this bms is offline. i could use just an 8 port switch, but the diy bms requires wifi and it is the only device in my network using wifi. i dont trust wifi for critical devices, the diy bms needs a password too or at least a simple 4digit pin.
@stuartpittaway i disconnect wifi sometimes to see what happens. this experimental fw is still running well, no reboots here, on a cheap fake esp32
@red0909
... i dont trust wifi for critical devices, ...
It's the same with me. diyBMS is the only device on my network without LAN :-( Our WiFi is switched off from 9 p.m. to 6 a.m. Then the most important data from the diyBMS comes from the Victron Cerbo GX via the battery Can-Bus.
the diy bms needs a password too or at least a simple 4digit pin.
Security isn't really possible on this sort of device (ESP32) - at least not without a full TLS encryption layer/certificates - otherwise any sort of password or PIN is pointless as it could be sniffed off the network.
so 14 days now with experimental fw, no reboot no problems with the wifi.
Hi @red0909 been 3 weeks now, whats the feedback?
no problems as far as i can see, but i am not using mqtt or Home Assistant. running stable, no reboot with a cheap esp32 module, canbus signal is stable too
Hello Stuart,
I installed a DIYBMS a few days ago. A controller board v4.5 on a 18s1p battery.
I have installed the last 4 official releases on the controller, and whenever the Fritzbox was rebooted or the wifi was turned off, the controller board restarted.
I then installed the beta "diybms_controller_firmware_experimental_bug276.zip" and the problem was gone. I must have restarted the Fritzbox 2-3 times without a problem.
Today the power was probably off for about 1h during the installation. So the Fritzbox was off and the controller board restarted. I was able to determine this through the uptime and also the undervoltage error (relay dropped out briefly).
The DIYBMS is connected to the router as follows (MESH is active): Fritzbox <--> Repeater 1750e <--> DIYBMS
Unfortunately I have no access to the serial console of the controller
Nobody wants to hear that here, sorry.
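Without serial console access, the uptime check mentioned above is still a workable way to confirm restarts. A minimal Python sketch, assuming the uptime is sampled periodically by some external script (the sampling itself is not shown, the values below are invented, and count_reboots is a hypothetical helper):

```python
def count_reboots(uptime_samples_s):
    """Count how often the uptime went backwards between consecutive samples."""
    pairs = zip(uptime_samples_s, uptime_samples_s[1:])
    return sum(1 for previous, current in pairs if current < previous)


# Invented example: uptime in seconds, sampled once a minute.
samples = [100, 160, 220, 15, 75, 10, 70]
print(count_reboots(samples))
```

Any sample that is lower than the one before it means the controller restarted in between; several restarts inside one sampling interval would still be counted as one.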
@jetronic18s what power supply do you have for the controller? have you measured the voltage at the controller screws? i think this is some sort of a power issue
@red0909 I supply the controller via a DCDC (Mean Well DDR-30L-5) from the battery. I have exactly 5V in idle mode. In the event of a fault, I would not be able to measure the voltage.