diyBMSv4ESP32

Low battery voltage if wifi disconnected.

Open Linusten opened this issue 1 year ago • 22 comments

Describe the bug: If the WiFi connection is lost, the controller reports to my Victron that the battery voltage is low.

image

Hardware/Software Versions
Controller version (from PCB): 4.5
Host name: DIYBMS-009DAA84
Processor: ESP32
Version: 5b7135f8127c6fd9d5d18525f7d5de72a32b4232
Compiled: 2023-04-17T08:23:56.977Z
Language: en
SDK Version: v4.4.4
Min free Heap: 59820
Free heap: 109184
Heap size: 293796

To Reproduce
Steps to reproduce the behavior: Break the wifi connection while the controller is running.

Linusten avatar Jun 19 '23 15:06 Linusten

Hello, this is a strange one!

I can't recreate this issue on my test rig. As you would expect, the WIFI code has nothing to do with the communication or alarm monitoring over the CANBUS to Victron.

How are you powering the controller? Is this directly from the battery?

How do you test for lost wifi - do you simply switch the router off?

Can you capture the text/log output of the USB serial port on the ESP32?

stuartpittaway avatar Jun 23 '23 12:06 stuartpittaway

Does MQTT still run over Wifi to IOBroker or something similar? Which Cerbo and Multiplus firmware is on it? What is set in the Cerbo, and which voltage sensor is used? Have you set everything to visible in the Cerbo under Settings - System setup - Battery measurement? Then you could check in the VRM which sensor is triggering the alarm. In the VRM you can also see the origin of the error (e.g. VE.Bus System [276]). 276 would be the MP's, 512 comes from the diyBMS, and 279 would be the Victron shunt, if you have one.

JochenSchmidt avatar Jun 30 '23 14:06 JochenSchmidt

Thanks @JochenSchmidt for the nice hints :) In VRM I can see that the error is being raised by [276]

VE.Bus System [276] | Automatic monitoring | Low battery: Alarm

Which is very strange because the battery was never below 60%...

Linusten avatar Jun 30 '23 14:06 Linusten

@Linusten Ok, then the alarm comes from the VE.Bus system (MP). One of the MPs thinks the voltage is gone for a moment. I've had this before, very rarely, and I have no idea why it happens, but not because I disconnected the WiFi connection between the diyBMS and the repeater. It just happens. Until a few days ago I had version V501 on the Quattros (3x Quattro II) and version 3.00~18 (beta version) on the Cerbo. Now V505 on the Quattros and 3.00 on the Cerbo. I'll test it with WiFi off. How did you do that? Did you simply pull the plug on the router/repeater?

JochenSchmidt avatar Jun 30 '23 15:06 JochenSchmidt

@Linusten Please have a look at Issue #225. I made some tests: after switching WLAN off, nothing bad happened. After switching WLAN back on, the system switched off completely and the diyBMS had a system error. I had to start it manually by pressing the left button.

JochenSchmidt avatar Jul 04 '23 14:07 JochenSchmidt

I am testing the newest commit and will update you if the error occurs.

-> https://github.com/stuartpittaway/diyBMSv4ESP32/actions/runs/5476905297

Linusten avatar Jul 07 '23 06:07 Linusten

@Linusten Ok, then the alarm comes from the VE.Bus system (MP). One of the MPs thinks the voltage is gone for a moment. I've had this before, very rarely, and I have no idea why it happens, but not because I disconnected the WiFi connection between the diyBMS and the repeater. It just happens. Until a few days ago I had version V501 on the Quattros (3x Quattro II) and version 3.00~18 (beta version) on the Cerbo. Now V505 on the Quattros and 3.00 on the Cerbo. I'll test it with WiFi off. How did you do that? Did you simply pull the plug on the router/repeater?

+1 on @JochenSchmidt comment!

System: multiplus II (3000/24, firmware v5.02) connected via MK2 to an rpi 3B+ running Venus Large pre-3.00 betas, and diyBMS on canbus to the rpi. Also a Victron MPPT and 600W solar connected to the rpi. Things were almost ok; I would get the occasional low battery alarm and have to reboot the lot.

A few months ago, I upgraded diyBMS to the latest version and the rpi to the VenusOS 3.00 release (Large). The system would crash very often, leaving me with a system that needed a full shutdown/reboot and a second one on the rpi (all were powered together via one dropper). Sorry, it didn't occur to me to do an ESP32 reboot via the left button, so I haven't tried that.

A couple of days ago, I upgraded the multiplus II from 5.02 to 5.05 firmware. Since then there has been no locking of the diyBMS, and nothing going offline and messing up settings with "no BMS found"! I only managed once to induce a low battery alarm (again from the multiplus), but it didn't affect the rest and only lasted 3-4 secs before the connection was re-established and everything kept on working. At that moment it was drawing 17A from the generator, charging the bank at 20% SoC and at the same time providing 2 kW to the watermaker. Unplugging the boat router and leaving it offline for 5-10 mins didn't affect the system either; everything kept on working fine, with no complaints from the multiplus.

So if you face such issues, I'd highly recommend getting the multiplus (and any other Victron devices) firmware updated PDQ! (I vaguely remember some months back someone on victron site mentioning that you MUST upgrade multi firmware if you go to full release 3.0 venusOS, and I guess they were right...)

cheers

V.

PS. not using MQTT so not experiencing anything like #225

virtuvas avatar Jul 18 '23 07:07 virtuvas

Hello all, today was the 2nd time I experienced something similar.

The internet connection failed today while I was in a Teams call, (routing issue at the provider this time) 6 minutes later we lost power to the house (router/modem/wifi on small UPS) and I saw on the Victron GX these errors:

  • Internal Failure
  • Low Battery Voltage

Power was restored and failed multiple times while I was observing my setup, until the internet connection was restored and the cycling stopped. I did not change a thing in my setup.

A few weeks ago they cut a coax cable in our town, and in hindsight the same thing happened. Then I reset the controller to stop the cycling. I now also think the CAN-Bus alarm I had previously is linked to the loss of Wifi and not to a bug in the CAN-bus code.

Since I run the latest Victron firmware the GX now also shows the internal failure notification. The battery isn't low, the GX just loses communication. I had also enabled a rule to power a relay while having the Internal BMS Error so I could react before the GX error : the display shows red with "Modules or RS-485 error"

I have 14 modules, some rules defined, an SD-card (60s logging) and MQTT enabled. Maybe MQTT is the culprit; I haven't had time yet to study the code.

Cheers, Bert

Screenshot 2023-09-07 111606

bertvaneyken avatar Sep 07 '23 10:09 bertvaneyken

Hi @bertvaneyken thanks for taking the time to report the issue.

We've seen this problem on a few installations now, some of it appears to be bugs in the Victron software, but I also agree that DIYBMS MQTT interface appears to add to the problem.

The DIYBMS reported to Victron the "internal failure" - this typically only happens when the modules stop responding to the controller. During a power cut, or when the power is going on/off/on/off very quickly, I've seen the symptoms of "power spikes" affecting the DC battery and the modules.

Perhaps this could also be seen in your system?

stuartpittaway avatar Sep 07 '23 12:09 stuartpittaway

Hi Stuart, I'm still struggling with this.

Looking for a bright idea here after enjoying myself with debugging...

  • installed a new accesspoint closer by, RSSI is now -54 dBm
  • disabled MQTT entirely, it makes no statistical difference
  • upgraded to 2023-12-27 which had the effect of no Internal Failures anymore but now it reboots more than hourly instead of daily.
  • unplugged the grid (completely offgrid), it makes no statistical difference
  • I can't correlate power usage peaks with reboots
  • I can't read voltage spikes with a multimeter when connected directly to the inverter input
  • tried 2 times to catch serial debug, never got lucky. I cannot debug for more than 3 hrs as I can't use mains to charge my laptop. (I use the INA229 add-on board.) It looks like a cold reboot will keep it stable for longer.
  • replaced the 5V PSU with a new one (I use 2 PSU's one from 48v to 12v and one from the 12v rail to 5v)
  • the 12v environmental system (fan - small heating) is unplugged
  • the running LED on the cell boards looks like it only stops after the controller reboots

To rule out a hardware issue I ordered a new ESP32 (with external antenna) and some missing chips to build a second controller on a spare v4.2 board.

As far as I can see in the code it makes no sense the controller reboots after reporting Low battery over CAN. Maybe the Victron concludes this after losing CAN connection?

I just completely disabled the Current & Voltage monitoring as a last attempt.

I hope it is something obvious when I replace the controller :)

2024-01-08 20_52_33-VRM Portal - Victron Energy - Mobiele Zonnepanelen - VRM Portal

2024-01-08 21_45_31-VRM Portal - Victron Energy - Mobiele Zonnepanelen - VRM Portal

bertvaneyken avatar Jan 08 '24 20:01 bertvaneyken

To rule out a hardware issue I ordered a new ESP32 (with external antenna) and some missing chips to build a second controller on a spare v4.2 board.

As far as I can see in the code it makes no sense the controller reboots after reporting Low battery over CAN.

The DIYBMS controller should NEVER reboot unintentionally.

Can I ask you to provide the initial serial debug output when the ESP32 is powered up? I'm wondering if the ESP32 is a particular hardware revision which is causing problems. If you have ordered another one it would be a good test.

DIYBMS is reliable - this is a screenshot from my home system, with uptime over 65 days (since I manually rebooted it) and during that time, I've had zero communication issues and over 30 million CANBUS messages.

image

stuartpittaway avatar Jan 09 '24 10:01 stuartpittaway

No doubt it should be stable and reliable :-)

I have never seen CAN errors either.

2024_01_09_12_37_49

I'm building new modules as well with parts I have laying around (4.40) so I can swap out everything. A hardware issue is the most probable cause IMO.

I did notice the modules do throw some errors (I use the standard baudrate):

2024-01-09 12_23_14-DIY BMS CONTROLLER v4

Logs while running are here, I'll post boot logs tonight. (MQTT was under maintenance in the first part) diybms20240105.log

bertvaneyken avatar Jan 09 '24 11:01 bertvaneyken

Two observations from the logs...

You are getting SD card errors. Might be worth removing it and re-formatting it on a PC.

[127003][E][vfs_api.cpp:332] VFSFileImpl(): fopen(/sd/data_20240105.csv) failed
I (133555) diybms: Cell monitor log file
I (133647) diybms: Task 2
[127232][E][vfs_api.cpp:332] VFSFileImpl(): fopen(/sd/modbus90_20240105.csv) failed

The available memory is dropping over time - this might be related to the SD card problem, this would ultimately force the controller to reboot if the memory gets too low.

D (102980) diybms: total_free_byte=98428 total_allocated_byte=192376 largest_free_blk=59380 min_free_byte=86152 alloc_blk=545 free_blk=12 total_blk=557
I (7855562) diybms: Time now: Fri Jan  5 21:56:24 2024
D (7855562) diybms: total_free_byte=95572 total_allocated_byte=194640 largest_free_blk=49140 min_free_byte=68884 alloc_blk=582 free_blk=24 total_blk=606
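To make the memory drop easier to see, here is a minimal Python sketch that pulls the heap counters out of the two `total_free_byte=` lines above and reports how much each has changed. The function names (`parse_heap_reports`, `heap_change`) are made up for illustration and are not part of the DIYBMS code; the sketch only assumes the log line layout shown above.

```python
import re

# The two heap reports quoted above (uptime in milliseconds is in parentheses).
LOG = """\
D (102980) diybms: total_free_byte=98428 total_allocated_byte=192376 largest_free_blk=59380 min_free_byte=86152 alloc_blk=545 free_blk=12 total_blk=557
D (7855562) diybms: total_free_byte=95572 total_allocated_byte=194640 largest_free_blk=49140 min_free_byte=68884 alloc_blk=582 free_blk=24 total_blk=606
"""

# Matches "<level> (<uptime>) diybms: total_free_byte=..." lines only.
HEAP_LINE = re.compile(r"^[A-Z] \((\d+)\) diybms: (total_free_byte=.*)$")


def parse_heap_reports(text):
    """Return one dict of integer counters per heap report line."""
    reports = []
    for line in text.splitlines():
        match = HEAP_LINE.match(line.strip())
        if not match:
            continue  # skip every other kind of log line
        fields = dict(pair.split("=") for pair in match.group(2).split())
        report = {key: int(value) for key, value in fields.items()}
        report["uptime_ms"] = int(match.group(1))
        reports.append(report)
    return reports


def heap_change(first, last):
    """Difference (last - first) for the counters that indicate heap pressure."""
    keys = ("total_free_byte", "largest_free_blk", "min_free_byte")
    return {key: last[key] - first[key] for key in keys}


reports = parse_heap_reports(LOG)
delta = heap_change(reports[0], reports[-1])
elapsed_min = (reports[-1]["uptime_ms"] - reports[0]["uptime_ms"]) / 60000
print(f"over {elapsed_min:.0f} min: {delta}")
```

Run over a full log file instead of the two quoted lines, the same functions would show whether `min_free_byte` and `largest_free_blk` keep falling or level off.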

stuartpittaway avatar Jan 09 '24 11:01 stuartpittaway

I've just removed the SD-card from the controller and I'll leave it running without now.

Serial output logs of the initialization are here: putty.log

(the mqtt password was missing while I ran the dump, so that is why it now fails)

bertvaneyken avatar Jan 15 '24 21:01 bertvaneyken

You were right, it seems that the worn-out SD-card is the culprit of the crashes.

bertvaneyken avatar Jan 16 '24 20:01 bertvaneyken

Wow, I always had my suspicion but never any proof the SD card would cause the problem.

stuartpittaway avatar Jan 17 '24 10:01 stuartpittaway

I'm not 100% sure the SD-card only is at fault but it is way more stable without it.

I finally could catch a spontaneous reboot via the serial output. It looks like a null pointer exception?

I (356427567) diybms-mqtt: MQTT counters: Err_Con=0,Err_Trans=1,Conn=1,Disc=1
I (356427568) diybms: Time now: Thu Feb 1 20:52:39 2024
D (356427568) diybms: total_free_byte=119900 total_allocated_byte=170408 largest_free_blk=77812 min_free_byte=94132 alloc_blk=576 free_blk=16 total_blk=592
D (356427668) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356427687) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
D (356428681) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356428681) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
E (356428909) esp-tls: [sock=51] select() timeout
E (356428911) TRANSPORT_BASE: Failed to open a new connection: 32774
E (356428911) MQTT_CLIENT: Error transport connect
E (356428914) diybms-mqtt: ERROR_TYPE_TCP (Success)
I (356428918) diybms-mqtt: MQTT_EVENT_DISCONNECTED
D (356429671) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356429671) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
I (356429688) diybms: WIFI_EVENT_STA_DISCONNECTED
I (356429790) diybms-mqtt: Stopping MQTT client
W (356429961) diybms-mqtt: MQTT enabled, but not connected
W (356429961) diybms-mqtt: MQTT enabled, but not connected
W (356429962) diybms-mqtt: MQTT enabled, but not connected
D (356430662) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356430663) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
I (356431354) diybms: Task 2
D (356431668) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356431669) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
D (356432671) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356432671) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
D (356433670) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356433672) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
D (356434663) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356434664) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
W (356434966) diybms-mqtt: MQTT enabled, but not connected
D (356435662) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356435662) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
D (356436672) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356436673) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
D (356436855) diybms: Task 3, s=0 e=13
D (356437664) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356437665) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
D (356438662) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356438662) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
D (356439663) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356439663) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
W (356439971) diybms-mqtt: MQTT enabled, but not connected
D (356440662) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356440662) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
D (356441670) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356441670) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
D (356442665) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356442682) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
D (356443664) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356443664) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
I (356443935) diybms: WIFI connect quick retry 1
Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled.

Core 0 register dump:
PC : 0x401b579e PS : 0x00060c30 A0 : 0x801b5883 A1 : 0x3ffd8590
A2 : 0x3ffb6328 A3 : 0xffffffff A4 : 0x00000000 A5 : 0xffffffff
A6 : 0x00000000 A7 : 0x3ffe2d9c A8 : 0x3ffda640 A9 : 0x3ffd8500
A10 : 0x00000000 A11 : 0x00000001 A12 : 0x3ffe3b48 A13 : 0x3ffe3b48
A14 : 0x3ffe2d6c A15 : 0x3ffe2da6 SAR : 0x00000004 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000000 LBEG : 0x4008c0e1 LEND : 0x4008c0f1 LCOUNT : 0xfffffffe

Backtrace: 0x401b579b:0x3ffd8590 0x401b5880:0x3ffd85e0

ELF file SHA256: 31418cd666101b8d

Rebooting...
ets Jun 8 2016 00:22:57

full log: putty20240203.zip
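Most of the log above is CANBUS debug output, which hides the order of the WiFi and MQTT events just before the panic. A minimal Python sketch for filtering that out follows; the excerpt is copied from the log above, while the helper names and the choice of which lines count as noise are my own assumptions.

```python
import re

# A few lines copied from the serial log above.
EXCERPT = """\
E (356428911) MQTT_CLIENT: Error transport connect
I (356428918) diybms-mqtt: MQTT_EVENT_DISCONNECTED
D (356429671) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
I (356429688) diybms: WIFI_EVENT_STA_DISCONNECTED
I (356429790) diybms-mqtt: Stopping MQTT client
W (356429961) diybms-mqtt: MQTT enabled, but not connected
D (356443664) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
I (356443935) diybms: WIFI connect quick retry 1
"""

# Substrings treated as noise here (an assumption, adjust to taste).
NOISE = ("CANBUS received message", "MQTT enabled, but not connected")


def significant_lines(text):
    """Keep only the lines that do not contain one of the NOISE substrings."""
    return [ln for ln in text.splitlines() if ln and not any(n in ln for n in NOISE)]


def uptime_ms(line):
    """Extract the millisecond uptime from a '<level> (<uptime>) tag: ...' line."""
    return int(re.match(r"^[A-Z] \((\d+)\)", line).group(1))


events = significant_lines(EXCERPT)
for line in events:
    print(line)

# Time between the WiFi disconnect event and the reconnect attempt that
# immediately precedes the panic in the full log.
disconnect = next(ln for ln in events if "WIFI_EVENT_STA_DISCONNECTED" in ln)
retry = next(ln for ln in events if "WIFI connect quick retry" in ln)
gap_ms = uptime_ms(retry) - uptime_ms(disconnect)
print(f"retry {gap_ms} ms after disconnect")
```

With the noise removed, the sequence reads: MQTT transport error, MQTT disconnect, WiFi disconnect, MQTT client stopped, WiFi quick retry, then the Guru Meditation error.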

bertvaneyken avatar Feb 03 '24 13:02 bertvaneyken

Ok, I've had another user report similar problems.

stuartpittaway avatar Feb 03 '24 14:02 stuartpittaway

I finally could catch a spontaneous reboot via the serial output. It looks like a null pointer exception?

Definitely a null dereference, I'm suspecting something with the MQTT client or the http server is causing the crash within the event handler but it is not clear which is at fault.
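For anyone reading along, the "null dereference" reading comes from two fields in the register dump: `EXCCAUSE` 0x1c is the Xtensa LoadProhibited cause, and `EXCVADDR` 0x00000000 is the address the load tried to read. A small Python sketch of that check, using register values copied from the dump above; the helper names and the 0x1000 "near null" threshold are assumptions for illustration only.

```python
import re

# Lines copied from the register dump posted above.
DUMP = """\
PC : 0x401b579e PS : 0x00060c30 A0 : 0x801b5883 A1 : 0x3ffd8590
A14 : 0x3ffe2d6c A15 : 0x3ffe2da6 SAR : 0x00000004 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000000 LBEG : 0x4008c0e1 LEND : 0x4008c0f1 LCOUNT : 0xfffffffe
"""

# Two Xtensa exception causes relevant to bad pointer accesses.
EXCCAUSE_NAMES = {
    0x1C: "LoadProhibited",   # read from an address that may not be read
    0x1D: "StoreProhibited",  # write to an address that may not be written
}

REGISTER = re.compile(r"([A-Z][A-Z0-9]*)\s*:\s*0x([0-9a-fA-F]{8})")


def parse_registers(dump):
    """Map register name -> integer value for every 'NAME : 0x........' pair."""
    return {name: int(value, 16) for name, value in REGISTER.findall(dump)}


def describe_fault(regs):
    """Return (cause name, True if the faulting address is at or near zero)."""
    cause = EXCCAUSE_NAMES.get(regs["EXCCAUSE"], "other")
    near_null = regs["EXCVADDR"] < 0x1000  # heuristic threshold, an assumption
    return cause, near_null


regs = parse_registers(DUMP)
print(describe_fault(regs))
```

A faulting address of exactly zero, as here, points to reading through a null pointer itself rather than a field at some offset from one.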

atanisoft avatar Feb 03 '24 14:02 atanisoft

It looks like a null pointer exception?

Thanks for this. You appear to have MQTT enabled, but it's not connected to the MQTT server ("MQTT enabled, but not connected"). Does the crash still occur with MQTT disabled?

stuartpittaway avatar Feb 03 '24 14:02 stuartpittaway

@bertvaneyken I've started another debug log from my environment - https://github.com/stuartpittaway/diyBMSv4ESP32/issues/276

Could you try to reproduce the same test?

I managed to get a core panic - @atanisoft does this still look like a null dereference issue?

I (1759137) diybms: WIFI connect quick retry 1
Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x401b5f3e  PS      : 0x00060a30  A0      : 0x801b6023  A1      : 0x3ffd8f00
A2      : 0x3ffb62d4  A3      : 0xffffffff  A4      : 0x00000000  A5      : 0xffffffff  
A6      : 0x00000000  A7      : 0x3ffe3458  A8      : 0x3ffdae70  A9      : 0x3ffd8e70
A10     : 0x00000000  A11     : 0x00000001  A12     : 0x3ffe2928  A13     : 0x3ffe2928  
A14     : 0x3ffe3428  A15     : 0x3ffe3462  SAR     : 0x00000004  EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000000  LBEG    : 0x4008c0e1  LEND    : 0x4008c0f1  LCOUNT  : 0xfffffffe  


Backtrace: 0x401b5f3b:0x3ffd8f00 0x401b6020:0x3ffd8f50

  #0  0x401b5f3b:0x3ffd8f00 in handler_execute at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_event/esp_event.c:139
      (inlined by) esp_event_loop_run at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_event/esp_event.c:590
  #1  0x401b6020:0x3ffd8f50 in esp_event_loop_run_task at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_event/esp_event.c:115 (discriminator 15)  
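A decoded backtrace like the one above can be reproduced from a raw `Backtrace:` line with the toolchain's `addr2line`, provided the ELF file is from exactly the same build as the firmware that crashed (the panic output prints an `ELF file SHA256` for that reason). The Python sketch below only extracts the program counter addresses and builds the command; it does not run it. The path `firmware.elf` is a placeholder, and the helper names are hypothetical.

```python
import re

# Backtrace line copied from the panic output above.
BACKTRACE = "Backtrace: 0x401b5f3b:0x3ffd8f00 0x401b6020:0x3ffd8f50"


def backtrace_pcs(line):
    """Each backtrace entry is PC:SP; only the PC is needed for symbol lookup."""
    pairs = re.findall(r"(0x[0-9a-fA-F]{8}):(0x[0-9a-fA-F]{8})", line)
    return [pc for pc, _sp in pairs]


def addr2line_command(elf_path, pcs):
    """Build (not run) an addr2line call for the ESP32 Xtensa toolchain.

    -p pretty-print, -f function names, -i inlined frames,
    -a show addresses, -C demangle C++ names.
    """
    return ["xtensa-esp32-elf-addr2line", "-pfiaC", "-e", elf_path, *pcs]


pcs = backtrace_pcs(BACKTRACE)
cmd = addr2line_command("firmware.elf", pcs)  # placeholder ELF path
print(" ".join(cmd))
```

Passing `cmd` to `subprocess.run` on a machine with the toolchain installed should print the source file and line for each frame, much like the `#0` / `#1` lines above.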

stuartpittaway avatar Feb 06 '24 12:02 stuartpittaway

Hi Stuart, I disabled MQTT and it didn't reboot since, however it could run for days or weeks in the past, so no real proof there. I also checked my other MQTT sending devices and they appear to have kept sending data during the time the BMS restarted.

Tonight I re-enabled MQTT and did the following tests:

  • stopped the MQTT service on my Azure server for 15 minutes
  • unplugged the access point servicing the BMS for 45 minutes
  • disabled the WiFi radio on the AP for 15 minutes

None of this provoked an issue... so I'm not sure what the direct cause would be.

bertvaneyken avatar Feb 06 '24 22:02 bertvaneyken