zigbee2mqtt
zigbee2mqtt copied to clipboard
Zigbee2MQTT service stops working - FATAL ERROR
What happened?
After last update I've noticed Z2M is stopping service with a fatal error, out of the blue.
I'm using HASS.IO all updated, in a i5 machine with 8 Gb do memory and 128 GB SSD.
What did you expect to happen?
No response
How to reproduce it (minimal and precise)
Just leave it running. After one or two days it stops working (service drops).
Zigbee2MQTT version
1.28.2 commit: unknown
Adapter firmware version
20220219
Adapter
SONOFF USB Dongle
Debug log
I did not have the debug log active when this happened. Posting normal log. ... Zigbee2MQTT:info 2022-11-07 11:45:15: MQTT publish: topic 'zigbee2mqtt/Smart Plug 15', payload '{"child_lock":"UNLOCK","current":0.04,"energy":6.45,"indicator_mode":"off/on","last_seen":"2022-11-07T11:45:13-03:00","linkquality":102,"power":0,"power_outage_memory":"restore","state":"ON","update":{"state":"idle"},"update_available":false}' <--- Last few GCs ---> [7:0x7fa21c7993c0] 61013533 ms: Mark-sweep 2044.2 (2085.3) -> 2042.2 (2085.3) MB, 2087.8 / 0.0 ms (average mu = 0.133, current mu = 0.010) allocation failure scavenge might not succeed [7:0x7fa21c7993c0] 61015639 ms: Mark-sweep 2044.3 (2085.3) -> 2042.2 (2085.3) MB, 2082.8 / 0.0 ms (average mu = 0.074, current mu = 0.011) allocation failure scavenge might not succeed <--- JS stacktrace ---> FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
I reinstalled Z2M all from zero and today it happened again:
2022-11-09T17:11:10.996Z zigbee-herdsman:adapter:zStack:znp:AREQ <-- AF - incomingMsg - {"groupid":0,"clusterid":61184,"srcaddr":55137,"srcendpoint":1,"dstendpoint":1,"wasbroadcast":0,"linkquality":61,"securityuse":0,"timestamp":11657102,"transseqnumber":0,"len":10,"data":{"type":"Buffer","data":[9,72,2,0,193,1,4,0,1,0]}} 2022-11-09T17:11:10.996Z zigbee-herdsman:controller:log Received 'zcl' data '{"frame":{"Header":{"frameControl":{"frameType":1,"manufacturerSpecific":false,"direction":1,"disableDefaultResponse":false,"reservedBits":0},"transactionSequenceNumber":72,"manufacturerCode":null,"commandIdentifier":2},"Payload":{"seq":49408,"dpValues":[{"dp":1,"datatype":4,"data":{"type":"Buffer","data":[0]}}]},"Command":{"ID":2,"parameters":[{"name":"seq","type":33},{"name":"dpValues","type":1011}],"name":"dataReport"}},"address":55137,"endpoint":1,"linkquality":61,"groupID":0,"wasBroadcast":false,"destinationEndpoint":1}' Zigbee2MQTT:debug 2022-11-09 14:11:10: Received Zigbee message from 'Presença 2', type 'commandDataReport', cluster 'manuSpecificTuya', data '{"dpValues":[{"data":{"data":[0],"type":"Buffer"},"datatype":4,"dp":1}],"seq":49408}' from endpoint 1 with groupID 0 <--- Last few GCs ---> [7:0x7f45b57b73c0] 84373002 ms: Scavenge (reduce) 2042.8 (2083.1) -> 2041.9 (2083.1) MB, 2.5 / 0.0 ms (average mu = 0.105, current mu = 0.019) allocation failure [7:0x7f45b57b73c0] 84373008 ms: Scavenge (reduce) 2043.0 (2083.1) -> 2042.1 (2083.1) MB, 2.7 / 0.0 ms (average mu = 0.105, current mu = 0.019) allocation failure [7:0x7f45b57b73c0] 84373015 ms: Scavenge (reduce) 2043.0 (2083.1) -> 2042.1 (2083.3) MB, 2.6 / 0.0 ms (average mu = 0.105, current mu = 0.019) allocation failure <--- JS stacktrace ---> FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
@Koenkk ... can you, please, help?
Thank you. Paulo.
What node version are you using?
What node version are you using?
Thanks for helping me. Not sure what you mean by "node". Sorry for my ignorance. Are you talking about NodeRed ? I'm using version 13.5.2.
Follows attached a diagnostic file with some info about my system. config_entry-mqtt-5f9bbf81bfa29cb6923246ba774884c4.json.txt
Please, let me know.
Thanks, Paulo.
I mean node.js (try running node --version
on your system)
I mean node.js (try running
node --version
on your system)
Hi @Koenkk ,
I tried and it does not work:
Maybe because I'm using Hass.io and NodeRed?
Thanks, Paulo.
how are you running z2m, via the addon?
how are you running z2m, via the addon?
Dear @Koenkk ,
Yes, I've installed it via Add-On.
See if this helps...
Thanks, Paulo.
If I turn off the watchdog it stops after a few hours. Just happened again... We are in a group of more than 2.000 HA users and more people are getting this too. Looks like Hass.io users...
@Koenkk .... Look!
With help from some friends, i think I got the node.js version:
Will wait for your instructions.
Thanks, Paulo.
Can you provide the herdsman debug logging, from starting z2m till the error?
Hi @Koenkk ,
Of course.
Is that the one I get from Settings > System > Logs > Z2M, copy and save?
Is there a way to access former logs? I'm waiting for it to crash again to get the log.
Thanks, Paulo.
I have not seen this screen before, how big is the log? (does it include everything from start)
I have not seen this screen before, how big is the log? (does it include everything from start)
Dear @Koenkk , This is the screen I mentioned before. I'm not sure if it includes all of it. I just grabbed all of its contents so you can see: log.txt Please let me know if there are any other ways to get the logs. I could get the logs from the former crashes.
I'm still waiting so it crashes again but since I'm not home these days maybe it "cooled down" a little.
Thanks, Paulo.
These are not the herdsman debug logging. (they contain zigbee-herdsman
like in https://github.com/Koenkk/zigbee2mqtt/issues/14853#issuecomment-1312524794).
See https://www.zigbee2mqtt.io/guide/usage/debug.html#home-assistant-os-supervised-addon how to get the logging.
These are not the herdsman debug logging. (they contain
zigbee-herdsman
like in #14853 (comment)).See https://www.zigbee2mqtt.io/guide/usage/debug.html#home-assistant-os-supervised-addon how to get the logging.
Dear @Koenkk ,
I GOT ANOTHER CRASH! I copied the entire log: log2.txt
Please, let me know.
Thanks, Paulo.
can you turn off Espelho 2 4gang
and see if the error still occurs?
can you turn off
Espelho 2 4gang
and see if the error still occurs?
I will remove it and restart Z2M, ok?
Thanks, Paulo.
Ok 👍
I seem to have the same issue here (I opened an issue at https://github.com/Koenkk/zigbee2mqtt/issues/14992 before finding this one). I'm also using node version v16.17.1, so perhaps it has something to do with that indeed? I'm still trying to find the logs.
By the way, for me it happens much more often, sometimes 4 to 5 times on one evening.
Ok 👍
I got another crash today, so I guess it is not "Espelho 2 4gang" fault. Now I reseted it again and I'm testing one more.
Mine crashed as well. The Z2M logs gave me the following error:
Error: SRSP - AF - dataRequestExt after 6000ms
at Timeout._onTimeout (/app/node_modules/zigbee-herdsman/src/utils/waitress.ts:64:35)
at listOnTimeout (node:internal/timers:559:17)
at processTimers (node:internal/timers:502:7)
These are not the herdsman debug logging. (they contain
zigbee-herdsman
like in #14853 (comment)).See https://www.zigbee2mqtt.io/guide/usage/debug.html#home-assistant-os-supervised-addon how to get the logging.
I am not able to retrieve the herdsman logs using this method, because the docker container just seems to crash in my case. So when I run docker ps
, there isn't a z2m one running after the crash.
Dear @Koenkk ,
After I removed the Espelho 2 4gang, it is still happening. Follows the last log extracted: log4.txt
Please, let me know.
Thanks, Paulo.
@pauloon does the docker container also completely stop in your case?
@pauloon does the docker container also completely stop in your case?
Hi @XanderTenBoden I think the docker is crashing too. One time that I ran docker ps and didn't see it. I'm going to let it happen again, and check to make sure.
Thanks, Paulo.
@pauloon What I did is turning off the watchdog and then wait for it to stop working. As soon as that did, all my zigbee devices in lovelace turned to "unavailable", and the zigbee2mqtt add-on wasn't running anymore.
It seems like the issue has something to do with the z2m docker eating too much RAM. I've enabled the memory sensors and this is what I'm seeing in the history:
I do have a lot of devices connected, about 90. How many do you have? It might explain why the crashes are more frequently for me if you have less than that connected? 🤔
@pauloon What I did is turning off the watchdog and then wait for it to stop working. As soon as that did, all my zigbee devices in lovelace turned to "unavailable", and the zigbee2mqtt add-on wasn't running anymore.
It seems like the issue has something to do with the z2m docker eating too much RAM. I've enabled the memory sensors and this is what I'm seeing in the history:
https://tenboden.com/ram.png
I do have a lot of devices connected, about 90. How many do you have? It might explain why the crashes are more frequently for me if you have less than that connected? 🤔
@XanderTenBoden , I have 51 devices. I'm also keeping watchdog disabled, otherwise it restarts Z2M. I'll try to check the memory history, same as you did.
Thanks, Paulo.
@pauloon I also see Presença 4
and Presença 2
are spamming a lot, can you also try to remove those?
@pauloon I also see
Presença 4
andPresença 2
are spamming a lot, can you also try to remove those?
Hi @Koenkk ,
I will try turn them off, to stop the traffic. These are mmWave presence sensors, which I have four. I know they push a lot of data from one expose called "target_distance" that pushes every second, but ONLY when there is someone being detected. As far as I understood it is not possible to turn off this reporting. I added this expose in the "filters" so it reduces MQTT data (I don't use it). I'm guessing these are not guilty, because the crashes that happened a few days ago I was traveling, and they were reporting much less data.
I'll keep you posted.
Thanks, Paulo.
@Koenkk ,
Just got another crash. Very fast, this time.
Follows the log: log5.txt
I just had restarted HA and Z2M after turning off the sensors.
@XanderTenBoden ,
Indeed, the contained crashed too.
But I did not get the same memory spique that you got.
Please, let me know, Thanks, Paulo.