zigbee2mqtt Zigbee2MQTT service stops working

What happened?

After last update I've noticed Z2M is stopping service with a fatal error, out of the blue.

I'm using HASS.IO all updated, in a i5 machine with 8 Gb do memory and 128 GB SSD.

What did you expect to happen?

No response

How to reproduce it (minimal and precise)

Just leave it running. After one or two days it stops working (service drops).

Zigbee2MQTT version

1.28.2 commit: unknown

Adapter firmware version

20220219

Adapter

SONOFF USB Dongle

Debug log

I did not have the debug log active when this happened. Posting normal log. ... Zigbee2MQTT:info 2022-11-07 11:45:15: MQTT publish: topic 'zigbee2mqtt/Smart Plug 15', payload '{"child_lock":"UNLOCK","current":0.04,"energy":6.45,"indicator_mode":"off/on","last_seen":"2022-11-07T11:45:13-03:00","linkquality":102,"power":0,"power_outage_memory":"restore","state":"ON","update":{"state":"idle"},"update_available":false}' <--- Last few GCs ---> [7:0x7fa21c7993c0] 61013533 ms: Mark-sweep 2044.2 (2085.3) -> 2042.2 (2085.3) MB, 2087.8 / 0.0 ms (average mu = 0.133, current mu = 0.010) allocation failure scavenge might not succeed [7:0x7fa21c7993c0] 61015639 ms: Mark-sweep 2044.3 (2085.3) -> 2042.2 (2085.3) MB, 2082.8 / 0.0 ms (average mu = 0.074, current mu = 0.011) allocation failure scavenge might not succeed <--- JS stacktrace ---> FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

Nov 07 '22 15:11 pauloon

Hi,

Just had the same problem again. Attached the log file: log.txt

Please, help. Thanks, Paulo.

Nov 08 '22 12:11 pauloon

I reinstalled Z2M all from zero and today it happened again:

2022-11-09T17:11:10.996Z zigbee-herdsman:adapter:zStack:znp:AREQ <-- AF - incomingMsg - {"groupid":0,"clusterid":61184,"srcaddr":55137,"srcendpoint":1,"dstendpoint":1,"wasbroadcast":0,"linkquality":61,"securityuse":0,"timestamp":11657102,"transseqnumber":0,"len":10,"data":{"type":"Buffer","data":[9,72,2,0,193,1,4,0,1,0]}} 2022-11-09T17:11:10.996Z zigbee-herdsman:controller:log Received 'zcl' data '{"frame":{"Header":{"frameControl":{"frameType":1,"manufacturerSpecific":false,"direction":1,"disableDefaultResponse":false,"reservedBits":0},"transactionSequenceNumber":72,"manufacturerCode":null,"commandIdentifier":2},"Payload":{"seq":49408,"dpValues":[{"dp":1,"datatype":4,"data":{"type":"Buffer","data":[0]}}]},"Command":{"ID":2,"parameters":[{"name":"seq","type":33},{"name":"dpValues","type":1011}],"name":"dataReport"}},"address":55137,"endpoint":1,"linkquality":61,"groupID":0,"wasBroadcast":false,"destinationEndpoint":1}' Zigbee2MQTT:debug 2022-11-09 14:11:10: Received Zigbee message from 'Presença 2', type 'commandDataReport', cluster 'manuSpecificTuya', data '{"dpValues":[{"data":{"data":[0],"type":"Buffer"},"datatype":4,"dp":1}],"seq":49408}' from endpoint 1 with groupID 0 <--- Last few GCs ---> [7:0x7f45b57b73c0] 84373002 ms: Scavenge (reduce) 2042.8 (2083.1) -> 2041.9 (2083.1) MB, 2.5 / 0.0 ms (average mu = 0.105, current mu = 0.019) allocation failure [7:0x7f45b57b73c0] 84373008 ms: Scavenge (reduce) 2043.0 (2083.1) -> 2042.1 (2083.1) MB, 2.7 / 0.0 ms (average mu = 0.105, current mu = 0.019) allocation failure [7:0x7f45b57b73c0] 84373015 ms: Scavenge (reduce) 2043.0 (2083.1) -> 2042.1 (2083.3) MB, 2.6 / 0.0 ms (average mu = 0.105, current mu = 0.019) allocation failure <--- JS stacktrace ---> FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

@Koenkk ... can you, please, help?

Thank you. Paulo.

Nov 09 '22 17:11 pauloon

What node version are you using?

Nov 09 '22 18:11 Koenkk

What node version are you using?

Thanks for helping me. Not sure what you mean by "node". Sorry for my ignorance. Are you talking about NodeRed ? I'm using version 13.5.2.

Follows attached a diagnostic file with some info about my system. config_entry-mqtt-5f9bbf81bfa29cb6923246ba774884c4.json.txt

Please, let me know.

Thanks, Paulo.

Nov 09 '22 18:11 pauloon

I mean node.js (try running node --version on your system)

Nov 10 '22 16:11 Koenkk

I mean node.js (try running node --version on your system)

Hi @Koenkk , I tried and it does not work:

Maybe because I'm using Hass.io and NodeRed?

Thanks, Paulo.

Nov 11 '22 13:11 pauloon

how are you running z2m, via the addon?

Nov 11 '22 16:11 Koenkk

how are you running z2m, via the addon?

Dear @Koenkk , Yes, I've installed it via Add-On.

See if this helps...

Thanks, Paulo.

Nov 11 '22 16:11 pauloon

If I turn off the watchdog it stops after a few hours. Just happened again... We are in a group of more than 2.000 HA users and more people are getting this too. Looks like Hass.io users...

Nov 11 '22 17:11 pauloon

@Koenkk .... Look! With help from some friends, i think I got the node.js version:

Will wait for your instructions.

Thanks, Paulo.

Nov 11 '22 18:11 pauloon

Can you provide the herdsman debug logging, from starting z2m till the error?

Nov 12 '22 08:11 Koenkk

Hi @Koenkk ,

Of course. Is that the one I get from Settings > System > Logs > Z2M, copy and save?

Is there a way to access former logs? I'm waiting for it to crash again to get the log.

Thanks, Paulo.

Nov 12 '22 16:11 pauloon

I have not seen this screen before, how big is the log? (does it include everything from start)

Nov 13 '22 08:11 Koenkk

I have not seen this screen before, how big is the log? (does it include everything from start)

Dear @Koenkk , This is the screen I mentioned before. I'm not sure if it includes all of it. I just grabbed all of its contents so you can see: log.txt Please let me know if there are any other ways to get the logs. I could get the logs from the former crashes.

I'm still waiting so it crashes again but since I'm not home these days maybe it "cooled down" a little.

Thanks, Paulo.

Nov 13 '22 13:11 pauloon

These are not the herdsman debug logging. (they contain zigbee-herdsman like in https://github.com/Koenkk/zigbee2mqtt/issues/14853#issuecomment-1312524794).

See https://www.zigbee2mqtt.io/guide/usage/debug.html#home-assistant-os-supervised-addon how to get the logging.

Nov 13 '22 14:11 Koenkk

These are not the herdsman debug logging. (they contain zigbee-herdsman like in #14853 (comment)).

See https://www.zigbee2mqtt.io/guide/usage/debug.html#home-assistant-os-supervised-addon how to get the logging.

Dear @Koenkk ,

I GOT ANOTHER CRASH! I copied the entire log: log2.txt

Please, let me know.

Thanks, Paulo.

Nov 13 '22 18:11 pauloon

can you turn off Espelho 2 4gang and see if the error still occurs?

Nov 14 '22 19:11 Koenkk

can you turn off Espelho 2 4gang and see if the error still occurs?

I will remove it and restart Z2M, ok?

Thanks, Paulo.

Nov 14 '22 20:11 pauloon

Ok 👍

Nov 15 '22 18:11 Koenkk

I seem to have the same issue here (I opened an issue at https://github.com/Koenkk/zigbee2mqtt/issues/14992 before finding this one). I'm also using node version v16.17.1, so perhaps it has something to do with that indeed? I'm still trying to find the logs.

By the way, for me it happens much more often, sometimes 4 to 5 times on one evening.

Nov 15 '22 20:11 XanderTenBoden

Ok 👍

I got another crash today, so I guess it is not "Espelho 2 4gang" fault. Now I reseted it again and I'm testing one more.

Nov 15 '22 21:11 pauloon

Mine crashed as well. The Z2M logs gave me the following error:

Error: SRSP - AF - dataRequestExt after 6000ms
    at Timeout._onTimeout (/app/node_modules/zigbee-herdsman/src/utils/waitress.ts:64:35)
    at listOnTimeout (node:internal/timers:559:17)
    at processTimers (node:internal/timers:502:7)

These are not the herdsman debug logging. (they contain zigbee-herdsman like in #14853 (comment)).

See https://www.zigbee2mqtt.io/guide/usage/debug.html#home-assistant-os-supervised-addon how to get the logging.

I am not able to retrieve the herdsman logs using this method, because the docker container just seems to crash in my case. So when I run docker ps, there isn't a z2m one running after the crash.

Nov 16 '22 08:11 XanderTenBoden

Dear @Koenkk ,

After I removed the Espelho 2 4gang, it is still happening. Follows the last log extracted: log4.txt

Please, let me know.

Thanks, Paulo.

Nov 16 '22 21:11 pauloon

@pauloon does the docker container also completely stop in your case?

Nov 16 '22 21:11 XanderTenBoden

@pauloon does the docker container also completely stop in your case?

Hi @XanderTenBoden I think the docker is crashing too. One time that I ran docker ps and didn't see it. I'm going to let it happen again, and check to make sure.

Thanks, Paulo.

Nov 16 '22 23:11 pauloon

@pauloon What I did is turning off the watchdog and then wait for it to stop working. As soon as that did, all my zigbee devices in lovelace turned to "unavailable", and the zigbee2mqtt add-on wasn't running anymore.

It seems like the issue has something to do with the z2m docker eating too much RAM. I've enabled the memory sensors and this is what I'm seeing in the history:

https://tenboden.com/ram.png

I do have a lot of devices connected, about 90. How many do you have? It might explain why the crashes are more frequently for me if you have less than that connected? 🤔

Nov 17 '22 08:11 XanderTenBoden

@pauloon What I did is turning off the watchdog and then wait for it to stop working. As soon as that did, all my zigbee devices in lovelace turned to "unavailable", and the zigbee2mqtt add-on wasn't running anymore.

It seems like the issue has something to do with the z2m docker eating too much RAM. I've enabled the memory sensors and this is what I'm seeing in the history:

https://tenboden.com/ram.png

I do have a lot of devices connected, about 90. How many do you have? It might explain why the crashes are more frequently for me if you have less than that connected? 🤔

@XanderTenBoden , I have 51 devices. I'm also keeping watchdog disabled, otherwise it restarts Z2M. I'll try to check the memory history, same as you did.

Thanks, Paulo.

Nov 17 '22 11:11 pauloon

@pauloon I also see Presença 4 and Presença 2 are spamming a lot, can you also try to remove those?

Nov 17 '22 16:11 Koenkk

@pauloon I also see Presença 4 and Presença 2 are spamming a lot, can you also try to remove those?

Hi @Koenkk ,

I will try turn them off, to stop the traffic. These are mmWave presence sensors, which I have four. I know they push a lot of data from one expose called "target_distance" that pushes every second, but ONLY when there is someone being detected. As far as I understood it is not possible to turn off this reporting. I added this expose in the "filters" so it reduces MQTT data (I don't use it). I'm guessing these are not guilty, because the crashes that happened a few days ago I was traveling, and they were reporting much less data.

I'll keep you posted.

Thanks, Paulo.

Nov 17 '22 19:11 pauloon

@Koenkk ,

Just got another crash. Very fast, this time.

Follows the log: log5.txt

I just had restarted HA and Z2M after turning off the sensors.

@XanderTenBoden , Indeed, the contained crashed too. But I did not get the same memory spique that you got.

Please, let me know, Thanks, Paulo.

Nov 17 '22 21:11 pauloon

zigbee2mqtt
zigbee2mqtt copied to clipboard

Zigbee2MQTT service stops working - FATAL ERROR

What happened?

What did you expect to happen?

How to reproduce it (minimal and precise)

Zigbee2MQTT version

Adapter firmware version

Adapter

Debug log

zigbee2mqtt zigbee2mqtt copied to clipboard

Zigbee2MQTT service stops working - FATAL ERROR

What happened?

What did you expect to happen?

How to reproduce it (minimal and precise)

Zigbee2MQTT version

Adapter firmware version

Adapter

Debug log

zigbee2mqtt
zigbee2mqtt copied to clipboard