Z2MQTT Add-on stops when losing connection to coordinator - error Failed to stop Zigbee2MQTT

Open mihsu81 opened this issue 2 years ago • 26 comments

What happened?

The Z2MQTT Add-on stops when losing connection to the coordinator. The HA server and LilyZig coordinator are connected to the same router. After I reboot the router (which takes about 30-40 seconds), the Z2MQTT Add-on stops within ~20 seconds with the error below:

Zigbee2MQTT:error 2023-01-13 11:12:21: Adapter disconnected, stopping
Zigbee2MQTT:debug 2023-01-13 11:12:21: Saving state to file /config/zigbee2mqtt/state.json
Zigbee2MQTT:info  2023-01-13 11:12:21: MQTT publish: topic 'zigbee2mqtt/bridge/state', payload 'offline'
Zigbee2MQTT:info  2023-01-13 11:12:21: Disconnecting from MQTT server
Zigbee2MQTT:info  2023-01-13 11:12:21: Stopping zigbee-herdsman...
Zigbee2MQTT:error 2023-01-13 11:12:21: Failed to stop Zigbee2MQTT

What did you expect to happen?

The Z2MQTT add-on should retry connecting to the coordinator a configurable number of times.

How to reproduce it (minimal and precise)

No response

Zigbee2MQTT version

1.29.1-1

Adapter firmware version

20220219

Adapter

ZigStar LilyZig POE

Debug log

log.txt

mihsu81 avatar Jan 13 '23 10:01 mihsu81

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

github-actions[bot] avatar Feb 13 '23 00:02 github-actions[bot]

The issue is still present.

mihsu81 avatar Feb 13 '23 10:02 mihsu81

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

github-actions[bot] avatar Mar 16 '23 00:03 github-actions[bot]

The issue is still present.

mihsu81 avatar Mar 16 '23 05:03 mihsu81

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

github-actions[bot] avatar Apr 16 '23 00:04 github-actions[bot]

The issue is still present.

mihsu81 avatar Apr 16 '23 06:04 mihsu81

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

github-actions[bot] avatar May 18 '23 00:05 github-actions[bot]

The issue is still present.

mihsu81 avatar May 18 '23 06:05 mihsu81

I discovered a similar issue.

After I disconnected the SLZB-06 from its Ethernet cable, the zigbee2mqtt service crashed (understandably). I then reconnected the SLZB-06 gateway, but the zigbee2mqtt service stayed down. I waited at least 10 minutes, but the watchdog does not seem to work: it does not appear to try to start the service.

midlan avatar May 26 '23 08:05 midlan

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

github-actions[bot] avatar Jun 26 '23 00:06 github-actions[bot]

The issue is still present in the Z2MQTT 1.31.2-1 add-on and HA 2023.6.3.

mihsu81 avatar Jun 26 '23 04:06 mihsu81

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

github-actions[bot] avatar Jul 28 '23 00:07 github-actions[bot]

The issue is still present in the Z2MQTT 1.32.1-1 add-on and HA 2023.7.3.

mihsu81 avatar Jul 28 '23 04:07 mihsu81

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

github-actions[bot] avatar Aug 29 '23 00:08 github-actions[bot]

The issue is still present in the Z2MQTT 1.32.2-1 add-on and HA 2023.8.4.

mihsu81 avatar Aug 29 '23 04:08 mihsu81

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

github-actions[bot] avatar Sep 29 '23 00:09 github-actions[bot]

The issue is still present.

mihsu81 avatar Sep 29 '23 04:09 mihsu81

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

github-actions[bot] avatar Oct 30 '23 00:10 github-actions[bot]

The issue is still present.

mihsu81 avatar Oct 30 '23 17:10 mihsu81

I most likely have the same or a similar issue. I have an Ethernet coordinator (TCP). When I restart my router, for example, zigbee2mqtt crashes and doesn't recover even after the coordinator comes back online. The only way to get things running again is to manually restart the zigbee2mqtt service (mine runs barebones in a VM).

I wonder if this is expected behavior or a bug?

Some logs:

error 2024-01-16 21:59:38: Adapter disconnected, stopping
debug 2024-01-16 21:59:38: Saving state to file /var/lib/zigbee2mqtt/state.json
info  2024-01-16 21:59:38: MQTT publish: topic 'zigbee2mqtt/bridge/state', payload '{"state":"offline"}'
info  2024-01-16 21:59:38: Disconnecting from MQTT server
info  2024-01-16 21:59:38: Stopping zigbee-herdsman...
error 2024-01-16 21:59:38: Failed to stop Zigbee2MQTT

lhorak avatar Jan 18 '24 12:01 lhorak

I set up an automation which starts the Add-On if it's not running for 1 minute and 10 seconds. Nonetheless, I hope this bug will be fixed eventually.

alias: Start Zigbee2MQTT Add-On if stopped
description: ""
trigger:
  - type: not_running
    platform: device
    device_id: 2cd20c96528ec97880b06007e39d7c06
    entity_id: binary_sensor.zigbee2mqtt_running
    domain: binary_sensor
    for:
      hours: 0
      minutes: 1
      seconds: 10
condition: []
action:
  - service: hassio.addon_start
    data:
      addon: 45df7312_zigbee2mqtt
mode: single

mihsu81 avatar Jan 18 '24 18:01 mihsu81

Got it working on Proxmox Alpine LXC.

Create the directory:

mkdir /opt/zigbee2mqtt

Create a script that checks whether the Zigbee2MQTT service has stopped and, if so, reboots:

cat <<'EOF' >/opt/zigbee2mqtt/check_zigbee_service.sh
#!/bin/sh
# Reboot if the zigbee2mqtt OpenRC service is not reported as started.
if rc-service zigbee2mqtt status | grep -q "started"; then
    echo "zigbee2mqtt is running."
else
    echo "zigbee2mqtt is not running. Rebooting..."
    # Alternative: restart only the service instead of rebooting.
    #rc-service zigbee2mqtt restart
    reboot
fi
EOF

Give it execute permissions:

chmod +x /opt/zigbee2mqtt/check_zigbee_service.sh

Open your crontab file for editing:

nano /etc/crontabs/root

Add a line to schedule the script to run at your desired interval. For example, to run every 5 minutes:

*/5		*	*	*	*	/opt/zigbee2mqtt/check_zigbee_service.sh

reboot

3vilson avatar Jan 21 '24 14:01 3vilson

Just to give an update here: I was running z2m on Proxmox Alpine LXC. I checked the build version of the Alpine package (https://pkgs.alpinelinux.org/package/edge/community/aarch64/zigbee2mqtt) and found that the latest version in the repository is 1.34.0, while z2m is currently on 1.35.1.

I prefer to keep things updated, so I've migrated to an LXC with z2m running under Docker. I just tested it: after unplugging the IP coordinator and plugging it back in, z2m successfully starts up on its own, so this has solved it for me (and as an added bonus I get instant updates instead of waiting for the Alpine package to get updated 🙂 )

I know this does not solve the issue and brings a little overhead with running Docker instead of barebones, but I just wanted to add this here as another option that is proven to work.

lhorak avatar Jan 23 '24 13:01 lhorak

I have the same issue. After unplugging my Ethernet coordinator, the z2m addon crashes and does not come back up after reconnecting. This is a bug in z2m: the watchdog in HA supervisor is not designed to handle such cases, so the addon itself must handle them. See https://github.com/home-assistant/supervisor/pull/3779, where this behavior was updated. z2m should be updated to retry the connection (indefinitely) instead of crashing. If this behavior is problematic for another use case, it can be made a configurable setting.
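
The retry behaviour requested here can be illustrated outside Zigbee2MQTT with a small POSIX shell wrapper. This is only a sketch of the idea, not how Zigbee2MQTT works internally: the function name `retry_cmd` and its arguments are hypothetical, and the wrapped command is whatever starts Zigbee2MQTT on your system.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds or the attempt limit is hit.
# Usage: retry_cmd MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# MAX_ATTEMPTS=0 means "retry indefinitely".
retry_cmd() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Run the wrapped command; stop as soon as it exits successfully.
        if "$@"; then
            return 0
        fi
        # Give up once the configured number of attempts has been used.
        if [ "$max_attempts" -gt 0 ] && [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}
```

Called as `retry_cmd 5 30 <command that starts Zigbee2MQTT>`, this would try up to five times with 30 seconds between attempts. Passing `0` as the first argument gives the indefinite retry suggested above.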

The issue is still present with HA 2024.2.5, supervisor 2024.02.1, HAOS 12.0 and Zigbee2MQTT 1.36.0-1.

@3vilson @lhorak in my opinion what you wrote is not really relevant to this issue. The issue does not pertain to running on Proxmox and therefore this cannot be a solution. You have a different setup. It's like suggesting using ZHA instead of Z2M is a solution.

mrbrdo avatar Mar 08 '24 00:03 mrbrdo

Still an issue on HA 2024.4.3, supervisor 2024.04.0, HAOS 12.2, Zigbee2MQTT 1.36.1-1.

psarossy avatar Apr 19 '24 02:04 psarossy

How come it completely stops/crashes when it can't connect to the coordinator (which is very likely to occur, especially when using network-based coordinators), instead of just checking every few minutes for coordinator availability?
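
As a stopgap outside Zigbee2MQTT, a periodic check along these lines could be run from cron. It is a rough sketch under several assumptions: the helper names `decide_action`, `coordinator_reachable` and `watchdog_once` are made up for illustration, the host and port arguments are placeholders for your coordinator's address, `nc` is assumed to support `-z` and `-w`, and `rc-service` only applies to Alpine/OpenRC installs like the one described earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: decide what to do from two yes/no facts.
# Prints "none" (service is up), "restart" (service down, coordinator reachable)
# or "wait" (service down, coordinator still unreachable).
decide_action() {
    service_running=$1
    coordinator_up=$2
    if [ "$service_running" = "yes" ]; then
        echo "none"
    elif [ "$coordinator_up" = "yes" ]; then
        echo "restart"
    else
        echo "wait"
    fi
}

# Assumption: this nc build supports -z (probe only) and -w (timeout in seconds).
# Prints "yes" if a TCP connection to host $1, port $2 succeeds, otherwise "no".
coordinator_reachable() {
    if nc -z -w 3 "$1" "$2" >/dev/null 2>&1; then
        echo "yes"
    else
        echo "no"
    fi
}

# One watchdog pass: $1 = coordinator host, $2 = coordinator port (placeholders).
watchdog_once() {
    if rc-service zigbee2mqtt status | grep -q "started"; then
        running="yes"
    else
        running="no"
    fi
    action=$(decide_action "$running" "$(coordinator_reachable "$1" "$2")")
    # Only restart once the coordinator answers again.
    if [ "$action" = "restart" ]; then
        rc-service zigbee2mqtt restart
    fi
    echo "$action"
}
```

Running `watchdog_once <host> <port>` every few minutes restarts the service only when the coordinator is reachable again, which avoids restart loops while the network is still down.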

r01k avatar Apr 27 '24 06:04 r01k

I have exactly the same issue with the latest version, z2m 1.38.0. When I unplug the Ethernet coordinator, z2m crashes and does not recover even after the coordinator is plugged back in with the same IP address. It seems to hang forever until I restart z2m manually.

Here is the latest log from z2m:

[2024-06-02 17:13:21] info: zh:ember:uart:ash: ======== ASH stopped ========
[2024-06-02 17:13:21] error: zh:ember:uart:ash: Failed to init port with error Error: connect ECONNREFUSED 192.168.86.27:8888
[2024-06-02 17:13:21] error: zh:ember: Failed to reset and init NCP. Error: Failed to start EZSP layer with status=HOST_FATAL_ERROR.
[2024-06-02 17:13:21] info: zh:ember:uart:ash: ASH COUNTERS since last clear:
[2024-06-02 17:13:21] info: zh:ember:uart:ash: Total frames: RX=0, TX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: Cancelled : RX=0, TX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: DATA frames : RX=0, TX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: DATA bytes : RX=0, TX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: Retry frames: RX=0, TX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: ACK frames : RX=0, TX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: NAK frames : RX=0, TX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: nRdy frames : RX=0, TX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: CRC errors : RX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: Comm errors : RX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: Length < minimum: RX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: Length > maximum: RX=0
[2024-06-02 17:13:22] info: zh:ember:uart:ash: Bad controls : RX=0
[2024-06-02 17:13:22] info: zh:ember:uart:ash: Bad lengths : RX=0
[2024-06-02 17:13:22] info: zh:ember:uart:ash: Bad ACK numbers : RX=0
[2024-06-02 17:13:22] info: zh:ember:uart:ash: Out of buffers : RX=0
[2024-06-02 17:13:22] info: zh:ember:uart:ash: Retry dupes : RX=0
[2024-06-02 17:13:22] info: zh:ember:uart:ash: Out of sequence : RX=0
[2024-06-02 17:13:22] info: zh:ember:uart:ash: ACK timeouts : RX=0
[2024-06-02 17:13:22] info: zh:ember:uart:ash: ======== ASH stopped ========
[2024-06-02 17:13:22] info: zh:ember:ezsp: ======== EZSP stopped ========
[2024-06-02 17:13:22] info: zh:ember: ======== Ember Adapter Stopped ========
[2024-06-02 17:13:22] error: z2m: Adapter disconnected, stopping
[2024-06-02 17:13:22] info: z2m: Disconnecting from MQTT server
[2024-06-02 17:13:22] info: z2m: Stopping zigbee-herdsman...
[2024-06-02 17:46:53] info: z2m: Disconnecting from MQTT server
[2024-06-02 17:46:53] info: z2m: Stopping zigbee-herdsman...

dinhchinh82 avatar Jun 02 '24 12:06 dinhchinh82

This happened to me too. When I unplug the coordinator for a few seconds, it keeps working properly, but when it is unplugged for longer, Z2M does not start again automatically and I have to start it manually.

Bjk8kds avatar Sep 07 '24 06:09 Bjk8kds