zigbee2mqtt
Z2MQTT Add-on stops when losing connection to coordinator - error Failed to stop Zigbee2MQTT
What happened?
The Z2MQTT Add-on stops when it loses the connection to the coordinator. The HA server and the LilyZig coordinator are connected to the same router. When I reboot the router (which takes about 30-40 seconds), the Z2MQTT Add-on stops within ~20 seconds with the error below:
Zigbee2MQTT:error 2023-01-13 11:12:21: Adapter disconnected, stopping
Zigbee2MQTT:debug 2023-01-13 11:12:21: Saving state to file /config/zigbee2mqtt/state.json
Zigbee2MQTT:info 2023-01-13 11:12:21: MQTT publish: topic 'zigbee2mqtt/bridge/state', payload 'offline'
Zigbee2MQTT:info 2023-01-13 11:12:21: Disconnecting from MQTT server
Zigbee2MQTT:info 2023-01-13 11:12:21: Stopping zigbee-herdsman...
Zigbee2MQTT:error 2023-01-13 11:12:21: Failed to stop Zigbee2MQTT
What did you expect to happen?
The Z2MQTT add-on retries connecting to the coordinator a configurable number of times.
How to reproduce it (minimal and precise)
No response
Zigbee2MQTT version
1.29.1-1
Adapter firmware version
20220219
Adapter
ZigStar LilyZig POE
Debug log
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days
The issue is still present.
I discovered a similar issue.
After I disconnected the SLZB-06 from its Ethernet cable, the zigbee2mqtt service crashed (understandably). I then reconnected the SLZB-06 gateway, but the zigbee2mqtt service stayed down. I waited at least 10 minutes, but the watchdog does not seem to work: it never tries to start the service.
The issue is still present in the Z2MQTT 1.31.2-1 add-on and HA 2023.6.3.
The issue is still present in the Z2MQTT 1.32.1-1 add-on and HA 2023.7.3.
The issue is still present in the Z2MQTT 1.32.2-1 add-on and HA 2023.8.4.
I most likely have the same or a similar issue. I have an Ethernet coordinator (TCP). When I restart my router, for example, zigbee2mqtt crashes and doesn't recover even though the coordinator comes back online. The only way to get things back up and running is to manually restart the zigbee2mqtt service (mine runs barebones in a VM).
I wonder if this is expected behavior or a bug?
Some logs:
error 2024-01-16 21:59:38: Adapter disconnected, stopping
debug 2024-01-16 21:59:38: Saving state to file /var/lib/zigbee2mqtt/state.json
info 2024-01-16 21:59:38: MQTT publish: topic 'zigbee2mqtt/bridge/state', payload '{"state":"offline"}'
info 2024-01-16 21:59:38: Disconnecting from MQTT server
info 2024-01-16 21:59:38: Stopping zigbee-herdsman...
error 2024-01-16 21:59:38: Failed to stop Zigbee2MQTT
I set up an automation which starts the Add-On if it's not running for 1 minute and 10 seconds. Nonetheless, I hope this bug will be fixed eventually.
alias: Start Zigbee2MQTT Add-On if stopped
description: ""
trigger:
  - type: not_running
    platform: device
    device_id: 2cd20c96528ec97880b06007e39d7c06
    entity_id: binary_sensor.zigbee2mqtt_running
    domain: binary_sensor
    for:
      hours: 0
      minutes: 1
      seconds: 10
condition: []
action:
  - service: hassio.addon_start
    data:
      addon: 45df7312_zigbee2mqtt
mode: single
Got it working on Proxmox Alpine LXC.
Create the directory:
mkdir /opt/zigbee2mqtt
Create a script that checks whether the Zigbee2MQTT service has crashed and, if so, reboots the container:
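cat <<'EOF' >/opt/zigbee2mqtt/check_zigbee_service.sh
#!/bin/sh
# Reboot the container if the zigbee2mqtt OpenRC service is not running.
if rc-service zigbee2mqtt status | grep -q "started"; then
    echo "zigbee2mqtt is running."
else
    echo "zigbee2mqtt is not running. Rebooting..."
    # Alternative: restart only the service instead of rebooting.
    #rc-service zigbee2mqtt restart
    reboot
fi
EOF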
Give it execute permissions:
chmod +x /opt/zigbee2mqtt/check_zigbee_service.sh
Open your crontab file for editing:
nano /etc/crontabs/root
Add a line to schedule the script to run at your desired interval. For example, to run every 5 minutes:
*/5 * * * * /opt/zigbee2mqtt/check_zigbee_service.sh
reboot
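For anyone who would rather not reboot the whole container, here is a rough variant of the check script above that only restarts the service, and only once the coordinator answers on its TCP port again. It is a sketch, not something tested on every setup: the host `192.168.1.50` and port `6638` are placeholders, the `decide_action`, `coordinator_reachable` and `check_once` helpers are names made up for this example, and it assumes your `nc` build supports `-z` and `-w`.

```shell
#!/bin/sh
# Placeholder values (assumptions): set these to your coordinator's address.
COORDINATOR_HOST="${COORDINATOR_HOST:-192.168.1.50}"
COORDINATOR_PORT="${COORDINATOR_PORT:-6638}"

# Print the action to take.
#   $1: service state ("started" or anything else)
#   $2: coordinator state ("up" or "down")
decide_action() {
    if [ "$1" = "started" ]; then
        echo "none"
    elif [ "$2" = "up" ]; then
        echo "restart"
    else
        echo "wait"
    fi
}

# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Assumes nc supports -z (connect only) and -w (timeout).
coordinator_reachable() {
    nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# Run one check: restart the service only if it is down and the
# coordinator is reachable; otherwise leave things alone until next run.
check_once() {
    if rc-service zigbee2mqtt status 2>/dev/null | grep -q "started"; then
        service_state="started"
    else
        service_state="stopped"
    fi
    if coordinator_reachable "$COORDINATOR_HOST" "$COORDINATOR_PORT"; then
        coordinator_state="up"
    else
        coordinator_state="down"
    fi
    case "$(decide_action "$service_state" "$coordinator_state")" in
        restart) echo "Restarting zigbee2mqtt..."; rc-service zigbee2mqtt restart ;;
        wait)    echo "Coordinator unreachable, will retry on the next run." ;;
        *)       echo "zigbee2mqtt is running." ;;
    esac
}

# Only act when called with --run, so the file can also be sourced safely.
if [ "${1:-}" = "--run" ]; then
    check_once
fi
```

With this variant the crontab line would call the script with `--run`, for example `*/5 * * * * /opt/zigbee2mqtt/check_zigbee_service.sh --run`. Skipping the restart while the coordinator is still down avoids a start that would immediately fail again.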
Just to give an update here: I was running z2m on Proxmox Alpine LXC. I checked the build version of the Alpine package (https://pkgs.alpinelinux.org/package/edge/community/aarch64/zigbee2mqtt) and found that the latest version in the repository is 1.34.0, while z2m is currently on 1.35.1.
I prefer to keep things updated, so I've migrated to an LXC with z2m running under Docker. I just tested it: after unplugging the IP coordinator and plugging it back in, z2m successfully starts up on its own, so this has solved it for me (and as an added bonus I get instant updates instead of waiting for the Alpine package to be updated 🙂 )
I know this does not solve the issue and brings a little overhead with running Docker instead of barebones, but I just wanted to add this here as another option that is proven to work.
I have the same issue. After unplugging my Ethernet coordinator, the z2m add-on crashes and does not come back up after reconnecting. This is a bug in z2m: the watchdog in HA supervisor is not designed to handle such cases, so the add-on itself must handle them. See https://github.com/home-assistant/supervisor/pull/3779, where this behavior was changed. z2m should be updated to retry the connection (indefinitely) instead of crashing. If this behavior is problematic for another use case, it can be made a configurable setting.
The issue is still present with HA 2024.2.5, supervisor 2024.02.1, HAOS 12.0 and Zigbee2MQTT 1.36.0-1.
@3vilson @lhorak in my opinion what you wrote is not really relevant to this issue. The issue does not pertain to running on Proxmox and therefore this cannot be a solution. You have a different setup. It's like suggesting using ZHA instead of Z2M is a solution.
Still an issue on HA 2024.4.3, supervisor 2024.04.0, HAOS 12.2, and Zigbee2MQTT 1.36.1-1.
How come it completely stops/crashes when it can't connect to the coordinator (which is very likely to occur, especially when using network-based coordinators), instead of just checking every few minutes for coordinator availability?
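To illustrate the kind of periodic check being asked for, here is a minimal external sketch in shell. It is not how Zigbee2MQTT behaves and not a fix for the issue itself; it just waits for a network coordinator with a capped, doubling delay and returns once the TCP port accepts connections. The default host `192.168.1.50`, port `6638`, the 5 s and 300 s delay limits, and the `next_delay` and `wait_for_coordinator` helpers are all assumptions for this example, and it assumes `nc` supports `-z` and `-w`.

```shell
#!/bin/sh
# Sketch: wait for a network coordinator with capped exponential backoff.
# Host, port and delay limits are hypothetical placeholders.

# Print the next delay: double the current one ($1), capped at the maximum ($2).
next_delay() {
    doubled=$(( $1 * 2 ))
    if [ "$doubled" -gt "$2" ]; then
        echo "$2"
    else
        echo "$doubled"
    fi
}

# Block until host ($1) and port ($2) accept a TCP connection.
# Retries indefinitely, starting at 5 seconds and never exceeding 300 seconds.
wait_for_coordinator() {
    delay=5
    until nc -z -w 3 "$1" "$2" >/dev/null 2>&1; do
        echo "Coordinator $1:$2 unreachable, retrying in ${delay}s"
        sleep "$delay"
        delay=$(next_delay "$delay" 300)
    done
    echo "Coordinator $1:$2 reachable"
}

# Only act when called with --run, so the file can also be sourced safely.
if [ "${1:-}" = "--run" ]; then
    wait_for_coordinator "${2:-192.168.1.50}" "${3:-6638}"
fi
```

On a barebones install this could be chained with whatever command starts the service on your system, so that the start is attempted only after the coordinator is back.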
I have exactly this issue with the latest version, z2m 1.38.0. When I unplug the Ethernet coordinator, z2m crashes and does not recover even after the coordinator is plugged back in with the same IP address. It seems to hang forever until I restart z2m manually.
Here is the latest log from z2m:
[2024-06-02 17:13:21] info: zh:ember:uart:ash: ======== ASH stopped ========
[2024-06-02 17:13:21] error: zh:ember:uart:ash: Failed to init port with error Error: connect ECONNREFUSED 192.168.86.27:8888
[2024-06-02 17:13:21] error: zh:ember: Failed to reset and init NCP. Error: Failed to start EZSP layer with status=HOST_FATAL_ERROR.
[2024-06-02 17:13:21] info: zh:ember:uart:ash: ASH COUNTERS since last clear:
[2024-06-02 17:13:21] info: zh:ember:uart:ash: Total frames: RX=0, TX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: Cancelled : RX=0, TX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: DATA frames : RX=0, TX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: DATA bytes : RX=0, TX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: Retry frames: RX=0, TX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: ACK frames : RX=0, TX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: NAK frames : RX=0, TX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: nRdy frames : RX=0, TX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: CRC errors : RX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: Comm errors : RX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: Length < minimum: RX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: Length > maximum: RX=0
[2024-06-02 17:13:22] info: zh:ember:uart:ash: Bad controls : RX=0
[2024-06-02 17:13:22] info: zh:ember:uart:ash: Bad lengths : RX=0
[2024-06-02 17:13:22] info: zh:ember:uart:ash: Bad ACK numbers : RX=0
[2024-06-02 17:13:22] info: zh:ember:uart:ash: Out of buffers : RX=0
[2024-06-02 17:13:22] info: zh:ember:uart:ash: Retry dupes : RX=0
[2024-06-02 17:13:22] info: zh:ember:uart:ash: Out of sequence : RX=0
[2024-06-02 17:13:22] info: zh:ember:uart:ash: ACK timeouts : RX=0
[2024-06-02 17:13:22] info: zh:ember:uart:ash: ======== ASH stopped ========
[2024-06-02 17:13:22] info: zh:ember:ezsp: ======== EZSP stopped ========
[2024-06-02 17:13:22] info: zh:ember: ======== Ember Adapter Stopped ========
[2024-06-02 17:13:22] error: z2m: Adapter disconnected, stopping
[2024-06-02 17:13:22] info: z2m: Disconnecting from MQTT server
[2024-06-02 17:13:22] info: z2m: Stopping zigbee-herdsman...
[2024-06-02 17:46:53] info: z2m: Disconnecting from MQTT server
[2024-06-02 17:46:53] info: z2m: Stopping zigbee-herdsman...
This happened to me too. When I unplug the coordinator for only a few seconds, Z2M keeps working properly, but when it stays unplugged for longer, Z2M does not start again automatically and I have to start it manually.