zigbee2mqtt icon indicating copy to clipboard operation
zigbee2mqtt copied to clipboard

After a restart in the network zigbee2mqtt does not reconnect to the adapter

Open kikejherrera opened this issue 9 months ago • 1 comments

What happened?

I have the SLZB-06M adapter connected via lan configured with ember. When there is a loss of connectivity because of a network reboot for example, even for a short time, the zigbee2mqtt addon of HomeAssistant disconnects, but the add-on does not stop. A manual restart of the add-on is necessary to retry the connection to the adapter.

What did you expect to happen?

Extend the time for zigbee2mqtt to disconnect if this is the normal operation, but at least periodically try the connection again or some automatic restart option.

How to reproduce it (minimal and precise)

  • The system is working
  • I restart the network
  • The addon is still running but the controller is disconnected and does not communicate.
  • I stop the addon
  • I start the addon

Zigbee2MQTT version

1.37.1

Adapter firmware version

7.4.3 [GA]

Adapter

SLZB-06M

Setup

Add-on on HomeAssistant in Intel NUC, HaOS system, connected via lan and remote adapter also connected via lan too.

Debug log

log.log

kikejherrera avatar May 11 '24 14:05 kikejherrera

I have the same issues on my slzb6m too.. If they lose Network connection for a few sec, Zigbee2mqtt(hass addon) just crashing and not retrying the connection to the adapter? Is there a way to tell zigbee2mqtt just keep looking for the tcp address of the Zigbee adaptor?

RuneNyhuus avatar May 17 '24 20:05 RuneNyhuus

I have the same issues on my slzb6m too.. If they lose Network connection for a few sec, Zigbee2mqtt(hass addon) just crashing and not retrying the connection to the adapter? Is there a way to tell zigbee2mqtt just keep looking for the tcp address of the Zigbee adaptor?

I got exactly same issue with the latest z2m 1.38.0 now. We hope it could be fixed soon. Thank you.

dinhchinh82 avatar Jun 02 '24 12:06 dinhchinh82

@Nerivec @Koenkk Could you take a look on this issue? This error is very annoyed when the Zigbee coordinator connects via TCP and encounters problems, the Zigbee2mqtt addon does not automatically reconnect the TCP coordinator. It requires a manual restart of Zigbee2mqtt to function normally again.

dinhchinh82 avatar Jun 04 '24 05:06 dinhchinh82

In the log I see that Z2M stops: Adapter disconnected, stopping, but you see that the addon itself doesn't stop?

Koenkk avatar Jun 04 '24 18:06 Koenkk

@kikejherrera The problem of the add-on not stopping when the adapter disconnects should be fixed in 1.38.0.

Nerivec avatar Jun 04 '24 19:06 Nerivec

In the log I see that Z2M stops: Adapter disconnected, stopping, but you see that the addon itself doesn't stop?

exactly

kikejherrera avatar Jun 04 '24 19:06 kikejherrera

@kikejherrera The problem of the add-on not stopping when the adapter disconnects should be fixed in 1.38.0.

Version 1.38.0-1 installed I disconnected, waited a bit and reconnected. This is the log. The plugin remains active but the frontend is not working (502 bad gateway).

log.log

kikejherrera avatar Jun 04 '24 19:06 kikejherrera

@kikejherrera The problem of the add-on not stopping when the adapter disconnects should be fixed in 1.38.0.

I manually restart the plugin and this is the log. But if I don't notice because some device doesn't work, it can be like this for hours. When restarting manually there are devices that don't connect again, but I understand that this is a different problem.

log.log

kikejherrera avatar Jun 04 '24 19:06 kikejherrera

And if you do a manual restart of the HA addon it starts working again? Can you try to disconnect another time to see if the issue always occurs?

Koenkk avatar Jun 04 '24 19:06 Koenkk

@Koenkk @Nerivec I just set up a fresh environment of z2m 1.38.0-1 and everything work normally. The addon is able to reconnect the TCP adapter again. I will keep testing this case more carefully.

Thank you very much for your quick response.

Info 2024-06-05 08:07:26======== ASH stopped ======== Info 2024-06-05 08:07:26======== EZSP stopped ======== Info 2024-06-05 08:07:26======== Ember Adapter Stopped ======== Error 2024-06-05 08:07:26Adapter disconnected, stopping Info 2024-06-05 08:09:58MQTT publish: topic 'homeassistant/binary_sensor/122105103981011015010911311611650_0x28dba7fffe501035/connection_state/config', payload '{"device":{"hw_version":"EmberZNet 7.4.3 [GA]","identifiers":["zigbee2mqtt_bridge_0x28dba7fffe501035"],"manufacturer":"Zigbee2MQTT","model":"Bridge","name":"Zigbee2MQTT Bridge","sw_version":"1.38.0"},"device_class":"connectivity","entity_category":"diagnostic","name":"Connection state","object_id":"zigbee2mqtt_bridge_connection_state","origin":{"name":"Zigbee2MQTT","sw":"1.38.0","url":"https://www.zigbee2mqtt.io"},"payload_off":"offline","payload_on":"online","state_topic":"zigbee2mqtt2/bridge/state","unique_id":"bridge_0x28dba7fffe501035_connection_state_zigbee2mqtt2","value_template":"{{ value }}"}'

This is the connection sensor: CleanShot 2024-06-05 at 08 13 59@2x

dinhchinh82 avatar Jun 05 '24 01:06 dinhchinh82

@Koenkk @Nerivec after about 4 hours running from last success, the z2m addon got freezed again and unable to reconnect up to now (10 hours later). The zigbee TCP coordinator still running well via LAN cable.

Please check the screenshot and the log below: CleanShot 2024-06-05 at 22 21 37@2x

CleanShot 2024-06-05 at 22 22 25@2x

[2024-06-05 21:18:54] info: zh:ember: [NCP COUNTERS] 0,542,70,0,0,0,0,0,0,0,0,0,61,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 [2024-06-05 21:18:54] info: zh:ember: [ASH COUNTERS] 451,98,48,49,0,0,0,0,462,50,49,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 [2024-06-05 21:22:30] info: zh:ember:uart:ash: Port error: Error: read ETIMEDOUT [2024-06-05 21:22:30] error: zh:ember: !!! NCP FATAL ERROR reason=ERROR_SERIAL_INIT. ATTEMPTING RESET... !!! [2024-06-05 21:22:30] info: zh:ember:queue: Request dispatching stopped; queue=0 priorityQueue=0 [2024-06-05 21:22:30] info: zh:ember:uart:ash: ASH COUNTERS since last clear: [2024-06-05 21:22:30] info: zh:ember:uart:ash: Total frames: RX=0, TX=0 [2024-06-05 21:22:30] info: zh:ember:uart:ash: Cancelled : RX=0, TX=0 [2024-06-05 21:22:30] info: zh:ember:uart:ash: DATA frames : RX=0, TX=0 [2024-06-05 21:22:30] info: zh:ember:uart:ash: DATA bytes : RX=0, TX=0 [2024-06-05 21:22:30] info: zh:ember:uart:ash: Retry frames: RX=0, TX=0 [2024-06-05 21:22:30] info: zh:ember:uart:ash: ACK frames : RX=0, TX=0 [2024-06-05 21:22:30] info: zh:ember:uart:ash: NAK frames : RX=0, TX=0 [2024-06-05 21:22:30] info: zh:ember:uart:ash: nRdy frames : RX=0, TX=0 [2024-06-05 21:22:30] info: zh:ember:uart:ash: CRC errors : RX=0 [2024-06-05 21:22:30] info: zh:ember:uart:ash: Comm errors : RX=0 [2024-06-05 21:22:30] info: zh:ember:uart:ash: Length < minimum: RX=0 [2024-06-05 21:22:30] info: zh:ember:uart:ash: Length > maximum: RX=0 [2024-06-05 21:22:30] info: zh:ember:uart:ash: Bad controls : RX=0 [2024-06-05 21:22:30] info: zh:ember:uart:ash: Bad lengths : RX=0 [2024-06-05 21:22:30] info: zh:ember:uart:ash: Bad ACK numbers : RX=0 [2024-06-05 21:22:30] info: zh:ember:uart:ash: Out of buffers : RX=0 [2024-06-05 21:22:30] info: zh:ember:uart:ash: Retry dupes : RX=0 [2024-06-05 21:22:30] info: zh:ember:uart:ash: Out of sequence : RX=0 [2024-06-05 21:22:30] info: zh:ember:uart:ash: ACK timeouts : RX=0 [2024-06-05 21:22:30] info: zh:ember:uart:ash: ======== ASH stopped ======== [2024-06-05 21:22:30] info: zh:ember:ezsp: ======== EZSP stopped ======== [2024-06-05 21:22:30] info: zh:ember: ======== Ember Adapter Stopped ======== [2024-06-05 21:22:30] info: zh:ember:uart:ash: Port closed. Error? true [2024-06-05 21:22:30] info: zh:ember: ======== Ember Adapter Starting ======== [2024-06-05 21:22:30] info: zh:ember:ezsp: ======== EZSP starting ======== [2024-06-05 21:22:30] info: zh:ember:uart:ash: ======== ASH NCP reset ======== [2024-06-05 21:22:33] info: zh:ember:uart:ash: ASH COUNTERS since last clear: [2024-06-05 21:22:33] info: zh:ember:uart:ash: Total frames: RX=0, TX=0 [2024-06-05 21:22:33] info: zh:ember:uart:ash: Cancelled : RX=0, TX=0 [2024-06-05 21:22:33] info: zh:ember:uart:ash: DATA frames : RX=0, TX=0 [2024-06-05 21:22:33] info: zh:ember:uart:ash: DATA bytes : RX=0, TX=0 [2024-06-05 21:22:33] info: zh:ember:uart:ash: Retry frames: RX=0, TX=0 [2024-06-05 21:22:33] info: zh:ember:uart:ash: ACK frames : RX=0, TX=0 [2024-06-05 21:22:33] info: zh:ember:uart:ash: NAK frames : RX=0, TX=0 [2024-06-05 21:22:33] info: zh:ember:uart:ash: nRdy frames : RX=0, TX=0 [2024-06-05 21:22:33] info: zh:ember:uart:ash: CRC errors : RX=0 [2024-06-05 21:22:33] info: zh:ember:uart:ash: Comm errors : RX=0 [2024-06-05 21:22:33] info: zh:ember:uart:ash: Length < minimum: RX=0 [2024-06-05 21:22:33] info: zh:ember:uart:ash: Length > maximum: RX=0 [2024-06-05 21:22:33] info: zh:ember:uart:ash: Bad controls : RX=0 [2024-06-05 21:22:33] info: zh:ember:uart:ash: Bad lengths : RX=0 [2024-06-05 21:22:33] info: zh:ember:uart:ash: Bad ACK numbers : RX=0 [2024-06-05 21:22:33] info: zh:ember:uart:ash: Out of buffers : RX=0 [2024-06-05 21:22:33] info: zh:ember:uart:ash: Retry dupes : RX=0 [2024-06-05 21:22:33] info: zh:ember:uart:ash: Out of sequence : RX=0 [2024-06-05 21:22:33] info: zh:ember:uart:ash: ACK timeouts : RX=0 [2024-06-05 21:22:33] info: zh:ember:uart:ash: ======== ASH stopped ======== [2024-06-05 21:22:33] error: zh:ember:uart:ash: Failed to init port with error Error: connect ENETUNREACH 192.168.86.27:8888 [2024-06-05 21:22:33] error: zh:ember: Failed to reset and init NCP. Error: Failed to start EZSP layer with status=HOST_FATAL_ERROR.

dinhchinh82 avatar Jun 05 '24 15:06 dinhchinh82

After restarting the z2m addon manually to make it work, the problem come again after several hours. log.log

dinhchinh82 avatar Jun 06 '24 01:06 dinhchinh82

I'm having a similar issue.

Start zigbee2mqtt inside a container with configuration connecting to a SLZB-06 and another MQTT container. Everything comes up fine, devices check in, Home Assistant can see and report on everything. Then after about a minute of working it fails with the following:

[2024-06-13 03:20:30] info:     zh:zstack:znp: Socket error
[2024-06-13 03:20:30] info:     zh:zstack:znp: Port closed
[2024-06-13 03:20:30] debug:    zh:controller: Adapter disconnected
[2024-06-13 03:20:30] info:     zh:zstack:znp: closing
[2024-06-13 03:20:30] error:    z2m: Adapter disconnected, stopping
[2024-06-13 03:20:30] debug:    z2m: Saving state to file /app/data/state.json
[2024-06-13 03:20:30] info:     z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/bridge/state', payload '{"state":"offline"}'
[2024-06-13 03:20:30] info:     z2m:mqtt: Disconnecting from MQTT server
[2024-06-13 03:20:30] info:     z2m: Stopping zigbee-herdsman...
[2024-06-13 03:20:30] debug:    zh:controller:database: Writing database to '/app/data/database.db'
[2024-06-13 03:20:30] info:     z2m: Stopped zigbee-herdsman
[2024-06-13 03:20:30] info:     z2m: Stopped Zigbee2MQTT

Checked firewall logs and no other network issues at present. The logs inside the SLZB-06 just indicate a disconnection:

[13:19:31] taskZB | New client: 10.0.0.10 id: 0
[13:20:30] taskZB | Client disconnected, id: 0

My configuration:

homeassistant: true
permit_join: false
frontend: true
availability: true
mqtt:
  base_topic: zigbee2mqtt
  server: mqtt://mosquitto:1883
  user: XXX
  password: XXX
  include_device_information: true
serial:
  port: tcp://10.0.0.185:6638
advanced:
  transmit_power: 20
  channel: 25
  legacy_availability_payload: false
  last_seen: ISO_8601_local
  legacy_api: false
  homeassistant_legacy_entity_attributes: false
  log_level: debug
device_options:
  legacy: false
devices:
  --cut off--

Running latest version on SLZB-06 at version v2.3.6.

passbe avatar Jun 13 '24 03:06 passbe

After a lot of debugging I believe this is not a zigbee2mqtt issue in my case. It seems with an upgrade of podman (5.1.1) the new rootless networking layer (pasta 2024_06_07.8a83b53) is causing the issue as reverting back to slirp4netns the issue resolves itself. I've been unable to determine why this is occurring.

passbe avatar Jun 20 '24 05:06 passbe

After a lot of debugging I believe this is not a zigbee2mqtt issue in my case. It seems with an upgrade of podman (5.1.1) the new rootless networking layer (pasta 2024_06_07.8a83b53) is causing the issue as reverting back to slirp4netns the issue resolves itself. I've been unable to determine why this is occurring.

could you share the command to revert?

dinhchinh82 avatar Jun 20 '24 05:06 dinhchinh82

I updated the line default_rootless_network_cmd = "slirp4netns" inside my /etc/containers/containers.conf file. Restart everything and the issue was resolved.

passbe avatar Jun 20 '24 06:06 passbe

I updated the line default_rootless_network_cmd = "slirp4netns" inside my /etc/containers/containers.conf file. Restart everything and the issue was resolved.

@passbe So I guess you are using docker version now? I just confused how to fix it using HA z2m addons.

dinhchinh82 avatar Jun 20 '24 06:06 dinhchinh82

is there a update for HA users yet in how to solve this issue? I run 2 06M sticks, one has the issue the other one not...

Kuechenfenster avatar Jul 17 '24 04:07 Kuechenfenster

HA addon: 1.39.0 Adapter: UZG-01 (XZG 20240707)

tonight the router automatically updated (thus disconnecting from the network) and thus the add-on did not reconnect.

in the morning it was necessary to press "Run" on the add-on and everything continues to work.

log.txt

leroyloren avatar Jul 26 '24 07:07 leroyloren

Yep, this happens to me as well, every time I reboot my router, Z2M crashes, and then I have to manually restart it.

kalpik avatar Jul 26 '24 07:07 kalpik

Same for me. After a power cut which caused the zigbee adaptor to restart, the Z2M addon crashes and does not restart and I have to press the start button in order to get it working again.

ezfrag2021 avatar Aug 07 '24 13:08 ezfrag2021

Same here, router restart causes z2m to stop and even after the adapter is available again it doesn't restart itself

crysxd avatar Aug 26 '24 19:08 crysxd

As a workaround until it's fixed I added an automation with the trigger as the state of connection of the zigbee adapter. When it changes from connected to disconnected the "start add-on" action fires to restart z2mqtt and repeats until the state changes back to connected.

ezfrag2021 avatar Aug 26 '24 21:08 ezfrag2021

@ezfrag2021 That's such an obvious solution. Quick manual for others:

  • Go to "Devices & services" and create a new "Ping (ICMP)" device for the ethernet Zigbee adapter
  • Create a new automation, when state of the ping device changes to "connected", start z2m
alias: WHEN slzb06 becomes available THEN start zigbee2mqtt
description: ""
trigger:
  - platform: state
    entity_id:
      - binary_sensor.slzb_06
    to: "on"
    for:
      hours: 0
      minutes: 0
      seconds: 15
condition: []
action:
  - action: hassio.addon_start
    metadata: {}
    data:
      addon: 45df7312_zigbee2mqtt
mode: single

crysxd avatar Aug 27 '24 05:08 crysxd

@crysxd That's a much more elegant solution than mine. And I never knew about how to create a Ping device before so you taught me something new. Thanks!

ezfrag2021 avatar Aug 27 '24 08:08 ezfrag2021

Restart everything and the issue was resolved

@passbe what all do you recommend restarting? I can see it's working if I set it to a host network, so it's definitely this podman issue for me. Just looking to see what to do to get it to work without the host network

asagent7 avatar Sep 03 '24 06:09 asagent7