core icon indicating copy to clipboard operation
core copied to clipboard

ZHA is failing constantly, requires reloading/reinitialization

Open mmccool opened this issue 1 year ago • 52 comments

The problem

The Zigbee (ZHA) integration on my HA instance is failing, falling back into a "Failure to setup" state. Reloading it fails again, but then retries, entering an "Initialization" stage, after which it works... for a while, from a few minutes to an hour, before failing again. Occasionally the above does not work and a power cycle is needed to get it to reload.

I am running on Home Assistant Yellow, and in particular am running "Yellow Multi-PAN".

What version of Home Assistant Core has the issue?

core-2023.12.1

What was the last working version of Home Assistant Core?

core-2023.10

What type of installation are you running?

Home Assistant OS

Integration causing the issue

ZHA

Link to integration documentation on our website

https://www.home-assistant.io/integrations/zha/

Diagnostics information

config_entry-zha-d179b6f823a03243c5e4b54db431cebc.json.txt

Example YAML snippet

No response

Anything in the logs that might be useful for us?

[0xDFC0:1:0x0001]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), ControllerError('ApplicationController is not running'), ControllerError('ApplicationController is not running'), ControllerError('ApplicationController is not running')]
1:32:45 PM – (WARNING) Zigbee Home Automation - message first occurred at December 9, 2023 at 3:35:49 PM and shows up 126 times
[0x0898:1:0xff01] Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/zigpy/endpoint.py", line 253, in request return await self.device.request( ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 299, in request await self._application.request( File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 824, in request await self.send_packet( File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 710, in send_packet raise ControllerError("ApplicationController is not running") bellows.exception.ControllerError: ApplicationController is not running The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 518, in write_attributes_safe res = await self.write_attributes(attributes, manufacturer=manufacturer) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 83, in wrapper with wrap_zigpy_exceptions(): File "/usr/local/lib/python3.11/contextlib.py", line 155, in __exit__ self.gen.throw(typ, value, traceback) File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 75, in wrap_zigpy_exceptions raise HomeAssistantError(message) from exc homeassistant.exceptions.HomeAssistantError: Failed to send request: ApplicationController is not running
1:29:04 PM – (ERROR) runner.py - message first occurred at 1:27:34 PM and shows up 31 times
Discarding _save_attribute event
1:29:01 PM – (WARNING) components/zha/binary_sensor.py - message first occurred at 1:29:01 PM and shows up 19 times
[0x4C2A:1:0x0702]: 'async_initialize' stage failed: 'NoneType' object has no attribute 'sendUnicast'
1:28:47 PM – (WARNING) Zigbee Home Automation - message first occurred at 1:28:47 PM and shows up 6 times
Unable to remove unknown dispatcher <bound method GroupProbe._reprobe_group of <homeassistant.components.zha.core.discovery.GroupProbe object at 0x7f62493690>>
1:28:46 PM – (WARNING) helpers/dispatcher.py - message first occurred at December 9, 2023 at 4:16:54 PM and shows up 12 times
Error setting up entry Yellow Multi-PAN for zha
December 9, 2023 at 4:36:40 PM – (ERROR) config_entries.py

Details from [0xDFC0:1:0x0001]:
Logger: homeassistant.components.zha.core.cluster_handlers
Source: components/zha/core/cluster_handlers/__init__.py:537
Integration: Zigbee Home Automation (documentation, issues)
First occurred: December 9, 2023 at 3:35:49 PM (126 occurrences)
Last logged: 1:32:45 PM

[0xDFC0:1:0x0001]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), ControllerError('ApplicationController is not running'), ControllerError('ApplicationController is not running'), ControllerError('ApplicationController is not running')]
[0xDFC0:1:0x0500]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), ControllerError('ApplicationController is not running'), ControllerError('ApplicationController is not running'), ControllerError('ApplicationController is not running')]
[0x9278:1:0x0006]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>')]
[0x9278:1:0x0702]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>')]
[0x9278:1:0x0b04]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>')]

Details from [0x0898:1:0xff01]:
Logger: zigpy.zcl
Source: runner.py:188
First occurred: 1:27:34 PM (31 occurrences)
Last logged: 1:29:04 PM

[0x0898:1:0xff01] Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/zigpy/endpoint.py", line 253, in request return await self.device.request( ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 299, in request await self._application.request( File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 824, in request await self.send_packet( File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 710, in send_packet raise ControllerError("ApplicationController is not running") bellows.exception.ControllerError: ApplicationController is not running The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 518, in write_attributes_safe res = await self.write_attributes(attributes, manufacturer=manufacturer) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 83, in wrapper with wrap_zigpy_exceptions(): File "/usr/local/lib/python3.11/contextlib.py", line 155, in __exit__ self.gen.throw(typ, value, traceback) File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 75, in wrap_zigpy_exceptions raise HomeAssistantError(message) from exc homeassistant.exceptions.HomeAssistantError: Failed to send request: ApplicationController is not running
[0xB790:1:0xff01] Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/zigpy/endpoint.py", line 253, in request return await self.device.request( ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 299, in request await self._application.request( File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 824, in request await self.send_packet( File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 710, in send_packet raise ControllerError("ApplicationController is not running") bellows.exception.ControllerError: ApplicationController is not running The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 518, in write_attributes_safe res = await self.write_attributes(attributes, manufacturer=manufacturer) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 83, in wrapper with wrap_zigpy_exceptions(): File "/usr/local/lib/python3.11/contextlib.py", line 155, in __exit__ self.gen.throw(typ, value, traceback) File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 75, in wrap_zigpy_exceptions raise HomeAssistantError(message) from exc homeassistant.exceptions.HomeAssistantError: Failed to send request: ApplicationController is not running
[0x7426:1:0xff01] Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/zigpy/endpoint.py", line 253, in request return await self.device.request( ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 299, in request await self._application.request( File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 824, in request await self.send_packet( File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 710, in send_packet raise ControllerError("ApplicationController is not running") bellows.exception.ControllerError: ApplicationController is not running The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 518, in write_attributes_safe res = await self.write_attributes(attributes, manufacturer=manufacturer) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 83, in wrapper with wrap_zigpy_exceptions(): File "/usr/local/lib/python3.11/contextlib.py", line 155, in __exit__ self.gen.throw(typ, value, traceback) File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 75, in wrap_zigpy_exceptions raise HomeAssistantError(message) from exc homeassistant.exceptions.HomeAssistantError: Failed to send request: ApplicationController is not running
[0xC67C:1:0xff01] Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/zigpy/endpoint.py", line 253, in request return await self.device.request( ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 299, in request await self._application.request( File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 824, in request await self.send_packet( File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 710, in send_packet raise ControllerError("ApplicationController is not running") bellows.exception.ControllerError: ApplicationController is not running The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 518, in write_attributes_safe res = await self.write_attributes(attributes, manufacturer=manufacturer) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 83, in wrapper with wrap_zigpy_exceptions(): File "/usr/local/lib/python3.11/contextlib.py", line 155, in __exit__ self.gen.throw(typ, value, traceback) File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 75, in wrap_zigpy_exceptions raise HomeAssistantError(message) from exc homeassistant.exceptions.HomeAssistantError: Failed to send request: ApplicationController is not running
[0xBC11:1:0x000a] Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/zigpy/zcl/__init__.py", line 411, in reply return await self._endpoint.reply( ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/zigpy/endpoint.py", line 278, in reply return await self.device.reply( ^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 474, in reply return await self.request( ^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 299, in request await self._application.request( File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 824, in request await self.send_packet( File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 767, in send_packet status, _ = await self._ezsp.sendUnicast( ^^^^^^^^^^^^^^^^^^^^^^ AttributeError: 'NoneType' object has no attribute 'sendUnicast'

Details from 
Logger: zigpy.appdb
Source: components/zha/binary_sensor.py:227
First occurred: 1:29:01 PM (19 occurrences)
Last logged: 1:29:01 PM

Discarding _save_attribute event

Details from 
Logger: homeassistant.components.zha.core.cluster_handlers
Source: components/zha/core/cluster_handlers/__init__.py:389
Integration: Zigbee Home Automation (documentation, issues)
First occurred: 1:28:47 PM (6 occurrences)
Last logged: 1:28:47 PM

[0x4C2A:1:0x0702]: 'async_initialize' stage failed: 'NoneType' object has no attribute 'sendUnicast'
[0x4C2A:1:0x0b04]: 'async_initialize' stage failed: 'NoneType' object has no attribute 'sendUnicast'
[0xE85E:1:0x0006]: 'async_initialize' stage failed: 'NoneType' object has no attribute 'sendUnicast'
[0xE85E:1:0x0b04]: 'async_initialize' stage failed: 'NoneType' object has no attribute 'sendUnicast'
[0xE85E:1:0x0702]: 'async_initialize' stage failed: 'NoneType' object has no attribute 'sendUnicast'
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/zha/core/helpers.py", line 338, in wrapper
    return await func(cluster_handler, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 389, in async_initialize
    await self._get_attributes(
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 491, in _get_attributes
    read, _ = await self.cluster.read_attributes(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/zcl/__init__.py", line 524, in read_attributes
    result = await self.read_attributes_raw(to_read, manufacturer=manufacturer)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/zcl/__init__.py", line 377, in request
    return await self._endpoint.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/endpoint.py", line 253, in request
    return await self.device.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 299, in request
    await self._application.request(
  File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 824, in request
    await self.send_packet(
  File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 767, in send_packet
    status, _ = await self._ezsp.sendUnicast(
                      ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'sendUnicast'

Detail from 
Logger: homeassistant.helpers.dispatcher
Source: helpers/dispatcher.py:59
First occurred: December 9, 2023 at 4:16:54 PM (12 occurrences)
Last logged: 1:28:46 PM

Unable to remove unknown dispatcher <bound method GroupProbe._reprobe_group of <homeassistant.components.zha.core.discovery.GroupProbe object at 0x7f62493690>>

Details from [0x4C2A:1:0x0702]
Logger: homeassistant.config_entries
Source: config_entries.py:402
First occurred: December 9, 2023 at 4:36:40 PM (1 occurrences)
Last logged: December 9, 2023 at 4:36:40 PM

Error setting up entry Yellow Multi-PAN for zha
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/config_entries.py", line 402, in async_setup
    result = await component.async_setup_entry(hass, self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/zha/__init__.py", line 167, in async_setup_entry
    zha_gateway = await ZHAGateway.async_from_config(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/zha/core/gateway.py", line 192, in async_from_config
    await instance.async_initialize()
  File "/usr/src/homeassistant/homeassistant/components/zha/core/gateway.py", line 210, in async_initialize
    await app.startup(auto_form=True)
  File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 223, in startup
    await self.connect()
  File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 144, in connect
    await ezsp.startup_reset()
  File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/__init__.py", line 111, in startup_reset
    await self._gw.wait_for_startup_reset()
  File "/usr/local/lib/python3.11/site-packages/bellows/uart.py", line 170, in wait_for_startup_reset
    await self._startup_reset_future
  File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 1003, in _read_ready__data_received
    self._protocol.data_received(data)
  File "/usr/local/lib/python3.11/site-packages/bellows/uart.py", line 76, in data_received
    self.frame_received(frame)
  File "/usr/local/lib/python3.11/site-packages/bellows/uart.py", line 103, in frame_received
    self.data_frame_received(data)
  File "/usr/local/lib/python3.11/site-packages/bellows/uart.py", line 124, in data_frame_received
    self._application.frame_received(self._randomize(data[1:-3]))
  File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/__init__.py", line 305, in frame_received
    self._protocol(data)
  File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/protocol.py", line 83, in __call__
    sequence, frame_id, data = self._ezsp_frame_rx(data)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/v4/__init__.py", line 33, in _ezsp_frame_rx
    return data[0], data[2], data[3:]
           ~~~~^^^
IndexError: index out of range

Additional information

I think the problems started with 2023.12.0 but I am not completely sure - it did have a tendency to crash before, but not nearly as often. Before 2023.12 it maybe happened once a week, which was annoying but semi-tolerable. Now it is basically unusable, and essentially breaks any automation that depends on a zigbee device.

I should also mention I have set up and used Matter devices but don't have any active right now. The symptoms do seem similar to #105344 and #105345 - but I have no Xbee devices.

mmccool avatar Dec 11 '23 17:12 mmccool

Hey there @dmulcahey, @adminiuga, @puddly, @thejulianjes, mind taking a look at this issue as it has been labeled with an integration (zha) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of zha can trigger bot actions by commenting:

  • @home-assistant close Closes the issue.
  • @home-assistant rename Awesome new title Renames the issue.
  • @home-assistant reopen Reopen the issue.
  • @home-assistant unassign zha Removes the current integration label and assignees on the issue, add the integration domain after the command.
  • @home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
  • @home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


zha documentation zha source (message by IssueLinks)

home-assistant[bot] avatar Dec 11 '23 17:12 home-assistant[bot]

Would you be able to enable debug logging for the ZHA integration and upload a complete log of it crashing and failing to reload?

puddly avatar Dec 11 '23 17:12 puddly

I did enable debug logging, and just reloaded it. Let me check if it's crashed yet... nope, still working. Will do so after it crashes. I had an older log file but I let it run too long and it became too big to upload, so just restarted debug logging again.

mmccool avatar Dec 11 '23 17:12 mmccool

So as an additional observation I have seen errors about the NCP failing to respond, watchdog failure, etc. and then I see it enter an "Initialization" state on its own, but succeeds. I have seen this twice now. I will wait for an unrecoverable error before uploading the logs but the recoverable ones seem to be happening every 5-10m or so (I assuming this is probably the Watchdog restarting it).

mmccool avatar Dec 11 '23 18:12 mmccool

Having the exact same problem! This was changed in 12.1 version: "Fix ZHA quirk ID custom entities matching all devices", could this be the problem?

tangodelta73 avatar Dec 11 '23 18:12 tangodelta73

Ugh, it finally crashed but the debug log was still too big to upload (over 400MB!). It seems it appended to my old log rather than start a new one, also. I am going to manually edit it just to show the last bit. I will keep the full log around, let me know if you want me to extract any other sections.

mmccool avatar Dec 11 '23 18:12 mmccool

@mmccool In your log, did you manually reload the ZHA integration about five times? If so, can you describe why? Up until the point you reload it, the log looks normal.

puddly avatar Dec 11 '23 20:12 puddly

Screenshot_20231211-220948 error_log-2.txt Encounters this from time to time. Especially after a reboot. Needs to disable and activate it . Then it comes to a "normal" state. But accepting ligthcontroll. Gives failed to send signal or something like that. Will gather logs aswell

Letalis avatar Dec 11 '23 21:12 Letalis

I've also witnessed issues with ZHA integration after upgrading to 2023.12. I will upload logs next time when it happens. Usually it breaks after restarting HASS.

pimeys avatar Dec 11 '23 21:12 pimeys

@Letalis I think you may have submitted your comment before your screenshot and log files actually finished uploading! Edit them into your original comment, the Uploading ... text will be replaced once they finish uploading.

puddly avatar Dec 11 '23 21:12 puddly

For the log, I did NOT manually reload it. For each of those five instances it entered the "Initialization" state on its own and restarted on its own. It succeeded restarting/reloading several times on its own until finally this self-initiated loop failed. I'm guessing these restarts were initiated by a watchdog timer.

Also, I truncated the log. It did this a lot more than five times.

mmccool avatar Dec 11 '23 23:12 mmccool

Here the same problem: home-assistant_zha_2023-12-12T09-17-49.702Z.log

Logger: homeassistant.components.zha.core.cluster_handlers Source: components/zha/core/cluster_handlers/init.py:537 Integration: Zigbee Home Automation (documentation, issues) First occurred: 10:13:23 (2 occurrences) Last logged: 10:13:23

[0x4D68:1:0x0500]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>')] [0x4D68:1:0x0406]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>')]

marcroca81 avatar Dec 12 '23 09:12 marcroca81

Is this the same problem as reported in issue #105445?

I also have the problem that all Zigbee and Matter devices are unavailable after upgrading HA Core to >= 2023.12.0. Zigbee, Matter and Thread integrations are permanently in a failure state.

ckadluba avatar Dec 12 '23 12:12 ckadluba

Same here. 66 devices / 443 entities via RaspBee II on HAOS 2023.12.1. After a few hours ZHA fails and does not recover by itself. Reload of ZHA helps instantly, but just for another few hours. Disabling ZHA's entity update polling doesn't help. CPU and memory is at bay. No WiFi on Zigbee channel. It was running flawlessly for months.

 Logger: zigpy.application
Source: components/zha/core/gateway.py:210
First occurred: 10. Dezember 2023 um 19:05:15 (10 occurrences)
Last logged: 03:46:53

    Zigbee channel 15 utilization is 81.18%!
    If you are having problems joining new devices, are missing sensor updates, or have issues keeping devices joined, ensure your coordinator is away from interference sources such as USB 3.0 devices, SSDs, WiFi routers, etc.
    Zigbee channel 15 utilization is 82.35%!
    Zigbee channel 15 utilization is 76.47%!
    Zigbee channel 15 utilization is 77.65%!
Logger: homeassistant.components.zha.core.cluster_handlers
Source: components/zha/core/cluster_handlers/__init__.py:537
Integration: Zigbee Home Automation (documentation, issues)
First occurred: 10. Dezember 2023 um 19:05:27 (528 occurrences)
Last logged: 13:13:59

[0xC543:1:0x0006]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>')]
[0xB202:3:0x0006]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>')]
[0x1C75:1:0x0b04]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver packet: <TXStatus.MAC_NO_ACK: 233>'), TimeoutError(), TimeoutError(), TimeoutError()]
[0x5E7F:1:0x0006]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), TimeoutError(), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>')]
[0x3E5F:1:0x0008]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), TimeoutError()]

In the desperate I've enabled Source Routing, which takes off some load from the mesh/coordinator.

zha:
  zigpy_config:
    source_routing: true

(configuration.yaml)

For now it seems to work.

I'll report, if that changes.

What about the 38400 baud limit (which is just 4,8 kbytes/s), we have on the serial side (RaspBee II)? This sounds like a bottleneck to me, if the whole Zigbee protocol is processed by ZHA and might became more complex now. By using ZHA the bandwidth demand on the serial interface is completely different, in comparison of using the whole Zigbee stack on the RaspBee II chip itself (Deconz approach).

I've ordered a SMLIGHT SLZB-06M coordinator, maybe things are different with that type of hardware...

Citizen-2CB8A24A avatar Dec 12 '23 15:12 Citizen-2CB8A24A

+1 keeps getting offline and is not selfhealing, always worked until last update. Same logs as above. Now trying the option of [Citizen-2CB8A24A] Update : suggestion option did not work, only 24 devices 169 entities.

psuurbach avatar Dec 12 '23 17:12 psuurbach

I should also say I have a fair number of Zigbee devices: 49 devices, 376 entities. However, not so sure it's a bottleneck issue as it worked ok until recently - unless the last update made it significantly less efficient. I'm also running on a "large" CM4 (8GB) so I don't think memory is a problem. Unlike several commentors on https://github.com/home-assistant/core/issues/105445, who seem to have a very similar problem, I'm using "mainstream/supported" HW: the HA Yellow platform. Also, this issue it does try to restart (so does not stay in "Initializing" state) but tends to fail and get stuck in a "Failed to set up" state (typically after passing through "Failed to set up, retrying"...). Reloading tends to fail once, then succeeds after one retry - it does not succeed immediately.

mmccool avatar Dec 12 '23 17:12 mmccool

Just wanted to chime in... I was having problems with my SONOFF Zigbee 3.0 USB Dongle Plus on my X86 install. ZHA was failing every few hours after installing 2023.12.0. Restored my 2023.11.3 backup. Installed 2023.12.0 again... ZHA did the same thing again... Failing every few hours. Went to my 2023.11.3 backup again and ZHA has been solid on my system for days. I didn't download any logs because I thought it was something that I had done. Going to stay on 2023.11.3 for the time being.

derekgermano avatar Dec 12 '23 19:12 derekgermano

Just wanted to chime in... I was having problems with my SONOFF Zigbee 3.0 USB Dongle Plus on my X86 install. ZHA was failing every few hours after installing 2023.12.0. Restored my 2023.11.3 backup. Installed 2023.12.0 again... ZHA did the same thing again... Failing every few hours. Went to my 2023.11.3 backup again and ZHA has been solid on my system for days. I didn't download any logs because I thought it was something that I had done. Going to stay on 2023.11.3 for the time being.

If you can flip back for a short while and capture logs it would be greatly appreciated

dmulcahey avatar Dec 12 '23 20:12 dmulcahey

For now it seems to work.

I'll report, if that changes.

Well, ZHA failed again this morning. Source Routing doesn't help. After ZHA integration reload things went back to normal again instantly.

Logger: zigpy.application
Source: components/zha/core/gateway.py:210
First occurred: 07:05:51 (2 occurrences)
Last logged: 07:05:51

Zigbee channel 15 utilization is 77.65%!
If you are having problems joining new devices, are missing sensor updates, or have issues keeping devices joined, ensure your coordinator is away from interference sources such as USB 3.0 devices, SSDs, WiFi routers, etc.
Logger: homeassistant.components.zha.core.cluster_handlers
Source: components/zha/core/cluster_handlers/__init__.py:537
Integration: Zigbee Home Automation (documentation, issues)
First occurred: 12. Dezember 2023 um 23:31:06 (51 occurrences)
Last logged: 07:06:13

[0xA305:1:0x0702]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>')]
[0x67CE:2:0x0006]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>')]
[0xA305:1:0x0b04]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>')]
[0xC543:1:0x0006]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), TimeoutError()]
[0x2E0A:3:0x0006]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), DeliveryError('Failed to deliver packet: <TXStatus.NWK_ROUTE_DISCOVERY_FAILED: 208>'), TimeoutError()]

Citizen-2CB8A24A avatar Dec 13 '23 07:12 Citizen-2CB8A24A

A preventive reload would be an option (as a workaround). Is there a way to automate a reload of the integration every hour?

Citizen-2CB8A24A avatar Dec 13 '23 07:12 Citizen-2CB8A24A

Same thing has happened a few times for me, and just now. It failed and reinitialized a few times in quick succession before I manually intervened. I manually reloaded the integration and it coincidentally stopped failing.

markaggar avatar Dec 13 '23 08:12 markaggar

A preventive reload would be an option (as a workaround). Is there a way to automate a reload of the integration every hour?

If I do a preventive reload or reboot of the HA Blue I have a living hell for a couple of minutes to get it stable again.

Letalis avatar Dec 13 '23 08:12 Letalis

A preventive reload would be an option (as a workaround). Is there a way to automate a reload of the integration every hour?

If I do a preventive reload or reboot of the HA Blue I have a living hell for a couple of minutes to get it stable again.

It's not about reloading the box ("reboot"), it's just about reloading the integration. That happens very fast and might work for another hour without failure.

If the integration fails, things are broken anyways. It cannot get worse. Recurring manual intervention cannot be the solution for the meantime.

Citizen-2CB8A24A avatar Dec 13 '23 14:12 Citizen-2CB8A24A

Please include debug logs in your comments that show ZHA running from startup, crashing, and not reloading. It'll really speed up figuring out the root cause of this problem.

puddly avatar Dec 13 '23 14:12 puddly

@puddly Thank you for the workaround! I'm enabling debug log now...

Citizen-2CB8A24A avatar Dec 13 '23 15:12 Citizen-2CB8A24A

Here's the log failure from last night if it's helpful. Note I'm running the Nortek zigbee/zwave stick via a VM on a Synology NAS. Never had these type of issues until 2023.12.1 (didn't install 2023.12.0)..

Logger: homeassistant.components.zha.core.cluster_handlers Source: components/zha/core/cluster_handlers/init.py:380 Integration: Zigbee Home Automation (documentation, issues) First occurred: 12:04:14 AM (8 occurrences) Last logged: 12:07:11 AM

[0x0377:1:0x0b04]: 'async_initialize' stage failed: 'NoneType' object has no attribute 'sendUnicast' Traceback (most recent call last): File "/usr/src/homeassistant/homeassistant/components/zha/core/helpers.py", line 338, in wrapper return await func(cluster_handler, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/init.py", line 380, in async_initialize await self._get_attributes( File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/init.py", line 491, in _get_attributes read, _ = await self.cluster.read_attributes( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/zigpy/zcl/init.py", line 524, in read_attributes result = await self.read_attributes_raw(to_read, manufacturer=manufacturer) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/zigpy/zcl/init.py", line 377, in request return await self._endpoint.request( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/zigpy/endpoint.py", line 253, in request return await self.device.request( ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 299, in request await self._application.request( File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 824, in request await self.send_packet( File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 767, in send_packet status, _ = await self._ezsp.sendUnicast( ^^^^^^^^^^^^^^^^^^^^^^ AttributeError: 'NoneType' object has no attribute 'sendUnicast'

markaggar avatar Dec 13 '23 16:12 markaggar

My setup is experiencing the same behaviour for the last couple of days ( I suspect since the last update, to 2023.12.1 ).

My network settings: Screenshot 2023-12-13 at 18 46 07

Below are my ZHA debug related logs. Crash happened at what is visible in logs at ~19:46 - 19:49

has-zha.log.zip

@puddly - logs updated here. Thanks!

xyboox avatar Dec 13 '23 16:12 xyboox

@markaggar @xyboox When I refer to debug logs, I specifically mean the debug logging you either enable through YAML (https://www.home-assistant.io/integrations/zha/#debug-logging) or with the "enable debug logging" button in ZHA's config. Warning logs or errors aren't of much use here because I need to know everything that happened before the warnings are logged.

If you can, reproduce the issue again and edit the debug logs into your comments.

puddly avatar Dec 13 '23 17:12 puddly

@puddly Here comes a debug log from my system. What I did was enabling the debug. Then did a reload which caused somewhat of disturbance. Then I did HA reboot and then it took a while for it to be 90% stable, home-assistant_zha_2023-12-13T17-08-38.741Z.zip

Letalis avatar Dec 13 '23 17:12 Letalis