[ZHA] Integration randomly stops working, sits in 'initialising' state. (still)

Open thefunkygibbon opened this issue 6 months ago • 28 comments

The problem

As in the previous issue (https://github.com/home-assistant/core/issues/105445), my ZHA integration randomly becomes completely unresponsive, and the integration then sits in the "initialising" state.

What version of Home Assistant Core has the issue?

core-2024.1.2

What was the last working version of Home Assistant Core?

core-2024.1.1

What type of installation are you running?

Home Assistant Container

Integration causing the issue

ZHA

Link to integration documentation on our website

No response

Diagnostics information

config_entry-zha-5fb366dc2478313fb3cb2b29c52254af.json.txt

Example YAML snippet

No response

Anything in the logs that might be useful for us?

[home-assistant_zha_2024-01-07T19-41-14.250Z.log.zip](https://github.com/home-assistant/core/files/13854715/home-assistant_zha_2024-01-07T19-41-14.250Z.log.zip)

Additional information

No response

thefunkygibbon avatar Jan 07 '24 19:01 thefunkygibbon

Hey there @dmulcahey, @adminiuga, @puddly, @thejulianjes, mind taking a look at this issue as it has been labeled with an integration (zha) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of zha can trigger bot actions by commenting:

  • @home-assistant close Closes the issue.
  • @home-assistant rename Awesome new title Renames the issue.
  • @home-assistant reopen Reopen the issue.
  • @home-assistant unassign zha Removes the current integration label and assignees on the issue, add the integration domain after the command.
  • @home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
  • @home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


zha documentation zha source (message by IssueLinks)

home-assistant[bot] avatar Jan 07 '24 20:01 home-assistant[bot]

Same problem here

image

kaciker avatar Jan 08 '24 07:01 kaciker

Encountered same issue on 2023.12.4 error_log-3.txt

Tried to reconfigure network, reboot, now integration is not showing "initializing" but no zigbee device is working. Had to re-pair every zigbee device to work again. (possibly due to network reconfig I did when stuck initializing?)

Diagnostics: zha-b1738f4b724427bec34bcc396a7b0ff4-Zigbee Coordinator-e8e28ba646e99c7f83d95fb82f306659.json.txt config_entry-zha-b1738f4b724427bec34bcc396a7b0ff4.json.txt

cdalexndr avatar Jan 08 '24 12:01 cdalexndr

@cdalexndr you can do a full system reboot: Settings > System > Hardware > Advanced options > Reboot system. Or, less recommended, unplug it and plug it back in. You do not need to re-pair your devices (I have not needed to). After rebooting, ZHA should work again. For me, it works for anywhere between 3 and 24 hours before it needs another reboot.


Here are my logs, I am experiencing the same issue. home-assistant_2024-01-08T14-00-17.880Z.log

I am running Core 2023.1.2, Supervisor 2023.12.0, OS 11.3, on a Raspberry Pi 4. (HUSBZB-1) HubZ Smart Home Controller - Standard Com Port, s/n: 1160046D - Silicon Labs


This issue is probably a duplicate of this: https://github.com/home-assistant/core/issues/105506

brylee123 avatar Jan 08 '24 14:01 brylee123

I'm experiencing this issue as well (after seeing the same issue start in the 2023.12.x releases as documented in #105445 and related tickets). ZHA was stable for me through 2023.11.3, and has not worked well since. I avoided the 2023.12.x releases altogether due to these bugs, but updated to 2024.1.1 earlier this week. ZHA worked for a few days, but as of this afternoon it has started falling into the "Initializing..." state. As of now, I can't get it to recover, even with a full system reboot and power off. I'm using an HA Yellow with the built-in Zigbee radio.

Here are the logs with debugging enabled since the last reboot. ZHA never comes online and stays stuck in initializing:

home-assistant_2024-01-09T06-09-23.858Z.log

I'll likely need to revert to 2023.11.3 again, but I'm not sure how long I can stay on that old of a version. I don't suppose there's been any discussion of reverting ZHA to the 2023.11.3 code base until these issues can be resolved?

asayler avatar Jan 09 '24 06:01 asayler

I can confirm the issues of ZHA instability are still present even on the latest. I had a stable 2024.1.2 for a few days but since yesterday ZHA just randomly restarted twice, 6 hours apart, :( the 2nd time the system was unstable for an hour before recovering.

Will try to see if I can manage to capture logs

harvindhillon avatar Jan 09 '24 09:01 harvindhillon

Same issue here.

tjerkw avatar Jan 09 '24 22:01 tjerkw

Same issue here.

Same thing here:

I run HAOS on a NUC, with the SkyConnect connected via a USB extension cable (like you're supposed to). I got the Silicon Labs Multiprotocol update to 2.4.0, and things started breaking. Hours later I updated to 2.4.1: still broken. Another few hours later, 2.4.2 was pushed and I upgraded.

Since 2.4.2 it's been up a day, then randomly the ZHA integration goes back to "Failed Setup Will Retry"

What's worse, I'm running both Zigbee AND Thread on the SkyConnect... so BOTH types of devices (85 of them) are broken... including lights.

Wife Acceptance Factor is dropping rapidly.

How do I downgrade back to 2.3.2??

joelevi avatar Jan 10 '24 16:01 joelevi

Had the same issue, although my error is specifically:

async_initialize: all attempts have failed: [TimeoutError(), TimeoutError(), TimeoutError(), TimeoutError()]

Downgrading to 2023.12.4 seems to have caused fewer problems after I restarted the Zigbee router-based devices.

itsSaad avatar Jan 10 '24 18:01 itsSaad

Wife Acceptance Factor is dropping rapidly.

Know what you mean. Home automations (including things we have come to rely on) being broken for months is not winning me any points. I had to revert a bunch of things to failsafe mode and find workarounds for a bunch of other things. Overall this is causing me a significant amount of work and effort.

mmccool avatar Jan 11 '24 01:01 mmccool

@cdalexndr: upgrade to 2024.1.2. There were many bugs fixed between 2023.12.4 and then.

Multi-PAN has issues independent of ZHA, some of which will be addressed in an update scheduled for release very soon. If you're having reloads and using multi-PAN, this isn't a ZHA issue, nor something ZHA can fix. Be aware that multi-PAN is still in the experimental phase (though improving) so if you need stability, I strongly suggest using separate sticks for Zigbee and Thread (or using an external Thread border router).

puddly avatar Jan 11 '24 15:01 puddly

so was there anything obvious in my logs? do you want/need me to do anything to help get to the bottom of this?

were the libraries which were changed (reverted) in 1.1 changed back again in 1.2 or something?

thefunkygibbon avatar Jan 11 '24 15:01 thefunkygibbon

2024.1.1 to 2024.1.2 was a very tiny change and there would be no difference between how the two behave network-wise.

What exact coordinator are you using?

puddly avatar Jan 11 '24 16:01 puddly

For me the issues started in the 2023.12.x releases. I am using a Sonoff Dongle-P.

Even now I see random reboots, example: image

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/zigpy_znp/api.py", line 1098, in request_callback_rsp
    await self.request(request, timeout=timeout, **response_params)
  File "/usr/local/lib/python3.11/site-packages/zigpy_znp/api.py", line 1052, in request
    self._uart.send(frame)
    ^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'send'

This is the indicator that a re-init has been done.

Just a minute before, a lot of timeouts

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/zigpy_znp/api.py", line 1098, in request_callback_rsp
    await self.request(request, timeout=timeout, **response_params)
  File "/usr/local/lib/python3.11/site-packages/zigpy_znp/api.py", line 1059, in request
    response = await response_future
               ^^^^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/zigpy_znp/api.py", line 1097, in request_callback_rsp
    async with async_timeout.timeout(timeout):
  File "/usr/local/lib/python3.11/site-packages/async_timeout/__init__.py", line 141, in __aexit__
    self._do_exit(exc_type)
  File "/usr/local/lib/python3.11/site-packages/async_timeout/__init__.py", line 228, in _do_exit
    raise asyncio.TimeoutError
TimeoutError

harvindhillon avatar Jan 11 '24 18:01 harvindhillon

Can confirm issues since 2023.12.x as well. The SkyConnect seems to be crashing or losing the connection; my logs from the multiprotocol integration are full of messages about trying to connect with baud rate X, as it tries different baud rates before stopping altogether.

Physically reconnecting the SkyConnect and starting the integration again fixes the issue temporarily. It mostly crashes at 2-3 AM. With 2024.0 it was stable for a couple of days; now we are back to 24h.

Trabbi1999 avatar Jan 11 '24 19:01 Trabbi1999

My coordinator is currently a Sonoff bridge flashed with Tasmota, unlike others here.

thefunkygibbon avatar Jan 11 '24 21:01 thefunkygibbon

Let me set one up to test. I've been running my home network on a Silvercrest gateway without issues for the past day so perhaps it's something specific to the Sonoff.

puddly avatar Jan 11 '24 22:01 puddly

I'm also seeing these issues since the 2023.12.x update using an HA Yellow, so it's not just Sonoff. The coordinator built into the Yellow hardware seems to trigger the issue as well. I uploaded my debug logs previously in this thread before migrating back to 2023.11.3 since the network was largely unusable, but let me know if you need more logs and I can try to upgrade again.

asayler avatar Jan 11 '24 22:01 asayler

There were a lot of changes between 2023.12.4 and 2024.1.0 so please try the latest version. If you still have issues, post a debug log of the integration reload.

Multi-PAN issues are not related to ZHA unless downgrading solves the problem. Keep in mind that ZHA prior to 2023.12.0 did not notify you when your coordinator was offline or unresponsive so it's very possible that you're not actually seeing any new issues that were not present in the past.

puddly avatar Jan 11 '24 22:01 puddly

I tested 2024.1.2 and that's where I had the most recent issues on my HA Yellow. The logs above are from that version. Just noting that the issues started in 2023.12.x. Prior to that, ZHA was rock solid. Ever since, it's been very flaky.

asayler avatar Jan 12 '24 00:01 asayler

I tested 2024.1.2 and that's where I had the most recent issues on my HA Yellow. The logs above are from that version. Just noting that the issues started in 2023.12.x. Prior to that, ZHA was rock solid. Ever since, it's been very flaky.

If this is the case are you willing to try something drastic to help identify this? Would you be willing to try running the most recent version with all other integrations disabled? Just for a bit to see if the stability issue goes away?

dmulcahey avatar Jan 12 '24 00:01 dmulcahey

The challenge is that it often takes a day or more for the issue to crop up (but then it tends to stay: even full system reboots wouldn't bring it back last time, and I had to downgrade to get it working again). Would it work to wait to disable the other integrations until the issue crops up, and then turn the other integrations off? If so, I may be able to do that, but this is also my house, and not a test site, so my ability to have extended downtime is a bit limited (which is why I had to revert to 2023.11.3, where things are at least stable).

asayler avatar Jan 12 '24 00:01 asayler

It’s worth a shot and I completely understand the impact this would have. No worries either way.

dmulcahey avatar Jan 12 '24 00:01 dmulcahey

I have the same problem: I have to reboot 2-3 times for ZHA to start up correctly. I am using a Sonoff Zigbee 3.0 dongle. I had this occur before updating to 2024.1, but since then the Home Assistant frontend crashes completely after 1-2 minutes whenever ZHA does not start. I am not 100% sure that these two issues are related, though.

These are the related log messages:

2024-01-13 20:07:45.709 DEBUG (MainThread) [homeassistant.components.zha] ZHA storage file does not exist or was already removed
2024-01-13 20:07:50.431 ERROR (bellows.thread_0) [bellows.uart] CRC error in frame b'9c2791907e' (b'9190' != b'140c')
2024-01-13 20:07:56.889 DEBUG (MainThread) [homeassistant.components.zha] Failed to set up ZHA
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/protocol.py", line 74, in command
    return await future
           ^^^^^^^^^^^^
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/zha/__init__.py", line 163, in async_setup_entry
    zha_gateway = await ZHAGateway.async_from_config(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/zha/core/gateway.py", line 193, in async_from_config
    await instance.async_initialize()
  File "/usr/src/homeassistant/homeassistant/components/zha/core/gateway.py", line 211, in async_initialize
    await app.startup(auto_form=True)
  File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 226, in startup
    await self.initialize(auto_form=auto_form)
  File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 142, in initialize
    await self.load_network_info(load_devices=False)
  File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 251, in load_network_info
    (nwk,) = await ezsp.getNodeId()
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/__init__.py", line 215, in _command
    return await self._protocol.command(name, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/protocol.py", line 73, in command
    async with asyncio_timeout(EZSP_CMD_TIMEOUT):
  File "/usr/local/lib/python3.11/asyncio/timeouts.py", line 111, in __aexit__
    raise TimeoutError from exc_val
TimeoutError
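
The shape of this traceback (a `CancelledError` on the awaited future that surfaces as a `TimeoutError`) is what you would expect when a command is sent to the radio and no reply ever arrives. Here is a minimal sketch of that pattern using only the standard library. `send_command` and the 0.1 second timeout are hypothetical stand-ins for illustration, not the bellows implementation:

```python
import asyncio


async def send_command(timeout: float = 0.1) -> str:
    """Hypothetical stand-in for a serial command that waits for a reply."""
    loop = asyncio.get_running_loop()
    reply = loop.create_future()  # nothing ever resolves this, like a silent radio
    # wait_for cancels the pending wait and raises a timeout error when time runs out
    return await asyncio.wait_for(reply, timeout)


async def main() -> str:
    try:
        return await send_command()
    except asyncio.TimeoutError:
        return "timed out"


result = asyncio.run(main())
print(result)
```

The point is only that the timeout says "no answer came back in time"; it does not say whether the radio, the serial link, or the host was the slow party.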

EuleMitKeule avatar Jan 13 '24 18:01 EuleMitKeule

Same problem

2024-01-14 21:28:18.852 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: EzspError('EZSP is not running')
2024-01-14 21:28:18.853 WARNING (MainThread) [zigpy.application] Watchdog failure
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 665, in _watchdog_loop
    await self.watchdog_feed()
  File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 647, in watchdog_feed
    await self._watchdog_feed()
  File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 999, in _watchdog_feed
    (res,) = await self._ezsp.readCounters()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/__init__.py", line 212, in _command
    raise EzspError("EZSP is not running")
bellows.exception.EzspError: EZSP is not running
2024-01-14 21:30:38.990 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection] [140501092979264] Error handling message: Unknown error (unknown_error) M from 192.168.x.x (Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36)
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/websocket_api/decorators.py", line 26, in _handle_async_response
    await func(hass, connection, msg)
  File "/usr/src/homeassistant/homeassistant/components/zha/websocket_api.py", line 1047, in websocket_get_configuration
    zha_gateway = get_zha_gateway(hass)
                  ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/zha/core/helpers.py", line 459, in get_zha_gateway
    raise ValueError("No gateway object exists")
ValueError: No gateway object exists
2024-01-14 21:30:39.125 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection] [140501092979264] Error handling message: Unknown error (unknown_error) M from 192.168.x.x (Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36)
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/websocket_api/decorators.py", line 26, in _handle_async_response
    await func(hass, connection, msg)
  File "/usr/src/homeassistant/homeassistant/components/zha/websocket_api.py", line 1140, in websocket_get_network_settings
    backup = async_get_active_network_settings(hass)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/zha/api.py", line 43, in async_get_active_network_settings
    app = get_zha_gateway(hass).application_controller
          ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/zha/core/helpers.py", line 459, in get_zha_gateway
    raise ValueError("No gateway object exists")
ValueError: No gateway object exists
2024-01-14 21:30:45.635 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection] [140501092979264] Error handling message: Unknown error (unknown_error) M from 192.168.x.x (Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36)
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/websocket_api/decorators.py", line 26, in _handle_async_response
    await func(hass, connection, msg)
  File "/usr/src/homeassistant/homeassistant/components/zha/websocket_api.py", line 1047, in websocket_get_configuration
    zha_gateway = get_zha_gateway(hass)
                  ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/zha/core/helpers.py", line 459, in get_zha_gateway
    raise ValueError("No gateway object exists")
ValueError: No gateway object exists
2024-01-14 21:30:45.641 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection] [140501092979264] Error handling message: Unknown error (unknown_error) M from 192.168.x.x (Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36)
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/websocket_api/decorators.py", line 26, in _handle_async_response
    await func(hass, connection, msg)
  File "/usr/src/homeassistant/homeassistant/components/zha/websocket_api.py", line 1140, in websocket_get_network_settings
    backup = async_get_active_network_settings(hass)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/zha/api.py", line 43, in async_get_active_network_settings
    app = get_zha_gateway(hass).application_controller
          ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/zha/core/helpers.py", line 459, in get_zha_gateway
    raise ValueError("No gateway object exists")
ValueError: No gateway object exists

mortezaadi avatar Jan 14 '24 20:01 mortezaadi

whatever was changed in 2024.1.3 has made it even worse. what was once a week has happened about 4 times in 2 days

thefunkygibbon avatar Jan 15 '24 08:01 thefunkygibbon

For what it's worth, here is what the debug logs mostly say, other than the ZHA log entries reporting DELIVERY_FAILED errors (understandably). It has just occurred again!

2024-01-14 23:20:46.009 ERROR (MainThread) [zigpy.zcl] [0xA01A:1:0x0b04] Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/zigpy/zcl/__init__.py", line 411, in reply
    return await self._endpoint.reply(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/endpoint.py", line 278, in reply
    return await self.device.reply(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 483, in reply
    return await self.request(
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 317, in request
    await send_request
  File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 833, in request
    await self.send_packet(
  File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 868, in send_packet
    status, _ = await self._ezsp.sendUnicast(
                      ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'sendUnicast'

thefunkygibbon avatar Jan 15 '24 13:01 thefunkygibbon

My network went on a restarting spree today, with a lot of errors and timeouts. Devices dropped off but started rejoining when I launched "add devices". Apologies, I can't pinpoint when things started to hit the fan, but while trying to fix it I had to turn on debug logging a few times. Hopefully something jumps out: home-assistant.log home-assistant_zha_2024-01-15T22-01-51.716Z.log home-assistant_zha_2024-01-15T21-27-11.054Z.log

harvindhillon avatar Jan 15 '24 22:01 harvindhillon

There were no ZHA or library changes between 2024.1.2 and 2024.1.3 so I think the problem you're having is just randomly manifesting. The repeated restart issue will be fixed by https://github.com/home-assistant/core/pull/107963.

  • If you are using multi-PAN, this isn't a ZHA issue: please head to https://github.com/home-assistant/addons/issues/3408.
  • If you aren't using multi-PAN and are using a Silicon Labs radio (SkyConnect, Sonoff v2, etc.), can you describe your host hardware (e.g. Pi 4, old server, Green)? I believe this issue is affecting predominantly slower hardware, as the radio resets only when the host stops communicating with it for 10+ seconds (ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT). This is only possible if you are having high CPU usage and/or some other integration/addon is slowing HA down enough to cause this.
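
One rough way to check whether something is stalling the host side is to measure how late the asyncio event loop wakes up from a short sleep. The sketch below is a standalone illustration, not part of Home Assistant or ZHA. The function names and the 0.3 second blocking call are made up for the demo, and the stall that matters for the radio reset described above would be on the order of 10 seconds:

```python
import asyncio
import time


async def measure_loop_lag(interval: float = 0.05, samples: int = 5) -> float:
    """Return the worst observed delay (seconds) beyond the requested sleep."""
    worst = 0.0
    for _ in range(samples):
        start = time.monotonic()
        await asyncio.sleep(interval)
        # Any time past `interval` is time the loop spent busy elsewhere
        worst = max(worst, time.monotonic() - start - interval)
    return worst


async def blocking_task() -> None:
    """Simulates a misbehaving component that blocks the event loop."""
    await asyncio.sleep(0.01)
    time.sleep(0.3)  # synchronous sleep: nothing else can run meanwhile


async def main() -> float:
    lag, _ = await asyncio.gather(measure_loop_lag(), blocking_task())
    return lag


worst_lag = asyncio.run(main())
print(f"worst event-loop lag: {worst_lag:.3f}s")
```

A low overall CPU percentage does not rule this out, since a single blocking call can stall the loop while the other cores sit idle.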

@harvindhillon I don't believe your issue is related to this one. Your log is littered with MAC_CHANNEL_ACCESS_FAILURE (the radio refusing to transmit because it's too noisy). Similarly:

2024-01-15 21:53:08.918 WARNING (MainThread) [zigpy.application] Zigbee channel 11 utilization is 88.24%!
2024-01-15 21:53:08.918 WARNING (MainThread) [zigpy.application] If you are having problems joining new devices, are missing sensor updates, or have issues keeping devices joined, ensure your coordinator is away from interference sources such as USB 3.0 devices, SSDs, WiFi routers, etc.
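
For anyone who wants to check their own debug log for the same symptoms, here is a small sketch that counts lines mentioning MAC_CHANNEL_ACCESS_FAILURE and collects the channel utilization readings. The regular expression is based only on the warning format quoted above, and `summarize_log` is a hypothetical helper name, not something provided by zigpy or ZHA:

```python
import re
from collections import Counter

# Pattern based on the zigpy warning format quoted above
UTILIZATION_RE = re.compile(r"Zigbee channel (\d+) utilization is ([\d.]+)%")


def summarize_log(lines):
    """Count MAC_CHANNEL_ACCESS_FAILURE lines and collect utilization readings."""
    counts = Counter()
    utilization = []
    for line in lines:
        if "MAC_CHANNEL_ACCESS_FAILURE" in line:
            counts["MAC_CHANNEL_ACCESS_FAILURE"] += 1
        match = UTILIZATION_RE.search(line)
        if match:
            # (channel number, utilization percentage)
            utilization.append((int(match.group(1)), float(match.group(2))))
    return counts, utilization


sample = [
    "2024-01-15 21:53:08.918 WARNING (MainThread) [zigpy.application] "
    "Zigbee channel 11 utilization is 88.24%!",
]
counts, utilization = summarize_log(sample)
print(counts, utilization)
```

To run it against a real log, pass an open file object instead of `sample`, for example `summarize_log(open("home-assistant.log"))`.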

puddly avatar Jan 16 '24 18:01 puddly

Thanks, @puddly. To your hardware question:

I'm using the built-in Silicon Labs radio on an HA Yellow with a 4GB CM4 RPi driving it. It's in the normal Zigbee-only mode (not multiprotocol). My processor usage hovers around 5%-10%, so it's not like the system is overloaded (although I'm not sure how many things are bound by single-core speed in ways that those multi-core usage percentages may not reflect).

Screenshot_20240116-115940

I attached logs above and in the previous iteration of this ticket. I have reverted back to 2023.11.3 which was the last stable version of ZHA prior to this run of issues that started in 2023.12. I can give the latest brain a try again later this week if you need more logs. I did update the radio firmware recently, and haven't tested that against the latest releases yet.

asayler avatar Jan 16 '24 19:01 asayler