core icon indicating copy to clipboard operation
core copied to clipboard

ZHA fails initializing after some time

Open ErikSabel opened this issue 1 year ago • 13 comments

The problem

ZHA fails after some time, between the same day or after a couple of days (2 or 3 days) It stays in a inizializing loop, what is visually seen. The only way to get it out is the turn it off, and back on. And a complete reboot is required to get it working, not only HA but also the system where it is installed on Not sure when it started, but at least a couple of weeks ago. Tried to swap the stick between USB2 and 3, but doesnt influence the effect

What version of Home Assistant Core has the issue?

core-2024.8.3

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

ZHA

Link to integration documentation on our website

No response

Diagnostics information

https://we.tl/t-kClFfF43QS

The logging file is to big, so i created a wetransfer link

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

ErikSabel avatar Aug 28 '24 06:08 ErikSabel

Hey there @dmulcahey, @adminiuga, @puddly, @thejulianjes, mind taking a look at this issue as it has been labeled with an integration (zha) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of zha can trigger bot actions by commenting:

  • @home-assistant close Closes the issue.
  • @home-assistant rename Awesome new title Renames the issue.
  • @home-assistant reopen Reopen the issue.
  • @home-assistant unassign zha Removes the current integration label and assignees on the issue, add the integration domain after the command.
  • @home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
  • @home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


zha documentation zha source (message by IssueLinks)

home-assistant[bot] avatar Aug 28 '24 06:08 home-assistant[bot]

Same here. SLZB-06m (EZSP) Zigbee2Ethernet coordinator

3dtrix avatar Aug 30 '24 06:08 3dtrix

Hey there Samé here. I need to reboot HA instance to trigger. It is quite annoying as I power outlets with ZigBee and it does fall intermittently.. I have ZigBee Xiaomi gateway connected to power plugs ..any suggestion pls ?

kjurcik avatar Sep 02 '24 07:09 kjurcik

I've got some better debug data; home-assistant_zha_2024-09-04T10-52-27.055Z.log

ErikSabel avatar Sep 04 '24 10:09 ErikSabel

Temporary solvable by creating a automation that reboots HA every day. Only the single once that happens within a day isn't solved by this

Neverminder, this doesn't solve it temporary. Thought it would but after last night's reset again it failed 38 minutes after. So there is something that trigger this and it isn't time related

ErikSabel avatar Sep 07 '24 11:09 ErikSabel

Temporary solution found! Disable thread in settings and it keeps working for 3 days already, where a reset was required daily

ErikSabel avatar Sep 13 '24 05:09 ErikSabel

Temporary solution found! Disable thread in settings and it keeps working for 3 days already, where a reset was required daily

Hi, which setting do you mean? Can't spot anything obvious in the UI...

GrumpyMetalGuy avatar Sep 18 '24 16:09 GrumpyMetalGuy

Temporary solution found! Disable thread in settings and it keeps working for 3 days already, where a reset was required daily

Hi, which setting do you mean? Can't spot anything obvious in the UI...

Settings - hardware - configure Still works for me for more than a week

ErikSabel avatar Sep 18 '24 17:09 ErikSabel

Settings - hardware - configure

I couldn't find such a setting with Raspberry Pi. Are you using Green or Yellow? I don't think we even have thread support.

upsuper avatar Sep 21 '24 06:09 upsuper

Another thing I noticed is that every time when ZHA start failing, my command line sensors also fail intermittently for a period, though they restore thereafter.

upsuper avatar Sep 21 '24 06:09 upsuper

Settings - hardware - configure

I couldn't find such a setting with Raspberry Pi. Are you using Green or Yellow? I don't think we even have thread support.

I'm on an raspberry pi, it's not at ZHA but at hardware of the device itself

ErikSabel avatar Sep 21 '24 07:09 ErikSabel

I'm on an raspberry pi, it's not at ZHA but at hardware of the device itself

I still can't find anything. Do you have any screenshot of what you are talking about?

upsuper avatar Sep 21 '24 09:09 upsuper

Don't` know if this is relevant to the devs, but it's gone again for me - the following was in the system logs:

Logger: homeassistant
Source: /usr/src/homeassistant/homeassistant/runner.py:147
First occurred: 15:42:32 (5 occurrences)
Last logged: 15:42:33

Error doing job: Task exception was never retrieved (None)
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/zigpy_znp/api.py", line 1104, in request_callback_rsp
    await self.request(request, timeout=timeout, **response_params)
  File "/usr/local/lib/python3.12/site-packages/zigpy_znp/api.py", line 1065, in request
    response = await response_future
               ^^^^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/zigpy_znp/api.py", line 1103, in request_callback_rsp
    async with async_timeout.timeout(timeout):
  File "/usr/local/lib/python3.12/site-packages/async_timeout/__init__.py", line 141, in __aexit__
    self._do_exit(exc_type)
  File "/usr/local/lib/python3.12/site-packages/async_timeout/__init__.py", line 228, in _do_exit
    raise asyncio.TimeoutError
TimeoutError

GrumpyMetalGuy avatar Sep 21 '24 15:09 GrumpyMetalGuy

This happened again for me, this time I managed to get a debug log. Have had to snip a bit out of the middle to make it small enough to upload. The most recent Python exception is:

2024-09-26 14:49:51.536 DEBUG (MainThread) [homeassistant.components.zha] Failed to set up ZHA Traceback (most recent call last): File "/usr/local/lib/python3.12/site-packages/zigpy_znp/api.py", line 695, in _skip_bootloader result = await responses.get() ^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/asyncio/queues.py", line 158, in get await getter asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/usr/src/homeassistant/homeassistant/components/zha/init.py", line 151, in async_setup_entry await zha_gateway.async_initialize() File "/usr/local/lib/python3.12/site-packages/zha/application/gateway.py", line 272, in async_initialize await self._async_initialize() File "/usr/local/lib/python3.12/site-packages/zha/application/gateway.py", line 255, in _async_initialize await self.application_controller.startup(auto_form=True) File "/usr/local/lib/python3.12/site-packages/zigpy/application.py", line 234, in startup await self.connect() File "/usr/local/lib/python3.12/site-packages/zigpy_znp/zigbee/application.py", line 103, in connect await znp.connect() File "/usr/local/lib/python3.12/site-packages/zigpy_znp/api.py", line 738, in connect self.capabilities = (await self._skip_bootloader()).Capabilities ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/zigpy_znp/api.py", line 694, in _skip_bootloader async with async_timeout.timeout(CONNECT_PROBE_TIMEOUT): File "/usr/local/lib/python3.12/site-packages/async_timeout/init.py", line 141, in aexit self._do_exit(exc_type) File "/usr/local/lib/python3.12/site-packages/async_timeout/init.py", line 228, in _do_exit raise asyncio.TimeoutError TimeoutError

home-assistant_zha_2024-09-26T13-51-21.627Z.log.gz

GrumpyMetalGuy avatar Sep 26 '24 15:09 GrumpyMetalGuy

@GrumpyMetalGuy From the log, your radio outright stops responding to serial traffic at some point. ZHA keeps trying to communicate with the radio and re-connect and reset it in any way possible but the radio isn't sending anything back.

2024-09-26 05:06:42.535 DEBUG (MainThread) [zigpy_znp.api] Sending request: SYS.Ping.Req()
# Radio does not respond, it's dead. ZHA then sends a reset and hopes it works
2024-09-26 05:06:57.573 DEBUG (MainThread) [zigpy_znp.api] Sending request: SYS.ResetReq.Req(Type=<ResetType.Soft: 1>)
2024-09-26 05:06:57.573 DEBUG (MainThread) [zigpy_znp.api] Request has no response, not waiting for one.
2024-09-26 05:06:57.574 DEBUG (MainThread) [zigpy_znp.uart] Closing serial port

# Retry the connection
2024-09-26 05:07:03.297 DEBUG (MainThread) [zigpy_znp.uart] Connecting to /dev/serial/by-id/usb-1a86_USB_Serial-if00-port0 at 115200 baud
2024-09-26 05:07:03.310 DEBUG (MainThread) [zigpy_znp.uart] Opened /dev/serial/by-id/usb-1a86_USB_Serial-if00-port0 serial port
2024-09-26 05:07:03.311 DEBUG (MainThread) [zigpy_znp.uart] Connected to /dev/serial/by-id/usb-1a86_USB_Serial-if00-port0 at 115200 baud
2024-09-26 05:07:03.312 DEBUG (MainThread) [zigpy_znp.api] Toggling RTS/DTR pins to skip bootloader or reset chip
2024-09-26 05:07:03.313 DEBUG (MainThread) [zigpy_znp.uart] Setting serial pin states: DTR=False, RTS=False
2024-09-26 05:07:03.467 DEBUG (MainThread) [zigpy_znp.uart] Setting serial pin states: DTR=False, RTS=True
2024-09-26 05:07:03.620 DEBUG (MainThread) [zigpy_znp.uart] Setting serial pin states: DTR=False, RTS=False
2024-09-26 05:07:03.775 DEBUG (MainThread) [zigpy_znp.api] Sending request: SYS.Ping.Req()
2024-09-26 05:07:04.276 DEBUG (MainThread) [zigpy_znp.api] Sending CC253x bootloader skip bytes
2024-09-26 05:07:06.780 DEBUG (MainThread) [zigpy_znp.api] Sending request: SYS.Ping.Req()
2024-09-26 05:07:07.781 DEBUG (MainThread) [zigpy_znp.api] Sending request: SYS.Ping.Req()
2024-09-26 05:07:08.782 DEBUG (MainThread) [zigpy_znp.api] Sending request: SYS.Ping.Req()
2024-09-26 05:07:09.783 DEBUG (MainThread) [zigpy_znp.api] Sending request: SYS.Ping.Req()
2024-09-26 05:07:10.784 DEBUG (MainThread) [zigpy_znp.api] Sending request: SYS.Ping.Req()
2024-09-26 05:07:11.786 DEBUG (MainThread) [zigpy_znp.api] Sending request: SYS.Ping.Req()
2024-09-26 05:07:12.787 DEBUG (MainThread) [zigpy_znp.api] Sending request: SYS.Ping.Req()

# No response
2024-09-26 05:07:13.313 DEBUG (MainThread) [zigpy_znp.api] Connection to /dev/serial/by-id/usb-1a86_USB_Serial-if00-port0 failed, cleaning up
2024-09-26 05:07:13.314 DEBUG (MainThread) [zigpy_znp.uart] Closing serial port

# And again...
2024-09-26 05:07:22.579 DEBUG (MainThread) [zigpy_znp.uart] Connecting to /dev/serial/by-id/usb-1a86_USB_Serial-if00-port0 at 115200 baud
2024-09-26 05:07:22.591 DEBUG (MainThread) [zigpy_znp.uart] Opened /dev/serial/by-id/usb-1a86_USB_Serial-if00-port0 serial port
2024-09-26 05:07:22.592 DEBUG (MainThread) [zigpy_znp.uart] Connected to /dev/serial/by-id/usb-1a86_USB_Serial-if00-port0 at 115200 baud
2024-09-26 05:07:22.593 DEBUG (MainThread) [zigpy_znp.api] Toggling RTS/DTR pins to skip bootloader or reset chip
2024-09-26 05:07:22.593 DEBUG (MainThread) [zigpy_znp.uart] Setting serial pin states: DTR=False, RTS=False
2024-09-26 05:07:22.749 DEBUG (MainThread) [zigpy_znp.uart] Setting serial pin states: DTR=False, RTS=True
2024-09-26 05:07:22.902 DEBUG (MainThread) [zigpy_znp.uart] Setting serial pin states: DTR=False, RTS=False
2024-09-26 05:07:23.055 DEBUG (MainThread) [zigpy_znp.api] Sending request: SYS.Ping.Req()
2024-09-26 05:07:23.556 DEBUG (MainThread) [zigpy_znp.api] Sending CC253x bootloader skip bytes
2024-09-26 05:07:26.060 DEBUG (MainThread) [zigpy_znp.api] Sending request: SYS.Ping.Req()
2024-09-26 05:07:27.062 DEBUG (MainThread) [zigpy_znp.api] Sending request: SYS.Ping.Req()
2024-09-26 05:07:28.063 DEBUG (MainThread) [zigpy_znp.api] Sending request: SYS.Ping.Req()
2024-09-26 05:07:29.064 DEBUG (MainThread) [zigpy_znp.api] Sending request: SYS.Ping.Req()
2024-09-26 05:07:30.064 DEBUG (MainThread) [zigpy_znp.api] Sending request: SYS.Ping.Req()
2024-09-26 05:07:31.066 DEBUG (MainThread) [zigpy_znp.api] Sending request: SYS.Ping.Req()
2024-09-26 05:07:32.067 DEBUG (MainThread) [zigpy_znp.api] Sending request: SYS.Ping.Req()

puddly avatar Sep 26 '24 15:09 puddly

Thanks @puddly . When I reset the box physically, it's all fine for a random amount of time, so it's confusing as to what the cause might be. Would the Zigbee dongle be potentially faulty?

GrumpyMetalGuy avatar Sep 26 '24 15:09 GrumpyMetalGuy

Seeing this issue on an HA Yellow with the built-in Zigbee hardware. Once every few days or so, ZHA goes into a state where it keep re-initializing. Rebooting the full HA Yellow box seems to restore it in most cases, at least for a bit. Logs attached. I've been seeing this with various versions going back at least a few months now, although it seems to have gotten more frequent in the most recent release. Currently on core 2024.9.3 with Supervisor 2024.09.1 and Operating System 13.1. The hybrid threads mode is disabled.

home-assistant_2024-10-03T05-43-05.048Z.log

asayler avatar Oct 03 '24 05:10 asayler

Seems related to #124735.

asayler avatar Oct 03 '24 05:10 asayler

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.