core
core copied to clipboard
ZHA Disconnects from devices frequently and needs to e reset
The problem
Several times a day zigbee will stop controlling devices and need to be reset to resume functionality. This issue seems to have appeared with the most recent OS update
What version of Home Assistant Core has the issue?
core-2023.12.3
What was the last working version of Home Assistant Core?
core-2023.11
What type of installation are you running?
Home Assistant OS
Integration causing the issue
Zigbee
Link to integration documentation on our website
https://www.home-assistant.io/integrations/zha
Diagnostics information
config_entry-zha-81b947f1e0f590e0cd175fead6f19fde.json.txt
Example YAML snippet
No response
Anything in the logs that might be useful for us?
No response
Additional information
No response
Hey there @dmulcahey, @adminiuga, @puddly, @thejulianjes, mind taking a look at this issue as it has been labeled with an integration (zha
) you are listed as a code owner for? Thanks!
Code owner commands
Code owners of zha
can trigger bot actions by commenting:
-
@home-assistant close
Closes the issue. -
@home-assistant rename Awesome new title
Renames the issue. -
@home-assistant reopen
Reopen the issue. -
@home-assistant unassign zha
Removes the current integration label and assignees on the issue, add the integration domain after the command. -
@home-assistant add-label needs-more-information
Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue. -
@home-assistant remove-label needs-more-information
Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.
(message by CodeOwnersMention)
zha documentation zha source (message by IssueLinks)
Can you enable debug logging for the ZHA integration, let it run until it crashes and you have to reload, and then upload the log? You will probably have to compress it first.
home-assistant_zha_2023-12-22T05-41-39.977Z.log.zip
Last 24 hours of debug logs, I dont know exactly when it failed but it did at least 2 times
Hi,
Seems to be related to https://github.com/home-assistant/core/issues/105445 and https://github.com/home-assistant/core/issues/105449.
In my case, updating to 2023.12.3 seems to have solved the problem... (which appeared with one of the 2023.11 versions)
I have the same problem of re-initializing over and over again and loosing devices while this reinitialization-loop repeats.
As mentioned it might be connected to #105445 #105449 and #105506
My logs show a watchdog-timeout:
`2023-12-22 12:47:04.192 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: TimeoutError() 2023-12-22 12:47:04.194 WARNING (MainThread) [zigpy.application] Watchdog failure Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/protocol.py", line 68, in command return await future ^^^^^^^^^^^^ asyncio.exceptions.CancelledError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 661, in _watchdog_loop
await self._watchdog_feed()
File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 893, in _watchdog_feed
(res,) = await self._ezsp.readCounters()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/protocol.py", line 67, in command
async with asyncio_timeout(EZSP_CMD_TIMEOUT):
File "/usr/local/lib/python3.11/asyncio/timeouts.py", line 111, in aexit
raise TimeoutError from exc_val
TimeoutError
2023-12-22 12:47:04.281 DEBUG (MainThread) [zigpy.application] Connection to the radio has been lost: TimeoutError()
2023-12-22 12:47:04.285 DEBUG (MainThread) [homeassistant.components.zha.core.gateway] Connection to the radio was lost: TimeoutError()
2023-12-22 12:47:04.285 DEBUG (MainThread) [zigpy.application] Stopping watchdog loop
2023-12-22 12:47:04.286 DEBUG (MainThread) [homeassistant.components.zha.core.gateway] Shutting down ZHA ControllerApplication
2023-12-22 12:47:04.291 DEBUG (Thread-209) [aiosqlite] executing functools.partial(<function PersistingListener._set_isolation_level.
I am having the same issue since 2023.12.3 Zigbee keeps breaking down and not recovering on it's own.
When mine fails, the integration does not show an error, it just doesnt work. I have noticed that other people have been having issues with ZHA but they tend to have errors thrown back in the GUI
Hey, Exactly the same problems from version 2023.12.0. This is starting to seriously annoy me...
For me it appears that the issue gets triggered if I use Matter/Thread functionality. If I don't, then rhe initialisation loop doesn't seem to happen or at least is more rare.
After the last update i have to keep reloading zigbee for anything to work. Then it stops working after sometime. Then I have to reload
Any news ?
This will be fixed by https://github.com/home-assistant/core/pull/106147, which is currently in dev
and will be included in the next Core bugfix release.
Hello, the problem is still present despite update 2023.12.4 and in addition, some devices no longer work. Please revert to the ZHA version present in core 2023.11.0
ZHA is still reinitializing with 2023.12.4, devices still lost/getting lost.
From the logs:
2023-12-28 07:42:23.111 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: TimeoutError() 2023-12-28 07:42:23.112 WARNING (MainThread) [zigpy.application] Watchdog failure Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/protocol.py", line 68, in command return await future ^^^^^^^^^^^^ asyncio.exceptions.CancelledError
The above exception was the direct cause of the following exception:
Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 657, in _watchdog_loop await self._watchdog_feed() File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 893, in _watchdog_feed (res,) = await self._ezsp.readCounters() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/init.py", line 215, in _command return await self._protocol.command(name, *args) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/protocol.py", line 67, in command async with asyncio_timeout(EZSP_CMD_TIMEOUT): File "/usr/local/lib/python3.11/asyncio/timeouts.py", line 111, in aexit raise TimeoutError from exc_val TimeoutError 2023-12-28 07:42:23.198 DEBUG (MainThread) [zigpy.application] Connection to the radio has been lost: TimeoutError() 2023-12-28 07:42:23.204 DEBUG (MainThread) [homeassistant.components.zha.core.gateway] Connection to the radio was lost: TimeoutError() 2023-12-28 07:42:23.204 DEBUG (MainThread) [zigpy.application] Stopping watchdog loop 2023-12-28 07:42:23.206 DEBUG (MainThread) [homeassistant.components.zha.core.gateway] Shutting down ZHA ControllerApplication
Still an issue but not as bad? Devices seem to work but some plugs take time to switch but gives an offline msg. Yet It works but slowly.
Well, if 2023.12.4 doesn't fix it then I guess my decision to move over to Zigbee2mqtt again (although the process is nerve wracking every time) was the right one.
Still present despite the update 2023.12.4, New to HA in the last six months, and questioning that decision.
For Zigbee, nothing works correctly since version 2023.12.0. I carried out tests with zigbee2mqtt and everything works perfectly, the problem is ZHA. I use a Sonoff zigbee 3.0 version P dongle. It is no. longer even possible to detect new hardware
For Zigbee, nothing works correctly since version 2023.12.0. I carried out tests with zigbee2mqtt and everything works perfectly, the problem is ZHA. I use a Sonoff zigbee 3.0 version P dongle. It is no. longer even possible to detect new hardware
Still present despite the update 2023.12.4, New to HA in the last six months, and questioning that decision.
Guys we are trying to figure out what the issue is. These comments aren’t helpful without full debug logs. We understand you are frustrated (we are too). Without the debug output we aren’t going to be able to track this down.
Hello, Here are logs as requested home-assistant_zha_2023-12-28T15-18-46.973Z.log zha-0f825f769f452ed86ff47309e6ffb2b5-_TZ3000_fllyghyj TS0201-6fcb6dc1318aff6c5045dea6ea95e10f.json.txt zha-0f825f769f452ed86ff47309e6ffb2b5-_TZ3000_gvn91tmx TS011F-2cdab9110007a495dfc8a7d686d54907.json.txt
Errors on devices are random. Looks like no more than 3 devices can be controlled.
ZHA is still reinitializing with 2023.12.4, devices still lost/getting lost.
From the logs:
2023-12-28 07:42:23.111 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: TimeoutError() 2023-12-28 07:42:23.112 WARNING (MainThread) [zigpy.application] Watchdog failure Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/protocol.py", line 68, in command return await future ^^^^^^^^^^^^ asyncio.exceptions.CancelledError
The above exception was the direct cause of the following exception:
Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 657, in _watchdog_loop await self._watchdog_feed() File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 893, in _watchdog_feed (res,) = await self._ezsp.readCounters() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/init.py", line 215, in _command return await self._protocol.command(name, *args) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/protocol.py", line 67, in command async with asyncio_timeout(EZSP_CMD_TIMEOUT): File "/usr/local/lib/python3.11/asyncio/timeouts.py", line 111, in aexit raise TimeoutError from exc_val TimeoutError 2023-12-28 07:42:23.198 DEBUG (MainThread) [zigpy.application] Connection to the radio has been lost: TimeoutError() 2023-12-28 07:42:23.204 DEBUG (MainThread) [homeassistant.components.zha.core.gateway] Connection to the radio was lost: TimeoutError() 2023-12-28 07:42:23.204 DEBUG (MainThread) [zigpy.application] Stopping watchdog loop 2023-12-28 07:42:23.206 DEBUG (MainThread) [homeassistant.components.zha.core.gateway] Shutting down ZHA ControllerApplication
Any chance you’d be willing to disable all other integrations (core and custom) and restart? We’re trying to determine if we’re having delay issues in the event loop…
Hello, Following a restart of HA, all my zigbee devices became unavailable. Can you deploy a previous operational version of ZHA so that we can have functional home automation again? Home Assistant became unusable due to ZHA.
ZHA is still reinitializing with 2023.12.4, devices still lost/getting lost. From the logs: 2023-12-28 07:42:23.111 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: TimeoutError() 2023-12-28 07:42:23.112 WARNING (MainThread) [zigpy.application] Watchdog failure Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/protocol.py", line 68, in command return await future ^^^^^^^^^^^^ asyncio.exceptions.CancelledError The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 657, in _watchdog_loop await self._watchdog_feed() File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 893, in _watchdog_feed (res,) = await self._ezsp.readCounters() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/init.py", line 215, in _command return await self._protocol.command(name, *args) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/protocol.py", line 67, in command async with asyncio_timeout(EZSP_CMD_TIMEOUT): File "/usr/local/lib/python3.11/asyncio/timeouts.py", line 111, in aexit raise TimeoutError from exc_val TimeoutError 2023-12-28 07:42:23.198 DEBUG (MainThread) [zigpy.application] Connection to the radio has been lost: TimeoutError() 2023-12-28 07:42:23.204 DEBUG (MainThread) [homeassistant.components.zha.core.gateway] Connection to the radio was lost: TimeoutError() 2023-12-28 07:42:23.204 DEBUG (MainThread) [zigpy.application] Stopping watchdog loop 2023-12-28 07:42:23.206 DEBUG (MainThread) [homeassistant.components.zha.core.gateway] Shutting down ZHA ControllerApplication
Any chance you’d be willing to disable all other integrations (core and custom) and restart? We’re trying to determine if we’re having delay issues in the event loop…
In the meantime ALL my ZHA-devices went unavailable (=> it will be a lot of work to reintegrate many of the devices).
ZHA keeps reinitializing. What ever changed with the latest release, made it even worse. I would suggest to roll back to the latest working versions regarding ZHA, and then have a thorough look at the diffs from 2023.11 to 2023.12. Anything else is like seeking the needle in the haystack. Just my two cents. In my experience and humble opinion watchdog and timeout events are serious and nothing for quick workarounds, tests and any kind of patchwork. But of course that is up to you as codeowners.
EDIT: Downgraded to 2023.11 as ZHA & HA became unusable. Many ZHA-devices came back live quick, some seem to be lost in the ZigBee-H(e)aven ;)
EDIT2: GOOD NEWS: Many of my ZHA-devices came back (~70%) - IKEA end-devices (with not-so-good/poor RSSI/LQI?) are not willing yet to do that as well. But I had the same problems with RaspBee/ConBee the years before. Might be an isolated IKEA-Firmware-case.
+1, HA 1-2 times per day loses the ability to communicate with zigbee devices. Needs restart, then everything works again. Have a lot of devices from different vendors, pre ~2023.12 I had a very stable experience with HA for several months (except for the suggested intro raspberry Pi setup trashing the SD card after a few months, killing the system completely)
Had hoped the .4 release would fix it, but no. If I knew how to revert to pre .12 version I would.
- Make a backup
- Install Terminal & SSH add-on
- In the terminal type: ha core update --version 2023.11.3
Disclaimer: I just switched to HA from Node-RED (> 7 years) as my smart home system and I'm declaring me myself as HA-noob.
So there might be much more to think about when updating-downgrading.
Hello, I did a test by installing from scratch a new HA 2023.12.4 platform with ZHA and everything works correctly (adding new devices, checking all added devices). On my normal HA, by resetting ZHA with a new network nothing works. There may be another element or configuration that disrupts the proper functioning of ZHA.
By downgrading to version 2023.11.3, the problems persist, nothing works.
How can ZHA be completely and cleanly reset? By removing the integration, traces of ZHA remain.