addons
addons copied to clipboard
Silicon Labs Multiprotocol addon failing intermittently causing HAP, Thread and Zigbee issues
Describe the issue you are experiencing
Every few days to a week, the Silicon Labs Multiprotocol addon will stop communicating and indicate 'resource temporarily unavailable'. yet does not make mention of which resource this is. When this addon stops, it breaks my light integration (Nanoleaf via Thread using HAP) and Zigbee sensors via ZHA.
Restarting the addon doesn't fix the issue, nor does restarting home assistant. Usually requires a complete reboot of the host, and even then sometimes it will repeatedly indicate 'resource temporarily unavailable'.
What type of installation are you running?
Home Assistant OS
Which operating system are you running on?
Home Assistant Operating System
Which add-on are you reporting an issue with?
Silicon Labs Multiprotocol
What is the version of the add-on?
2.3.2
Steps to reproduce the issue
Wish I knew, as it fails whenever it wants to (sometimes 3am in the morning, sometimes 5pm in the afternoon).
System Health information
System Information
version | core-2023.8.4 |
---|---|
installation_type | Home Assistant OS |
dev | false |
hassio | true |
docker | true |
user | root |
virtualenv | false |
python_version | 3.11.4 |
os_name | Linux |
os_version | 6.1.37 |
arch | x86_64 |
timezone | Australia/Adelaide |
config_dir | /config |
Home Assistant Community Store
GitHub API | ok |
---|---|
GitHub Content | ok |
GitHub Web | ok |
GitHub API Calls Remaining | 5000 |
Installed Version | 1.32.1 |
Stage | running |
Available Repositories | 1335 |
Downloaded Repositories | 37 |
AccuWeather
can_reach_server | ok |
---|---|
remaining_requests | 14 |
Home Assistant Cloud
logged_in | false |
---|---|
can_reach_cert_server | ok |
can_reach_cloud_auth | ok |
can_reach_cloud | failed to load: unreachable |
Home Assistant Supervisor
host_os | Home Assistant OS 11.0.dev20230705 |
---|---|
update_channel | beta |
supervisor_version | supervisor-2023.08.3 |
agent_version | 1.5.1 |
docker_version | 23.0.6 |
disk_total | 30.8 GB |
disk_used | 23.3 GB |
healthy | true |
supported | true |
board | ova |
supervisor_api | ok |
version_api | ok |
installed_addons | Samba share (10.0.2), Network UPS Tools (0.12.0), Matter Server (4.9.0), Silicon Labs Multiprotocol (2.3.2), Mosquitto broker (6.2.1), Zigbee2MQTT (1.32.2-1), Custom deps deployment (1.3.3), Home Assistant Google Drive Backup (0.111.1), ESPHome (2023.8.2), Terminal & SSH (9.7.1), Studio Code Server (5.10.1), PS5 MQTT (1.3.1), Whisper (1.0.0), Piper (1.3.2), Advanced SSH & Web Terminal (15.0.7), Silicon Labs Flasher (0.2.0) |
Dashboards
dashboards | 3 |
---|---|
resources | 24 |
views | 11 |
mode | storage |
Recorder
oldest_recorder_run | 16 August 2023 at 08:32 |
---|---|
current_recorder_run | 25 August 2023 at 20:34 |
estimated_db_size | 933.92 MiB |
database_engine | sqlite |
database_version | 3.41.2 |
Anything in the Supervisor logs that might be useful for us?
Nothing relating to these issues.
Anything in the add-on logs that might be useful for us?
Logger: bellows.zigbee.application
Source: /usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py:643
First occurred: 10:49:25 (4040 occurrences)
Last logged: 17:00:02
ControllerApplication reset unsuccessful: ConnectionRefusedError(111, "Connect call failed ('172.30.32.1', 9999)")
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 640, in _reset_controller_loop
await self._reset_controller()
File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 661, in _reset_controller
await self.connect()
File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 133, in connect
self._ezsp = await bellows.ezsp.EZSP.initialize(self.config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/__init__.py", line 164, in initialize
await ezsp.connect(use_thread=zigpy_config[conf.CONF_USE_THREAD])
File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/__init__.py", line 181, in connect
self._gw = await bellows.uart.connect(self._config, self, use_thread=use_thread)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/bellows/uart.py", line 414, in connect
protocol, _ = await _connect(config, application)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/bellows/uart.py", line 385, in _connect
transport, protocol = await zigpy.serial.create_serial_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/zigpy/serial.py", line 31, in create_serial_connection
transport, protocol = await loop.create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1085, in create_connection
raise exceptions[0]
File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1069, in create_connection
sock = await self._connect_sock(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/asyncio/base_events.py", line 973, in _connect_sock
await self.sock_connect(sock, address)
File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 628, in sock_connect
return await fut
^^^^^^^^^
File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 668, in _sock_connect_cb
raise OSError(err, f'Connect call failed {address}')
ConnectionRefusedError: [Errno 111] Connect call failed ('172.30.32.1', 9999)
Additional information
HAOS is hosted in a VM on my Unraid server. My Unraid server is still able to see and interact with the USB device(s) when HAOS fails to. For context:
I have the SkyConnect USB as well as a Sonoff Zigbee 3.0 USB Dongle Plus V2” (model “ZBDongle-E”). I mainly use the Skyconnect for everything, and the Sonoff is a recent purchase. Both are flashed with the latest version of the MultiPAN firmware.
I have tried both the stable and beta version of HAOS, no change to the outcome. Sometimes it works for several days, sometimes it fails > 5 times a day.
Update:
Only way to restore Home Assistant to a functional state was to unplug the SkyConnect from its USB port, replug, then reboot Unraid. Simply turning the VM for HAOS off and on nor just rebooting the host without the replug was not enough.
FWIW, I have a PCIe USB Hub card in the server, however the SkyConnect USB is in the motherboards USB2.0 port, rather than the hub, to try and limit the hub being a cause.
@agners I would like to report some similar experience that may not be entirely related to the same error you are receiving. However, I am having similar symptoms. Recently I added my HA SkyConnect to the Apple Border Router thread network via using the TLC Dataset provided from Nanoleaf. I have my devices all on one mesh network and things work great when they are up. For some reason my SkyConnect will just shut off. I believe it is being overloaded somehow. The only solution to the problem is to unplug and replug the device. Below is what Silicon Labs Multiprotocol addon logs look like.
core_silabs_multiprotocol_2023-09-01T00-59-20.370Z.log
BEFORE CRASH OF SKYCONNECT
AFTER CRASH OF SKYCONNECT
@spartandrew18 it seems that the otbr-agent is continuing to restart in a loop?
Do you maybe have the logs from when things first failed? You can get more logs through the CLI:
ha host logs --boot 0 --lines 100000 -t addon_core_silabs_multiprotocol
You can also redirect into a location where you have access to so you can upload the logs:
ha host logs --boot 0 --lines 100000 -t addon_core_silabs_multiprotocol > /config/multiprotocol.log
New Update:
Silicon Labs Multiprotocol addon just crashed:
WARNING in function 'property_get_single_endpoint_state_and_reply_to_pending_open_callback' in file /usr/src/cpc-daemon/server_core/system_endpoint/system_callbacks.c at line #176 : Property-get::PROP_ENDPOINT_STATE aborted
WARNING in function 'property_get_single_endpoint_state_and_reply_to_pending_open_callback' in file /usr/src/cpc-daemon/server_core/system_endpoint/system_callbacks.c at line #210 : Could not read endpoint state on the secondary
WARNING in function 'system_send_open_endpoint_ack' in file /usr/src/cpc-daemon/server_core/system_endpoint/system_callbacks.c at line #122 : Failed to acknowledge the open request for endpoint #12. Bad file descriptor
Repeats itself every 100ms, which caused my HA VM to hang in responsiveness. Had to force the VM to reboot from Unraid. The addon attempts to startup after the VM/HAOS reboot, but same thing occurs:
[23:30:56:630555] Info : New client connection using library v4.3.1.0
otbr-agent[212293]: 53d.04:20:11.933 [C] Platform------: CPC endpoint open failed
otbr-agent[212293]: 53d.04:20:11.933 [C] Platform------: mCpcBusSpeed = 115200
otbr-agent[212293]: 53d.04:20:11.933 [C] Platform------: Init() at radio.cpp:121: Failure
close cpc endpoint: Resource temporarily unavailable
[23:30:57] INFO: otbr-agent ended with exit code 1 (signal 0)...
OTBR_FORWARD_INGRESS all opt in * out wpan0 ::/0 -> ::/0
Chain OTBR_FORWARD_INGRESS (0 references)
target prot opt source destination
DROP all anywhere anywhere PKTTYPE = unicast
DROP all anywhere anywhere match-set otbr-ingress-deny-src src
ACCEPT all anywhere anywhere match-set otbr-ingress-allow-dst dst
DROP all anywhere anywhere PKTTYPE = unicast
ACCEPT all anywhere anywhere
otbr-ingress-deny-src
otbr-ingress-deny-src-swap
otbr-ingress-allow-dst
otbr-ingress-allow-dst-swap
OTBR_FORWARD_EGRESS all opt in wpan0 out * ::/0 -> ::/0
Chain OTBR_FORWARD_EGRESS (0 references)
target prot opt source destination
ACCEPT all anywhere anywhere
[23:30:57] INFO: OTBR firewall teardown completed.
[23:30:57:782940] Info : Client disconnected
Thread and Zigbee (as expected) have failed:-
ZHA:
Logger: zigpy.application
Source: /usr/local/lib/python3.11/site-packages/zigpy/application.py:198
First occurred: 23:25:23 (8 occurrences)
Last logged: 23:30:49
Couldn't start application
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 195, in startup
await self.connect()
File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 138, in connect
await ezsp.connect(use_thread=self.config[CONF_USE_THREAD])
File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/__init__.py", line 179, in connect
self._gw = await bellows.uart.connect(self._config, self, use_thread=use_thread)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/bellows/uart.py", line 414, in connect
protocol, _ = await _connect(config, application)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/bellows/uart.py", line 385, in _connect
transport, protocol = await zigpy.serial.create_serial_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/zigpy/serial.py", line 36, in create_serial_connection
transport, protocol = await loop.create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1085, in create_connection
raise exceptions[0]
File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1069, in create_connection
sock = await self._connect_sock(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/asyncio/base_events.py", line 973, in _connect_sock
await self.sock_connect(sock, address)
File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 628, in sock_connect
return await fut
^^^^^^^^^
File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 668, in _sock_connect_cb
raise OSError(err, f'Connect call failed {address}')
ConnectionRefusedError: [Errno 111] Connect call failed ('172.30.32.1', 9999)
HomeKit Bridge (for the Thread lights via HAP) do not show any useful logs, however all 30 integrations are in constant reload mode.
Rebooting the VM did not resolve the issue. Unraid can still see the NabuCasa Skyconnect plugged in and responding:
lsusb -d 10c4:ea60 -v
Bus 005 Device 002: ID 10c4:ea60 Silicon Labs CP210x UART Bridge
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 2.00
bDeviceClass 0
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 64
idVendor 0x10c4 Silicon Labs
idProduct 0xea60 CP210x UART Bridge
bcdDevice 1.00
iManufacturer 1 Nabu Casa
iProduct 2 SkyConnect v1.0
iSerial 3 <redacted>
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 0x0020
bNumInterfaces 1
bConfigurationValue 1
iConfiguration 0
bmAttributes 0x80
(Bus Powered)
MaxPower 100mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 0
bInterfaceProtocol 0
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x02 EP 2 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x82 EP 2 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 0
Device Status: 0x0000
(Bus Powered)
The fix for this at the moment (this time around anyway) was a full server reboot, but it took two attempts and several manual reboots of the addon to make it work and talk with the USB, which is less than ideal.
For what it's worth, the SkyConnect USB is plugged directly into the motherboards USB port (ASRock Rack X570-2L2T) using the included extension cable that come with the unit, to avoid interference.
Any help with this would be appreciated. Thanks.
@spartandrew18 it seems that the otbr-agent is continuing to restart in a loop?
Do you maybe have the logs from when things first failed? You can get more logs through the CLI:
ha host logs --boot 0 --lines 100000 -t addon_core_silabs_multiprotocol
You can also redirect into a location where you have access to so you can upload the logs:
ha host logs --boot 0 --lines 100000 -t addon_core_silabs_multiprotocol > /config/multiprotocol.log
Thanks for the logs. Some interesting snippets:
Aug 30 04:22:33 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:39.085 [W] Platform------: radio tx timeout
Aug 30 04:22:33 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:39.085 [W] Platform------: RCP failure detected
Aug 30 04:22:33 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:39.085 [W] Platform------: Trying to recover (1/100)
Aug 30 04:22:37 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:43.186 [W] Platform------: Wait for response timeout
Aug 30 04:22:37 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:43.186 [W] Platform------: RCP failure detected
Aug 30 04:22:37 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:43.186 [W] Platform------: Trying to recover (2/100)
Aug 30 04:22:41 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:47.282 [W] Platform------: Wait for response timeout
Aug 30 04:22:41 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:47.282 [W] Platform------: RCP failure detected
Aug 30 04:22:41 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:47.282 [W] Platform------: Trying to recover (3/100)
Aug 30 04:22:45 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:51.378 [W] Platform------: Wait for response timeout
Aug 30 04:22:45 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:51.378 [W] Platform------: RCP failure detected
Aug 30 04:22:45 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:51.378 [W] Platform------: Trying to recover (4/100)
Aug 30 04:22:49 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:55.474 [W] Platform------: Wait for response timeout
Aug 30 04:22:49 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:55.474 [W] Platform------: RCP failure detected
Aug 30 04:22:49 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:55.474 [W] Platform------: Trying to recover (5/100)
Aug 30 04:22:53 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:59.570 [W] Platform------: Wait for response timeout
Aug 30 04:22:53 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:59.570 [W] Platform------: RCP failure detected
Aug 30 04:22:53 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:59.570 [W] Platform------: Trying to recover (6/100)
Aug 30 04:22:55 homeassistant addon_core_silabs_multiprotocol[474]: WARNING in function 're_transmit_timeout' in file /usr/src/cpc-daemon/server_core/core/core.c at line #1801 : Retransmit limit reached on endpoint #12
Aug 30 04:22:55 homeassistant addon_core_silabs_multiprotocol[474]: WARNING in function 'core_set_endpoint_in_error' in file /usr/src/cpc-daemon/server_core/core/core.c at line #1125 : Setting ep#12 in error, new state: SL_CPC_STATE_ERROR_DESTINATION_UNREACHABLE
Aug 30 04:22:56 homeassistant addon_core_silabs_multiprotocol[474]: [00:22:55:394752] WARNING : In function 're_transmit_timeout' in file /usr/src/cpc-daemon/server_core/core/core.c at line #1801 : Retransmit limit reached on endpoint #12
Aug 30 04:22:56 homeassistant addon_core_silabs_multiprotocol[474]: [00:22:55:394768] WARNING : In function 'core_set_endpoint_in_error' in file /usr/src/cpc-daemon/server_core/core/core.c at line #1125 : Setting ep#12 in error, new state: SL_CPC_STATE_ERROR_DESTINATION_UNREACHABLE
Aug 30 04:22:56 homeassistant addon_core_silabs_multiprotocol[474]: [00:22:55:495051] Info : Client disconnected
It seems the OTBR detects that the radio isn't communicating anymore (it can't receive frames any longer). The CPC daemon similarly detects a "retransmit timeout"
WARNING in function 're_transmit_timeout' in file /usr/src/cpc-daemon/server_core/core/core.c at line #1801 : Retransmit limit reached on endpoint #12
The OTBR then ends up in a restart loop, presumably for the same reason (the radio stopped responding):
Aug 30 04:32:58 homeassistant addon_core_silabs_multiprotocol[474]: [00:32:58] INFO: Starting otbr-agent...
...
Aug 30 04:32:59 homeassistant addon_core_silabs_multiprotocol[474]: [00:32:58:732529] Info : New client connection using library v4.3.1.0
Aug 30 04:32:59 homeassistant addon_core_silabs_multiprotocol[474]: WARNING in function 'property_get_single_endpoint_state_and_reply_to_pending_open_callback' in file /usr/src/cpc-daemon/server_core/system_endpoint/system_callbacks.c at line #176 : Property-get::PROP_ENDPOINT_STATE aborted
Aug 30 04:32:59 homeassistant addon_core_silabs_multiprotocol[474]: WARNING in function 'property_get_single_endpoint_state_and_reply_to_pending_open_callback' in file /usr/src/cpc-daemon/server_core/system_endpoint/system_callbacks.c at line #210 : Could not read endpoint state on the secondary
...
Aug 30 04:32:59 homeassistant addon_core_silabs_multiprotocol[474]: close cpc endpoint: Success
Aug 30 04:32:59 homeassistant addon_core_silabs_multiprotocol[474]: [00:32:59] INFO: otbr-agent ended with exit code 1 (signal 0)...
The second error in the log looks very similar.
Thanks for the logs. Some interesting snippets:
Aug 30 04:22:33 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:39.085 [W] Platform------: radio tx timeout Aug 30 04:22:33 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:39.085 [W] Platform------: RCP failure detected Aug 30 04:22:33 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:39.085 [W] Platform------: Trying to recover (1/100) Aug 30 04:22:37 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:43.186 [W] Platform------: Wait for response timeout Aug 30 04:22:37 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:43.186 [W] Platform------: RCP failure detected Aug 30 04:22:37 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:43.186 [W] Platform------: Trying to recover (2/100) Aug 30 04:22:41 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:47.282 [W] Platform------: Wait for response timeout Aug 30 04:22:41 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:47.282 [W] Platform------: RCP failure detected Aug 30 04:22:41 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:47.282 [W] Platform------: Trying to recover (3/100) Aug 30 04:22:45 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:51.378 [W] Platform------: Wait for response timeout Aug 30 04:22:45 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:51.378 [W] Platform------: RCP failure detected Aug 30 04:22:45 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:51.378 [W] Platform------: Trying to recover (4/100) Aug 30 04:22:49 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:55.474 [W] Platform------: Wait for response timeout Aug 30 04:22:49 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:55.474 [W] Platform------: RCP failure detected Aug 30 04:22:49 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:55.474 [W] Platform------: Trying to recover (5/100) Aug 30 04:22:53 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:59.570 [W] Platform------: Wait for response timeout Aug 30 04:22:53 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:59.570 [W] Platform------: RCP failure detected Aug 30 04:22:53 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:59.570 [W] Platform------: Trying to recover (6/100) Aug 30 04:22:55 homeassistant addon_core_silabs_multiprotocol[474]: WARNING in function 're_transmit_timeout' in file /usr/src/cpc-daemon/server_core/core/core.c at line #1801 : Retransmit limit reached on endpoint #12 Aug 30 04:22:55 homeassistant addon_core_silabs_multiprotocol[474]: WARNING in function 'core_set_endpoint_in_error' in file /usr/src/cpc-daemon/server_core/core/core.c at line #1125 : Setting ep#12 in error, new state: SL_CPC_STATE_ERROR_DESTINATION_UNREACHABLE Aug 30 04:22:56 homeassistant addon_core_silabs_multiprotocol[474]: [00:22:55:394752] WARNING : In function 're_transmit_timeout' in file /usr/src/cpc-daemon/server_core/core/core.c at line #1801 : Retransmit limit reached on endpoint #12 Aug 30 04:22:56 homeassistant addon_core_silabs_multiprotocol[474]: [00:22:55:394768] WARNING : In function 'core_set_endpoint_in_error' in file /usr/src/cpc-daemon/server_core/core/core.c at line #1125 : Setting ep#12 in error, new state: SL_CPC_STATE_ERROR_DESTINATION_UNREACHABLE Aug 30 04:22:56 homeassistant addon_core_silabs_multiprotocol[474]: [00:22:55:495051] Info : Client disconnected
It seems the OTBR detects that the radio isn't communicating anymore (it can't receive frames any longer). The CPC daemon similarly detects a "retransmit timeout"
WARNING in function 're_transmit_timeout' in file /usr/src/cpc-daemon/server_core/core/core.c at line #1801 : Retransmit limit reached on endpoint #12
The OTBR then ends up in a restart loop, presumably for the same reason (the radio stopped responding):
Aug 30 04:32:58 homeassistant addon_core_silabs_multiprotocol[474]: [00:32:58] INFO: Starting otbr-agent... ... Aug 30 04:32:59 homeassistant addon_core_silabs_multiprotocol[474]: [00:32:58:732529] Info : New client connection using library v4.3.1.0 Aug 30 04:32:59 homeassistant addon_core_silabs_multiprotocol[474]: WARNING in function 'property_get_single_endpoint_state_and_reply_to_pending_open_callback' in file /usr/src/cpc-daemon/server_core/system_endpoint/system_callbacks.c at line #176 : Property-get::PROP_ENDPOINT_STATE aborted Aug 30 04:32:59 homeassistant addon_core_silabs_multiprotocol[474]: WARNING in function 'property_get_single_endpoint_state_and_reply_to_pending_open_callback' in file /usr/src/cpc-daemon/server_core/system_endpoint/system_callbacks.c at line #210 : Could not read endpoint state on the secondary ... Aug 30 04:32:59 homeassistant addon_core_silabs_multiprotocol[474]: close cpc endpoint: Success Aug 30 04:32:59 homeassistant addon_core_silabs_multiprotocol[474]: [00:32:59] INFO: otbr-agent ended with exit code 1 (signal 0)...
The second error in the log looks very similar.
Is it something to do with my hardware setup? Or is it possible it can fixed via software?
Further update; new crash, same outcome but slight variation in the logs:
ZHA:-
Logger: bellows.zigbee.application
Source: runner.py:186
First occurred: 10:25:19 (5 occurrences)
Last logged: 10:26:19
Watchdog heartbeat timeout: TimeoutError()
Silicon Labs Multiprotocol:-
[10:54:39] INFO: Starting zigbeed...
[10:54:39] INFO: Setup OTBR firewall...
[10:54:39] INFO: Starting otbr-agent...
otbr-agent[95918]: [NOTE]-AGENT---: Running 0.3.0
otbr-agent[95918]: [NOTE]-AGENT---: Thread version: 1.3.0
otbr-agent[95918]: [NOTE]-AGENT---: Thread interface: wpan0
otbr-agent[95918]: [NOTE]-AGENT---: Radio URL: spinel+cpc://cpcd_0?iid=2&iid-list=0
otbr-agent[95918]: [NOTE]-ILS-----: Infra link selected: enp5s0
otbr-agent[95918]: [INFO]-NCP-----: OpenThread log level changed to 5
otbr-agent[95918]: 50d.15:48:26.792 [C] Platform------: CPC init failed. Ensure radio-url argument has the form 'spinel+cpc://cpcd_0?iid=<1..3>'
otbr-agent[95918]: 50d.15:48:26.792 [C] Platform------: Init() at cpc_interface.cpp:96: Failure
close cpc endpoint: Resource temporarily unavailable
[10:54:41] INFO: otbr-agent ended with exit code 1 (signal 0)...
Specifically the following line:
otbr-agent[95918]: 50d.15:48:26.792 [C] Platform------: CPC init failed. Ensure radio-url argument has the form 'spinel+cpc://cpcd_0?iid=<1..3>
I don't believe have seen this specific error, but everything else is the same.
Restarting the SLM addon shows the following logs:
s6-rc: info: service mdns: starting
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service mdns successfully started
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
cont-init: info: running /etc/cont-init.d/check-cpcd-shm.sh
[11:20:14] INFO: Starting mDNS Responder...
Default: mDNSResponder (Engineering Build) (Aug 23 2023 18:27:14) starting
cont-init: info: /etc/cont-init.d/check-cpcd-shm.sh exited 0
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service banner: starting
-----------------------------------------------------------
Add-on: Silicon Labs Multiprotocol
Zigbee and OpenThread multiprotocol add-on
-----------------------------------------------------------
Add-on version: 2.3.2
You are running the latest version of this add-on.
System: Home Assistant OS 11.0.dev20230705 (amd64 / qemux86-64)
Home Assistant Core: 2023.9.0
Home Assistant Supervisor: 2023.08.3
-----------------------------------------------------------
Please, share the above information when looking for help
or support in, e.g., GitHub, forums or the Discord chat.
-----------------------------------------------------------
s6-rc: info: service banner successfully started
s6-rc: info: service universal-silabs-flasher: starting
[11:20:15] INFO: Checking /dev/ttyUSB0 identifying SkyConnect v1.0 from Nabu Casa.
[11:20:15] INFO: Starting universal-silabs-flasher with /dev/ttyUSB0
2023-09-07 11:20:15 homeassistant universal_silabs_flasher.flash[185] INFO Extracted GBL metadata: NabuCasaMetadata(metadata_version=1, sdk_version='4.3.1', ezsp_version=None, ot_rcp_version=None, fw_type=<FirmwareImageType.RCP_UART_802154: 'rcp-uart-802154'>, baudrate=460800)
2023-09-07 11:20:15 homeassistant universal_silabs_flasher.flasher[185] INFO Probing ApplicationType.GECKO_BOOTLOADER at 115200 baud
2023-09-07 11:20:17 homeassistant universal_silabs_flasher.flasher[185] INFO Probing ApplicationType.CPC at 460800 baud
2023-09-07 11:20:21 homeassistant universal_silabs_flasher.flasher[185] INFO Probing ApplicationType.CPC at 115200 baud
2023-09-07 11:20:26 homeassistant universal_silabs_flasher.flasher[185] INFO Probing ApplicationType.CPC at 230400 baud
2023-09-07 11:20:30 homeassistant universal_silabs_flasher.flasher[185] INFO Probing ApplicationType.EZSP at 115200 baud
2023-09-07 11:20:35 homeassistant universal_silabs_flasher.flasher[185] INFO Probing ApplicationType.SPINEL at 460800 baud
Error: Failed to probe running application type
s6-rc: warning: unable to start service universal-silabs-flasher: command exited 1
/run/s6/basedir/scripts/rc.init: warning: s6-rc failed to properly bring all the services up! Check your logs (in /run/uncaught-logs/current if you have in-container logging) for more information.
/run/s6/basedir/scripts/rc.init: fatal: stopping the container.
s6-rc: info: service mdns: stopping
s6-rc: info: service banner: stopping
Default: mDNSResponder (Engineering Build) (Aug 23 2023 18:27:14) stopping
s6-rc: info: service banner successfully stopped
s6-rc: info: service legacy-cont-init: stopping
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service fix-attrs successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service s6rc-oneshot-runner successfully stopped
[11:20:40] INFO: mDNS ended with exit code 4 (signal 0)...
s6-rc: info: service mdns successfully stopped
The fix now is a physical unplug and replug of the USB into the server, then a full host restart. Simple reboot of the VM did not change the outcome.
It also looks like others are reporting similar issues (#3193) so it is not an isolated incident.
Devs, please advise what further information you require to resolve/investigate this. Physical host restarts to resolve a VM issue are far from ideal (in my case anyway) and we'd like to be able to assist to resolve this problem as soon as practical.
Have some with this problem one ESP module and can try using it as serial port over Ethernet ? It was working before but have not testing latest versions but it shall being fixed if having one separate RCP module connected to the ESP.
If you would like to try experimental firmware for the SkyConnect that should eliminate the need for physical resets (source https://github.com/NabuCasa/silabs-firmware-builder/pull/33), I've attached a pre-compiled copy here: skyconnect_rcp-uart-802154_4.3.1_wd.gbl.zip
(zipped so it can be uploaded to GitHub, unzip it first).
If you have any step-by-step information for how to reproduce this issue, it would be very useful!
(CC @kylev, from #3193)
Not much interesting about my setup (city setting, very busy 2.4Ghz band). I'm on 2023.9.2 with a mix of Wifi (Sengled, Cync, generic), zigbee (misc mfg), and 3 Nanoleaf bulbs.
Flashing with the web tool seemed to work, but the probed GBL metadata seems identical during boot. Should your build appear differently?
NabuCasaMetadata(metadata_version=1, sdk_version='4.3.1', ezsp_version=None, ot_rcp_version=None, fw_type=<FirmwareImageType.RCP_UART_802154: 'rcp-uart-802154'>, baudrate=460800)
Thanks!
The metadata will be identical, especially when probing. This also makes sure the addon doesn't re-flash the bundled firmware.
If you would like to try experimental firmware for the SkyConnect that should eliminate the need for physical resets (source NabuCasa/silabs-firmware-builder#33), I've attached a pre-compiled copy here:
skyconnect_rcp-uart-802154_4.3.1_wd.gbl.zip
(zipped so it can be uploaded to GitHub, unzip it first).If you have any step-by-step information for how to reproduce this issue, it would be very useful!
(CC @kylev, from #3193)
I've installed the experimental firmware and so far so good. Unfortunately I haven't had the failures again since my last batch of logs, but nothing has changed... so the inconsistencies aren't helping resolve the issue.
As for steps on replicating the issue, there-in lies the problem; I don't actually do anything to trigger the problems, they just happen. I've setup additional logging from the unRAID host in the event the registered USB ID disconnects or does something weird from its perspective, but HA doesn't give any more indication of what caused the failure other than what I provided on original report.
Overnight, the container went into the cpcd
crash-loop with Property-get::PROP_ENDPOINT_STATE aborted
and Property-get::PROP_ENDPOINT_STATE aborted
errors. I restarted the container, and it entered the probing instead of detection stage and failed.
The new firmware didn't seem to have an effect.
Perfect, thank you for the feedback. The watchdog isn't integrated tightly into CPC so it looks like just the CPC part is crashing, not the whole firmware. I'll post an updated one later next week.
Just to ensure clarity: I had to un-plug and re-plug the SkyConnect. The new firmware didn't seem to have an effect.
Whatever initially breaks with the dongle triggers the cpcd
crash loop but leaves the container up. Subsequently restarting the whole container results in the firmware flasher attempting to probe the device at multiple baudrates, failing, and crashing the container (and triggering watchdog restart).
After re-plugging the dongle, the container comes up on the next watchdog restart.
I suspect if you tweak the cpcd finish
script to crash the container we'd see 1 cpcd crash followed by many probe failures.
Seems reproducible: same symptoms this morning (crash loop cpcd, restarting container results in firmware flasher probe failures, replug SkyConnect fixes it).
I've just re-flashed the SkyConnect with your build again, just to ensure I didn't make an error last time. Will comment if anything changes.
Any progress made on the fix?
To everyone having this problem, are you all using Nanoleaf devices? Because in my case, it's been a while since I had to restart the addon + unplug-replug my skyconnect. I don't know if it's pure luck because it's random, if it's the latest Nanoleaf firmware update (3.5.41) or the various HA updates I installed but it's stable a the moment.
Nope. Not a single nanoleaf. Only hue, aqara, ikea and some thread devices.
To everyone having this problem, are you all using Nanoleaf devices? Because in my case, it's been a while since I had to restart the addon + unplug-replug my skyconnect. I don't know if it's pure luck because it's random, if it's the latest Natoleaf firmware update (3.5.41) or the various HA updates I installed but it's stable a the moment.
Unfortunately that firmware only applies to the Matter over Thread Nanoleaf bulbs. The HomeKit over Thread (non-Matter) bulbs latest firmware is 1.6.49.
I've started to add Matter over Thread Nanoleaf bulbs to my mix of things, and out of the box they are 3.2.0, which the Android Nanoleaf indicates has a 'critical firmware upgrade', pointing to 3.5.37 as the FW in question. They all default to 3.5.41 once upgraded though.
it's been a while since I had to restart the addon + unplug-replug my skyconnect.
I spoke too soon, I had to restart/unplug-replug today.
it's been a while since I had to restart the addon + unplug-replug my skyconnect.
I spoke too soon, I had to restart/unplug-replug today.
Mine also failed yesterday as well. I've now split my Skyconnect to do just OTBR work (Sonoff ZBDongle-E to do the Zigbee work), as well as put both USB devices on extension cables to minimise interference.
Skyconnect crashed again about 6 hours after the split duties, and needed a replug to a different USB port to come back to life.
Can you describe the Zigbee and Thread devices you have on your networks? Any Zigbee Green Power?
I've been able to replicate a crash and will try to get a firmware out that possibly mitigates it but there may be multiple concurrent bugs here causing issues.
I have the same crashes and have:
Zigbee (values from ZHA): 1x FYRTUR block-out roller blind (IKEA of Sweden) 1x LCT012 (Hue) 3x LCT015 1x lumi.airmonitor.acn01 (aqara) 2x lumi.curtain.acn002 (aqara) 5x lumi.magnet.acn001 (aqara) 1x lumi.remote.b1acn01 (aqara) 1x lumi.sen_ill.mgl01 (xiaomi) 1x lumi.sensor_cube (aqara) 3x lumi.sensor_motion.aq2 (aqara) 2x lumi.vibration.aq1 (aqara) 6x lumi.weather (aqara) 1x Remote Control N2 (IKEA of Sweden) 1x RWL021 (hue) 4x TRADFRI bulb GU10 CWS 345lm (IKEA of Sweden) 2x TRADFRI on/off switch (IKEA of Sweden) 1x TRADFRI open/close remote (IKEA of Sweden) 1x TRADFRI remote control (IKEA of Sweden) 4x TRADFRI Signal Repeater (IKEA of Sweden) 12x TRADFRIbulbE14WWclear250lm (since 10 months not at power, but paired) (IKEA of Sweden) 1x TRADFRIbulbE27WSglobeopal1055lm (IKEA of Sweden) 1x TRADFRIbulbE27WWclear250lm (IKEA of Sweden)
Thread (via Homekit Controller): 8x Eve Thermo 20EBP1701 (eve systems) -> Takes sometimes hours to reconnect after restart of sky connect
Wifi (via Homekit Controller): 1x Netatmo Main Module 1x Indoor Module 1x Outdoor Module
Think I'll need to close this issue out as my network topology has changed substantially since first logging. I've now split OTBR and Zigbee duties into two devices (OTBR being Skyconnect, Zigbee being a Sonoff ZBDongle-E until my second Skyconnect turns up). Skyconnect is now OpenThreadRCP only, Sonoff is now Zigbee only.
To answer your question though;
Zigbee: 18x Aqara motion sensors 1x Door lock 2x Zigbee magnetic door sensors 5x Zigbee USB Repeaters
Thread: 5x Nanoleaf Matter over Thread bulbs 26x Nanoleaf Homekit over Thread bulbs
I'm in the midst of replacing the HK bulbs with MoT, for various reasons.
I've removed the multiprotocol addon and reconfigured my Zigbee channel to be elsewhere (multiprotocol required the channels to be the same), and reinstalled the standalone OTBR addon in HACS.
@DunklerPhoenix Id the Eve Thermo 20EBP1701
one sleeper / reduced thread device that can only being one child or one full future thread device that can being router / leader ?
I'm not sure if I understand you correctly.
The Eve Thermo is just a child and cant be used as a router or leader. It's battery powered and so I think a reduced thread device. But I'm not sure. sry
https://www.evehome.com/de/eve-thermo
@DunklerPhoenix OK do you have FTD device that can "holding" the network alive (leader) then the RCP is not online ? If not the RCP / addon must being the leader but its taking time for the network being OK after it was offline and getting the child getting back in it.
Ahhh. No they cant hold the network alive. And I dont have any other leader than sky Connect