Thread + OTBR Integrations start normal (but then start to fail after a while)
The problem
I have tried stable, beta, and now dev builds. I also tried switching to the Matter beta to see if that changes anything.
I'm running a Thread border router on my Raspberry Pi 5 and accessing it from both an iPhone and an Android device, but my SkyConnect ZBT-1 border router seems to get kicked out of its own Thread network,
- then stops showing up in the listed "TREL" services on my home network,
- before eventually being removed from "_meshcop."
- The add-on keeps running for a while after that (albeit logging errors), but eventually it crashes too, and the integration gives the 'won't start' error (presumably because of this bug).
I've also gotten all kinds of errors in the debug logs that I've unsuccessfully tried to troubleshoot. Tried all kinds of variations on settings, updates, add-on combos... But Home
What version of Home Assistant Core has the issue?
multiple builds (From diff update channels, i.e. Both Stable + Beta)
What was the last working version of Home Assistant Core?
(Not sure, Can't remember)
What type of installation are you running?
Home Assistant OS
Integration causing the issue
Open Thread Border Router (+Main Thread Integration?)
Link to integration documentation on our website
https://next.home-assistant.io/integrations/thread
Diagnostics information
home-assistant_otbr_2024-12-23T20-40-06.471Z.log.txt
logs.txt
logs-1.txt
logs-2.txt
logs-3.txt
error_log-1.bin.txt
error_log-2.bin.txt
config_entry-thread-3aea5d3c963bbf8127d47a8eff42a298.json
Example YAML snippet
[Default]
Anything in the logs that might be useful for us?
Default: mDNS_Execute: SendResponses didn't send all its responses; will try again in one second
Default: mDNS_Execute: SendResponses didn't send all its responses; will try again in one second
Default: mDNS_Execute: SendResponses didn't send all its responses; will try again in one second
00:00:16.158 [W] DuaManager----: Failed to perform next registration: NotFound
Default: mDNSPlatformSendUDP got error 99 (Cannot assign requested address)
...
00:00:00.096 [W] P-Netif-------: Failed to process request#2: No such process
Default: mDNSPlatformSendUDP got error 101 (Network is unreachable) sending packet to 224.0.0.251 on interface
Default: mDNSPlatformSendUDP got error 99 (Cannot assign requested address) sending packet to ff02::fb on interface fe80::2033:9fff:fe16:8c08/veth07441ec/4127
Failed to register service Home Assistant OpenThread Border Router #B519._meshcop._udp: Service Not Running
[WARN]-BA------: Result of publish meshcop service Home Assistant OpenThread Border Router #B519._meshcop._udp.local: Invalid state
[ERR ]-MDNS----: Failed to register service 22043b337124b519._trel._udp: Service Not Running
[ERR ]-TrelDns-: Failed to publish TREL service: Invalid state. TREL won't be working.
[12:02:50] INFO: mDNS ended with exit code 256 (signal 6)...
[INFO]-WEB-----: Running 0.3.0-b041fa52-dirty
listenAddr not specified, using default ::
[INFO]-WEB-----: Border router web started on wpan0
[00:02:03] INFO: Enabling NAT64.
00:00:00.347 [W] P-Netif-------: Failed to process request#12: No such process
Done
Done
Done
00:00:00.799 [W] DuaManager----: Failed to perform next registration: NotFound
00:00:00.837 [W] Mle-----------: Failed to process Link Accept: Security
s6-rc: info: service legacy-services successfully started
00:00:00.843 [W] Mle-----------: Failed to process Link Accept: Security
00:00:01.099 [W] Mle-----------: Failed to process Link Accept: Security
00:00:01.108 [W] Mle-----------: Failed to process Link Accept: Security
00:01:24.581 [W] DuaManager----: Failed to perform next registration: NotFound
Default: requestIOEvents called with fd 1024 > FD_SETSIZE 1024.
mdnsd: mDNSPosix.c:2527: requestIOEvents: Assertion `0' failed.
[WARN]-MDNS----: DNSServiceProcessResult failed: Service Not Running (serviceRef = 0x55bc12a3b0)
[WARN]-MDNS----: Need to reconnect to mdnsd
[ERR ]-MDNS----: Failed to register service Home Assistant OpenThread Border Router #B519._meshcop._udp: Service Not Running
[WARN]-BA------: Result of publish meshcop service Home Assistant OpenThread Border Router #B519._meshcop._udp.local: Invalid state
[ERR ]-MDNS----: Failed to register service 22043b337124b519._trel._udp: Service Not Running
[ERR ]-TrelDns-: Failed to publish TREL service: Invalid state. TREL won't be working.
[12:38:44] INFO: mDNS ended with exit code 256 (signal 6)...
[12:38:45] INFO: Starting mDNS Responder...
Default: mDNSResponder (Engineering Build) (Dec 3 2024 17:53:13) starting
Default: mDNSPlatformSendUDP got error 99 (Cannot assign requested address) sending packet to ff02::fb on interface fe80::c864:83ff:fe7e:d13b/veth7f93f77/8713
Default: mDNSPlatformSendUDP got error 99
1d.03:22:12.946 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:94, chksum:7ae0, ecn:no, to:8ab35ea07d62db38, sec:no, error:Abort, prio:net, radio:trel
1d.03:22:12.946 [N] MeshForwarder-: src:[fe80:0:0:0:5861:f22f:ccdd:dc30]:19788
1d.03:22:12.946 [N] MeshForwarder-: dst:[fe80:0:0:0:88b3:5ea0:7d62:db38]:19788
1d.03:45:26.660 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:64, chksum:c6e2, ecn:no, to:0xb000, sec:yes, error:Abort, prio:low, radio:trel
1d.03:45:26.660 [N] MeshForwarder-: src:[fd6e:d157:2b4:cdbf:0:ff:fe00:5000]:61631
dst:[fe80:0:0:0:88b3:5ea0:7d62:db38]:19788
1d.01:39:56.832 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:64, chksum:c75f, ecn:no, to:0xb000, sec:yes, error:Abort, prio:low, radio:trel
Failed to get forwarded frame priority, error:NotFound, len:35, src:0x8800, dst:0x9c00, sec:yes
1d.02:05:09.977 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:94, chksum:164c, ecn:no, to:8ab35ea07d62db38, sec:no, error:Abort, prio:net, radio:trel
1d.02:16:55.801 [N] MeshForwarder-: Failed to get forwarded frame priority, error:NotFound, len:35, src:0x8800, dst:0x9c00, sec:yes
1d.02:16:55.924 [N] MeshForwarder-: Failed to get forwarded frame priority, error:NotFound, len:35, src:0x8800, dst:0x9c00, sec:yes
1d.02:35:17.641 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:94, chksum:c629, ecn:no, to:8ab35ea07d62db38, sec:no, error:Abort, prio:net
07:37:43.473 [N] MeshForwarder-: Dropping rx frag frame, error:Duplicated, len:50, src:0x2800, dst:0xffff, sec:yes, tag:35037, offset:0, dglen:86
Default: mDNSPlatformSendUDP got error 88 (Socket operation on non-socket) sending packet to ff02::fb on interface <<ERROR: %#a used with unsupported type: 161>>/<<NULL>>/-2143420412
Default: mDNSPlatformSendUDP got error 88 (Socket operation on non-socket) sending packet to ff02::fb on interface fe80::2c59:29ff:fef7:b8f7/<<NULL>>/0
Default: mDNSPlatformSendUDP got error 88 (Socket operation on non-socket) sending packet to ff02::fb on interface fe80::2c59:29ff:fef7:b8f7/<<NULL>>/0
Default: mDNSPlatformSendUDP got error 101 (Network is unreachable) sending packet to 224.0.0.251 on interface fd2b:20d4:672:30cd:b6e7:7588:372c:f1c8/wlan0/3
[WARN]-MDNS----: DNSServiceProcessResult failed: Service Not Running (serviceRef = 0x558fc8c230)
[WARN]-MDNS----: Need to reconnect to mdnsd
[ERR ]-MDNS----: Failed to register service Home Assistant OpenThread Border Router #DC30._meshcop._udp: Service Not Running
[WARN]-BA------: Result of publish meshcop service Home Assistant OpenThread Border Router #DC30._meshcop._udp.local: Invalid state
[ERR ]-MDNS----: Failed to register service 5a61f22fccdddc30._trel._udp: Service Not Running
[ERR ]-TrelDns-: Failed to publish TREL service: Invalid state. TREL won't be working.
[02:34:41] INFO: mDNS ended with exit code 256 (signal 11)...
Default: mDNSPlatformSendUDP got error 88 (Socket operation on non-socket) sending packet to ff02::fb on interface <<UNSPECIFIED IP ADDRESS>>/<<NULL>>/-2143420412
Additional information
I've added more info on Discord and on GitHub about the different environments and setups I've tried (I'll try to add links in an edit in case it's helpful). This isn't the first time I've had an issue like this; the last time it happened, it was resolved by a later update.
I know some of the logs come from update branches and builds other than the default stable one, but I switched to see if that would give any new, useful information. I suspect multiple bugs/issues are involved in the breakage, but I still think this report is worth filing even if it is messy, as I've seen others complain about similar problems across Nabu Casa's Home Assistant platforms on GitHub (and in the community Discord).
"mdns"-related errors seem to be commonly mentioned, even in the Matter logs, but this is a relatively recent issue (even if it did span multiple HASS updates) that seems to have come out of nowhere, with no significant changes made beforehand that I can remember. I do remember it starting after a bunch of updates, though.
P.S. I sincerely apologise if I've done this wrong or in the wrong place (I struggle with things like this), but I would genuinely appreciate it if whoever sees this could alert or forward it to the relevant dev team.
Hey there @home-assistant/core, mind taking a look at this issue as it has been labeled with an integration (otbr) you are listed as a code owner for? Thanks!
Code owner commands
Code owners of otbr can trigger bot actions by commenting:
- `@home-assistant close` Closes the issue.
- `@home-assistant rename Awesome new title` Renames the issue.
- `@home-assistant reopen` Reopen the issue.
- `@home-assistant unassign otbr` Removes the current integration label and assignees on the issue, add the integration domain after the command.
- `@home-assistant add-label needs-more-information` Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
- `@home-assistant remove-label needs-more-information` Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.
(message by CodeOwnersMention)
otbr documentation otbr source (message by IssueLinks)
Hi @Daasin,
The direct reason your otbr-agent crashed seems to be that it received an unexpected response from mDNSResponder, so it failed a sanity check. Meanwhile, mDNSResponder itself seems to be having problems, judging by logs such as: Default: mDNSPlatformSendUDP got error 99 (Cannot assign requested address) sending packet to ff02::fb on interface fe80::84c9:63ff:fea7:ee28/veth0382cca/9487. In my experience, that log line usually means a network interface has been restarted (e.g. all of its addresses were removed and then added back). mDNSResponder could misbehave in such a scenario.
Also, the error logs regarding MeshForwarder are concerning and may need further investigation if we can get more detailed logs.
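To see whether the mDNSPlatformSendUDP errors cluster around one restarted interface, the log lines could be tallied per error number and interface name. The following is only a rough sketch: the function name `tally_send_errors` is hypothetical, and the parsing assumes the interface field has the `<address>/<name>/<index>` shape seen in the logs above.

```python
import re
from collections import Counter

# Matches lines such as:
#   Default: mDNSPlatformSendUDP got error 99 (Cannot assign requested address)
#   sending packet to ff02::fb on interface fe80::.../veth07441ec/4127
# The reason and the "sending packet" part are optional, because some log
# lines in the report are cut off after the error number.
SEND_ERROR = re.compile(
    r"mDNSPlatformSendUDP got error (?P<errno>\d+)"
    r"(?: \((?P<reason>[^)]*)\))?"
    r"(?: sending packet to (?P<dest>\S+) on interface\s*(?P<iface>.*))?$"
)


def tally_send_errors(log_text):
    """Count mDNSPlatformSendUDP failures per (errno, interface name)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SEND_ERROR.search(line.strip())
        if match is None:
            continue
        # Assumed field layout: "<address>/<name>/<index>".
        parts = (match.group("iface") or "").rsplit("/", 2)
        name = parts[1] if len(parts) == 3 else "unknown"
        counts[(int(match.group("errno")), name)] += 1
    return counts
```

Calling `tally_send_errors(open("logs.txt").read()).most_common()` on one of the attached log files would list the noisiest (error, interface) pairs first; many different `veth...` names would be consistent with container interfaces being torn down and recreated.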
Suggestions:
- It may be worth restarting `mdnsd` to see if the mDNS error logs stop appearing.
- Raise the log level of `otbr-agent`. Not sure if the `ot-ctl` tool is available on Home Assistant. If it is, you can try `ot-ctl log level 5` and then capture the logs. Note that the log level will be reset after `otbr-agent` restarts.
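Once more detailed logs are captured, it may help to summarize which OpenThread modules are producing messages before reading them line by line. This is a minimal sketch, assuming the `<timestamp> [<level>] <Module>----: <message>` layout seen in the logs above; the function name `count_by_module` is hypothetical.

```python
import re
from collections import Counter

# Matches the OpenThread-style prefix, e.g. "[W] Mle-----------:" or
# "[N] MeshForwarder-:". The module name is padded with dashes in the log,
# so the padding is stripped by the trailing "-*".
OT_PREFIX = re.compile(r"\[(?P<level>[A-Z])\] (?P<module>[A-Za-z0-9-]+?)-*:")


def count_by_module(log_text):
    """Count log lines per (level, module), ignoring lines in other formats."""
    counts = Counter()
    for line in log_text.splitlines():
        match = OT_PREFIX.search(line)
        if match is not None:
            counts[(match.group("level"), match.group("module"))] += 1
    return counts
```

Lines from otbr-agent's own components (such as `[ERR ]-MDNS----:`) use a different prefix and are deliberately skipped here.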
3d.20:35:31.237 [W] P-InfraNetif--: failed to send ICMPv6 message: Cannot assign requested address
3d.20:35:31.237 [W] RoutingManager: Failed to send RA on infra netif 2: Failed
This usually means that your backbone interface (i.e. the one passed with the -B option when launching otbr-agent) is down. It may be helpful to check its status with `ip addr` and/or monitor it with `ip monitor a`.
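If the check needs to be repeated, the JSON output of `ip -j addr show` could be summarized with a short script. This is only a sketch under assumptions: it relies on the `ifname`, `operstate`, and `addr_info` fields of iproute2's JSON output, the function names are hypothetical, and whether Python and `ip` are both available inside the add-on container is not confirmed here.

```python
import json


def summarize_interfaces(ip_json_text):
    """Map interface name -> (operstate, [IPv6 addresses]).

    Expects the text printed by `ip -j addr show`.
    """
    summary = {}
    for link in json.loads(ip_json_text):
        ipv6 = [
            info["local"]
            for info in link.get("addr_info", [])
            if info.get("family") == "inet6" and "local" in info
        ]
        summary[link["ifname"]] = (link.get("operstate", "UNKNOWN"), ipv6)
    return summary


def suspicious_interfaces(summary):
    """Return names of interfaces that are not UP or have no IPv6 address."""
    return sorted(
        name for name, (state, ipv6) in summary.items()
        if state != "UP" or not ipv6
    )
```

Note that some interfaces (the loopback device, for example) normally report an `UNKNOWN` operstate, so a name in the result is a hint to look closer rather than proof of a fault. The interface to focus on is the one given to otbr-agent as the backbone.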
I have a similar issue described here (with logs). But no solution so far.
I have also had this same issue