
Android ThreadNetwork SDK unable to find border router with current mdns behaviour

Open miawgogo opened this issue 4 months ago • 20 comments

Describe the issue you are experiencing

Android is unable to find the border router, which leads to weird bugs when setting up Thread credentials on Android, as it requires the border router to send certain values in an mDNS response.

This largely seems to be because the response to the PTR query contains only A and AAAA records, with no TXT or SRV records. However, Android can discover networks from border routers that respond to the PTR query with TXT and SRV records (see the screenshots under Additional information; I can send pcaps on request).

When OTBR sends an unsolicited advertisement, it does contain the TXT record, which I think is why iOS/iPadOS does not show the same issue: it uses a cached response. Android, however, seems not to use its mDNS cache and sends new requests for _meshcop._udp.local when searching for a network and when adding or updating credentials.

OTBR running on Raspbian (the native setup in the OTBR guide) does include the TXT record that is missing here, indicating that the problem is related to the Home Assistant mDNS setup.

(I'm being a bit annoying about this as there are ~4 months until cheap Matter-over-Thread devices become available from IKEA and use of Thread moves beyond early adopters and Apple users, who will have the issue hidden by having multiple border routers.)

What type of installation are you running?

Home Assistant OS

Which operating system are you running on?

Home Assistant Operating System

Which add-on are you reporting an issue with?

OpenThread Border Router

What is the version of the add-on?

2.13.0

Steps to reproduce the issue

  1. Set up Home Assistant with the OpenThread Border Router add-on (optionally have a second border router running to compare)

  2. Run Wireshark on a device in between your Android device and Home Assistant, with the display filter `dns.resp.name contains "_meshcop._udp.local" || dns.qry.name contains "_meshcop._udp.local"`

  3. Either:

    • Open Settings and search for "Thread Networks"
    • Use the sync Thread credentials option in the troubleshooting section of the companion app
  4. While it is loading, you should notice a request from your Android device for the service and a response from Home Assistant and your border router(s)

  5. Note that the TXT record is missing from the Home Assistant border router's response (and that the response from a non-Home Assistant border router has a TXT record).

    • Also note that the Android device makes no attempt to resolve the missing TXT and SRV records
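The check in step 5 can also be done programmatically on the raw UDP payload of a captured mDNS response. The following is a minimal, stdlib-only Python sketch of that idea, not a full DNS parser; the function names `record_types` and `missing_service_records` are made up for illustration and are not part of any tool mentioned in this issue.

```python
import struct

# Record types relevant to DNS-SD service discovery.
TYPE_NAMES = {1: "A", 12: "PTR", 16: "TXT", 28: "AAAA", 33: "SRV"}


def read_name(msg: bytes, offset: int) -> tuple[str, int]:
    """Decode a (possibly compressed) DNS name.

    Returns the dotted name and the offset just past the name
    at its original position in the message.
    """
    labels = []
    end = None  # set when the first compression pointer is followed
    hops = 0
    while True:
        length = msg[offset]
        if length & 0xC0 == 0xC0:  # compression pointer
            if end is None:
                end = offset + 2
            offset = ((length & 0x3F) << 8) | msg[offset + 1]
            hops += 1
            if hops > 32:
                raise ValueError("too many compression pointers")
            continue
        offset += 1
        if length == 0:
            break
        labels.append(msg[offset:offset + length].decode("utf-8", "replace"))
        offset += length
    return ".".join(labels), (end if end is not None else offset)


def record_types(msg: bytes) -> list[tuple[str, str]]:
    """List (owner name, type) for all answer, authority and additional records."""
    _id, _flags, qd, an, ns, ar = struct.unpack_from("!6H", msg, 0)
    offset = 12
    for _ in range(qd):  # skip the question section
        _, offset = read_name(msg, offset)
        offset += 4  # QTYPE + QCLASS
    found = []
    for _ in range(an + ns + ar):
        name, offset = read_name(msg, offset)
        rtype, _rclass, _ttl, rdlen = struct.unpack_from("!HHIH", msg, offset)
        offset += 10 + rdlen  # fixed RR header plus RDATA
        found.append((name, TYPE_NAMES.get(rtype, str(rtype))))
    return found


def missing_service_records(msg: bytes) -> set[str]:
    """Return which of SRV/TXT do not appear anywhere in the response."""
    present = {rtype for _, rtype in record_types(msg)}
    return {"SRV", "TXT"} - present
```

Fed the payload of the Home Assistant response described above, `missing_service_records` would be expected to report both SRV and TXT, while a response that carries them as additional records would give an empty set.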

System Health information

System Information

version core-2025.8.2
installation_type Home Assistant OS
dev false
hassio true
docker true
container_arch aarch64
user root
virtualenv false
python_version 3.13.3
os_name Linux
os_version 6.12.34-haos-raspi
arch aarch64
timezone Europe/London
config_dir /config
Home Assistant Cloud
logged_in false
can_reach_cert_server ok
can_reach_cloud_auth ok
can_reach_cloud ok
Home Assistant Supervisor
host_os Home Assistant OS 16.1
update_channel stable
supervisor_version supervisor-2025.08.1
agent_version 1.7.2
docker_version 28.3.3
disk_total 57.8 GB
disk_used 4.1 GB
healthy true
supported true
host_connectivity true
supervisor_connectivity true
ntp_synchronized true
virtualization
board rpi3-64
supervisor_api ok
version_api ok
installed_addons Matter Server (8.1.0), OpenThread Border Router (2.13.0)
Dashboards
dashboards 2
resources 0
views 0
mode storage
Network Configuration
adapters lo (disabled), enu1u1 (enabled, default, auto), docker0 (disabled), hassio (disabled), vethbb56d8e (disabled), vethed6acb0 (disabled), vethad5a13b (disabled), veth318196b (disabled), vethe5b435e (disabled), vethc38bbe8 (disabled)
ipv4_addresses lo (127.0.0.1/8), enu1u1 (192.168.1.80/24), docker0 (172.30.232.1/23), hassio (172.30.32.1/23), vethbb56d8e (), vethed6acb0 (), vethad5a13b (), veth318196b (), vethe5b435e (), vethc38bbe8 ()
ipv6_addresses lo (::1/128), enu1u1 (2a02:8010:67ed:1:cae9:6334:907e:d373/64, fded:bd70:e03e:1:6d14:e8a:bc1e:a730/64, fe80::7de0:8dee:9c83:2247/64), docker0 (fe80::781a:c7ff:fe58:3c3/64), hassio (fd0c:ac1e:2100::1/48, fe80::f460:e0ff:fe1d:dac8/64), vethbb56d8e (fe80::90fc:62ff:fe82:8036/64), vethed6acb0 (fe80::3850:f9ff:fee0:c756/64), vethad5a13b (fe80::50be:8aff:fefe:bfb3/64), veth318196b (fe80::8894:d6ff:fe64:3a1b/64), vethe5b435e (fe80::c462:32ff:fecc:b941/64), vethc38bbe8 (fe80::30bc:ceff:fefa:93c4/64)
announce_addresses 192.168.1.80, 2a02:8010:67ed:1:cae9:6334:907e:d373, fded:bd70:e03e:1:6d14:e8a:bc1e:a730, fe80::7de0:8dee:9c83:2247
Recorder
oldest_recorder_run August 21, 2025 at 4:32 PM
current_recorder_run August 21, 2025 at 5:32 PM
estimated_db_size 0.26 MiB
database_engine sqlite
database_version 3.48.0

Anything in the Supervisor logs that might be useful for us?


Anything in the add-on logs that might be useful for us?


Additional information

~~I think this is related to this Android issue, as Android's Thread SDK requires the TXT record to be present to save new credentials~~ After testing, this is unrelated, but having working mDNS does reduce waiting times for the key sync.

Note: The Border Agent ID MUST be the id TXT value of the Border Router mDNS MeshCoP service.

and

Note: It's important for the Border Router device to include a 16-byte id TXT value in its MeshCoP mDNS service for Google Play services to discover and identify the Thread network. For products using the open source ot-br-posix, make sure that the OTBR_PUBLISH_MESHCOP_BA_ID feature is enabled.
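To make the 16-byte requirement concrete, here is a small Python sketch that parses TXT RDATA (a sequence of length-prefixed `key=value` strings, per RFC 6763) and checks the `id` value. The helper names `parse_txt` and `has_valid_border_agent_id` are hypothetical, and this is only an illustration of the rule quoted above, not how Google Play services performs the check.

```python
def parse_txt(rdata: bytes) -> dict[str, bytes]:
    """Split TXT RDATA into a {key: value} mapping.

    Each entry is one length byte followed by "key=value".
    Keys are case-insensitive and the first occurrence wins.
    """
    entries: dict[str, bytes] = {}
    offset = 0
    while offset < len(rdata):
        length = rdata[offset]
        item = rdata[offset + 1:offset + 1 + length]
        offset += 1 + length
        if not item:
            continue  # empty strings carry no key
        key, _, value = item.partition(b"=")
        entries.setdefault(key.decode("ascii", "replace").lower(), value)
    return entries


def has_valid_border_agent_id(rdata: bytes) -> bool:
    """True if the TXT data carries an `id` value of exactly 16 bytes."""
    return len(parse_txt(rdata).get("id", b"")) == 16
```

A response without any TXT record at all, as in the Home Assistant capture, never reaches this check, which would be consistent with Android being unable to identify the network.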

Note: the "Native OTBR" referenced below was built from the ot-br-posix repo's main branch on the day this issue was created; it contains significant changes to the mDNS subsystem. I have not been able to build one at the same commit as the add-on for testing.

Home assistant's Response Image
Nanoleafs Response Image
Native OTBR Response Image
Native OTBR showing in thread settings Image

miawgogo avatar Aug 21 '25 16:08 miawgogo

This happens on both my main instance and the test Pi I set up for this issue.

miawgogo avatar Aug 21 '25 17:08 miawgogo

It is also likely related to this issue that got lost in the stale bot soup https://github.com/home-assistant/addons/issues/4035

miawgogo avatar Aug 21 '25 17:08 miawgogo

After talking with kepstin: ~~this is possibly interacting with an implementation bug in Android's mDNS client. When it doesn't get the additional records in a PTR response, it must request the subsequent A, AAAA, SRV, and TXT records (depending on what's missing) https://www.rfc-editor.org/rfc/rfc6763#section-12~~ kepstin found that Android's mDNS client should request further records if the SRV and TXT records are not in the PTR response, but it is not doing so when adding Thread credentials or when searching for border agents.

My steps should account for this, as they look at the wire directly, but if you have another border router in your network that is not the add-on, Android will find the network that way.
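The follow-up behaviour described here can be sketched as a small decision function: given the service instances named in the PTR answers and the records that already arrived, work out which questions a client would still need to ask. This is a simplified Python illustration with a made-up function name, not Android's actual code; address (A/AAAA) lookups after the SRV record resolves are left out.

```python
def follow_up_questions(instances, seen):
    """Return the (name, type) questions still needed after a PTR response.

    instances: service instance names taken from the PTR answers.
    seen: set of (owner name, record type) pairs already received.
    DNS names compare case-insensitively, so both sides are lowercased.
    """
    seen_lower = {(name.lower(), rtype) for name, rtype in seen}
    questions = []
    for instance in instances:
        for rtype in ("SRV", "TXT"):
            if (instance.lower(), rtype) not in seen_lower:
                questions.append((instance, rtype))
    return questions
```

A client could equally send a single ANY question per instance instead of separate SRV and TXT questions.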

miawgogo avatar Aug 21 '25 19:08 miawgogo

I can confirm that on my system (Home Assistant virtual machine, using an Intel SR-IOV virtual NIC), the Android settings Thread panel is unable to discover the Home Assistant OTBR.

However, the Nanoleaf app running on the same phone is able to discover the Home Assistant OTBR, and so can an mDNS browser test app (I suspect that both of these apps are performing the mDNS queries themselves, rather than relying on system discovery).

kepstin avatar Aug 21 '25 20:08 kepstin

I also have a Nanoleaf border router and found that it was sending subsequent ANY requests for the Home Assistant OpenThread Border Router:

Standard query 0x0003 PTR _meshcop._udp.local, "QM" question ANY Home Assistant OpenThread Border Router #BE4A._meshcop._udp.local, "QM" question ANY Home Assistant OpenThread Border Router #9B09._meshcop._udp.local, "QM" question

This is probably because they either do the mDNS requests themselves and can add the follow-up requests, or use the system mDNS client, which follows up in exactly the same way (this is the behavior as of Android 14). The ThreadNetwork SDK is not using the system mDNS client.

miawgogo avatar Aug 22 '25 13:08 miawgogo

I have bumped the Thread border router to the latest commit (at the time of writing) with the new in-built mDNS resolver, which does resolve Android not being able to find the device (I've also noticed that key sync and device commissioning are faster if the TXT record is included).

Screenshot_20250824-183918_Google Play services.png

miawgogo avatar Aug 25 '25 16:08 miawgogo

Semi-related, as it's connected to the version bump containing the 1.4 changes: there are likely some larger architectural changes needed to perform DHCPv6 prefix delegation to enable access to IPv6 internet connections.

Originally I thought it might mean replacing NetworkManager, but it is likely just a matter of replacing the DHCP backend with dhcpcd and then implementing an option where the border router add-on can enable a configuration that gets the OS's DHCP client to request a prefix and save it in the network configuration, so that the radvd daemon can be configured to share that prefix with the mesh.

miawgogo avatar Aug 27 '25 13:08 miawgogo

I just want to add that I absolutely do not want IoT devices to get internet access. I isolate them as much as I can and want to be able to keep doing that.

At least make it possible to do so.

Gunni avatar Aug 27 '25 13:08 Gunni

> I just want to add that I absolutely do not want IoT devices to get internet access. I isolate them as much as I can and want to be able to keep doing that.
>
> At least make it possible to do so.

Oh no, that's fair. The spec does say it should be user-configurable; when I was fiddling to get it working, I had set it up to only configure it when it was enabled.

miawgogo avatar Aug 27 '25 13:08 miawgogo

and you can still configure your firewall to deny access to and from the thread prefix too

miawgogo avatar Aug 27 '25 13:08 miawgogo

As a side note - my ipad also does not show any Thread networks (tried from the Eve app). Maybe that's the same issue? On Android I also have exactly the same thing.

Just wondering - is there something we can do to support you @miawgogo ?

klassm avatar Aug 27 '25 16:08 klassm

Not much; it's largely waiting to hear from the add-on's maintainer, as I'm not confident that the latest OpenThread Border Router code is ready for use.

On the Eve app, the Home Assistant border router doesn't show up for the add-on I have in miawgogo/addon-test, so I think it may be a few things.

miawgogo avatar Aug 27 '25 16:08 miawgogo

Actually, the Eve app doesn't show any border routers on my network; it also doesn't show my Nanoleaf router.

I wonder if Apple has some form of certification like MFi for border routers here.

miawgogo avatar Aug 27 '25 16:08 miawgogo

> Semi-related, as it's connected to the version bump containing the 1.4 changes: there are likely some larger architectural changes needed to perform DHCPv6 prefix delegation to enable access to IPv6 internet connections.
>
> Originally I thought it might mean replacing NetworkManager, but it is likely just a matter of replacing the DHCP backend with dhcpcd and then implementing an option where the border router add-on can enable a configuration that gets the OS's DHCP client to request a prefix and save it in the network configuration, so that the radvd daemon can be configured to share that prefix with the mesh.

It's pretty unclear how the DHCP-PD prefix configuration works, and I think it might not work when separated out the way it would be in Home Assistant OS. So it may not be necessary to have radvd running.

However, I had success getting a GUA onto the mesh using ot-ctl prefix add, after which my devkit could successfully ping an address on the internet (openthread.io). So that may work instead: a service in the container that watches for a newly available prefix and configures it on the border router (adding and removing it).
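A rough Python sketch of the "watcher" idea: compare the prefixes currently configured on the border router with the ones that should be there, and produce the CLI commands to reconcile them. How the delegated prefix is obtained is deliberately left out. The function name `plan_prefix_commands` is hypothetical, and the `paros` flag string and the trailing `netdata register` are assumptions based on the OpenThread CLI, not something tested against the add-on.

```python
import ipaddress


def plan_prefix_commands(current, desired, flags="paros"):
    """Return the OpenThread CLI commands needed to go from current to desired.

    current, desired: iterables of IPv6 prefixes in CIDR notation.
    flags: on-mesh prefix flags passed to `prefix add` (assumed default).
    """
    cur = {ipaddress.ip_network(p) for p in current}
    des = {ipaddress.ip_network(p) for p in desired}
    commands = [f"prefix remove {p}" for p in sorted(cur - des)]
    commands += [f"prefix add {p} {flags}" for p in sorted(des - cur)]
    if commands:
        # Assumption: changes only take effect once network data is registered.
        commands.append("netdata register")
    return commands
```

Each returned string could then be passed to `ot-ctl` by the service; parsing through `ipaddress` means differently spelled but equal prefixes are not removed and re-added.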

miawgogo avatar Aug 27 '25 22:08 miawgogo

Maybe @puddly @agners or @frenck can have a look? I know mentioning people is not so nice, but this seems to be something that's affecting quite some people, including various issues also in Home Assistant core -.-

klassm avatar Aug 31 '25 14:08 klassm

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Sep 30 '25 15:09 github-actions[bot]

AFAIK this hasn't been addressed.

Gunni avatar Sep 30 '25 15:09 Gunni

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Oct 30 '25 17:10 github-actions[bot]

Still an issue, just nothing to add.

god this bot is annoying

miawgogo avatar Oct 31 '25 07:10 miawgogo

Thanks for the detailed issue report and analysis. FWIW, I can reproduce the issue on my end, too.

> I have bumped the Thread border router to the latest commit (at the time of writing) with the new in-built mDNS resolver, which does resolve Android not being able to find the device (I've also noticed that key sync and device commissioning are faster if the TXT record is included).

Yeah, it makes sense that it's related to the mDNS implementation shipped with OTBR: we use Apple's reference implementation, mDNSResponder, so it seems to behave that way by default. Good to hear that a newer version resolved that particular problem.

> kepstin found that Android's mDNS client should request further records if the SRV and TXT records are not in the PTR response, but it is not doing so when adding Thread credentials or when searching for border agents.

Should but doesn't? So that makes it sound as if Android is relying on a specific behavior of the mDNS responder (including non-requested extra TXT and SRV records)?

> Semi-related, as it's connected to the version bump containing the 1.4 changes: there are likely some larger architectural changes needed to perform DHCPv6 prefix delegation to enable access to IPv6 internet connections.

Yeah that is holding back bumping OTBR a bit on our end. Some of the 1.4 changes need a bit more work, I am hesitant to simply push out a bump. We've previously decided against jumping on 1.4, see https://github.com/home-assistant/addons/pull/3808#issuecomment-2434912582.

> Originally I thought it might mean replacing NetworkManager, but it is likely just a matter of replacing the DHCP backend with dhcpcd and then implementing an option where the border router add-on can enable a configuration that gets the OS's DHCP client to request a prefix and save it in the network configuration, so that the radvd daemon can be configured to share that prefix with the mesh.

I hope that we can have the OTBR add-on listen for prefix delegations on its own (however that may look exactly; maybe it runs its own copy of a DHCPv6 client?). If we need the system networking stack to delegate prefixes to the add-on, it would need some additional wiring up 😰

Anyhow, the bump of OTBR is currently a bit held back due to these Thread 1.4 uncertainties. Maybe we could bump OTBR without promoting Thread 1.4 as a first step, to get a new mDNSResponder shipped 🤔 .

agners avatar Nov 03 '25 13:11 agners