mdns-sd
Discrepancy between mdns-sd and Bonjour
I am currently investigating a discrepancy between mdns-sd and Bonjour and I cannot decide which lib is right.
Steps to repro:
- Launch cargo run --example query _adb-tls-pairing._tcp.
- On an Android device, go to "Wireless debugging", click on "Pair device with pairing code".
- Click "Cancel".
- Click again on "Pair device with pairing code".
Up to step 3 (included), both Bonjour and mdns-sd discover the service properly created and then removed (I use "Discovery" app on my mac laptop to monitor creation/removal).
In step 4, Discovery correctly sees the service being created again. However, mdns-sd seems to remove the service as soon as it is added.
At 28.44183725s: ServiceFound("_adb-tls-pairing._tcp.local.", "adb-43081FDAS000ST-bAND0a._adb-tls-pairing._tcp.local.")
At 28.442270375s: Resolved a new service: adb-43081FDAS000ST-bAND0a._adb-tls-pairing._tcp.local.
host: Android_M83JPE5T.local.
port: 33763
Address: fe80::9070:62ff:fe95:b6f9
Address: 192.168.1.2
At 29.443522292s: ServiceRemoved("_adb-tls-pairing._tcp.local.", "adb-43081FDAS000ST-bAND0a._adb-tls-pairing._tcp.local.")
Here are the pcap traces if that can help. 192.168.0.3 is the Android device, whereas 192.168.0.2 is the laptop.
From your steps, it looks like mdns-sd should not remove the service after step 4. It's not clear to me why it does: in both pcap trace files I didn't see a packet from Android that gracefully removes the record. Is it possible to run the mdns-sd query example with debug logging? For example:
RUST_LOG=debug cargo run --example query _adb-tls-pairing._tcp
or
RUST_LOG=trace cargo run --example query _adb-tls-pairing._tcp
Here are the pcap traces if that can help. 192.168.0.3 is the Android device, whereas 192.168.0.2 is the laptop.
I'm a bit confused: the log shows the Android device IP is 192.168.1.2. The pcap trace also shows it's 192.168.1.2. I don't see 192.168.0.3 anywhere.
Here is the trace.
I can see all the records expiring before the removed notification.
[2025-04-14T12:08:26.157Z TRACE mdns_sd::dns_cache] expired PTR: domain:1.E.D.7.4.E.4.7.2.5.A.4.E.6.A.2.0.0.F.6.1.0.3.1.4.1.B.C.1.0.A.2.ip6.arpa. record: DnsPointer { record: DnsRecord { entry: DnsEntry { name: "1.E.D.7.4.E.4.7.2.5.A.4.E.6.A.2.0.0.F.6.1.0.3.1.4.1.B.C.1.0.A.2.ip6.arpa.", ty: PTR, class: 1, cache_flush: true }, ttl: 120, created: 1744632490721, expires: 1744632506156, refresh: 1744632586721, new_name: None }, alias: "Android_A0WCYA4K.local." }
[2025-04-14T12:08:26.157Z TRACE mdns_sd::dns_cache] expired PTR: domain:0.3.E.2.5.5.E.F.F.F.A.C.5.0.4.1.0.0.0.0.0.0.0.0.0.0.0.0.0.8.E.F.ip6.arpa. record: DnsPointer { record: DnsRecord { entry: DnsEntry { name: "0.3.E.2.5.5.E.F.F.F.A.C.5.0.4.1.0.0.0.0.0.0.0.0.0.0.0.0.0.8.E.F.ip6.arpa.", ty: PTR, class: 1, cache_flush: true }, ttl: 120, created: 1744632490721, expires: 1744632506156, refresh: 1744632586721, new_name: None }, alias: "Android_A0WCYA4K.local." }
[2025-04-14T12:08:26.157Z TRACE mdns_sd::dns_cache] expired PTR: domain:0.3.E.2.5.5.E.F.F.F.A.C.5.0.4.1.0.0.F.6.1.0.3.1.4.1.B.C.1.0.A.2.ip6.arpa. record: DnsPointer { record: DnsRecord { entry: DnsEntry { name: "0.3.E.2.5.5.E.F.F.F.A.C.5.0.4.1.0.0.F.6.1.0.3.1.4.1.B.C.1.0.A.2.ip6.arpa.", ty: PTR, class: 1, cache_flush: true }, ttl: 120, created: 1744632490721, expires: 1744632506156, refresh: 1744632586721, new_name: None }, alias: "Android_A0WCYA4K.local." }
[2025-04-14T12:08:26.157Z TRACE mdns_sd::dns_cache] expired SRV: _adb-tls-pairing._tcp.local.: DnsSrv { record: DnsRecord { entry: DnsEntry { name: "adb-43081FDAS000VS-qZDEu8._adb-tls-pairing._tcp.local.", ty: SRV, class: 1, cache_flush: true }, ttl: 120, created: 1744632490721, expires: 1744632506156, refresh: 1744632586721, new_name: None }, priority: 0, weight: 0, host: "Android_A0WCYA4K.local.", port: 39649 }
[2025-04-14T12:08:26.157Z TRACE mdns_sd::dns_cache] expired PTR: domain:25.1.168.192.in-addr.arpa. record: DnsPointer { record: DnsRecord { entry: DnsEntry { name: "25.1.168.192.in-addr.arpa.", ty: PTR, class: 1, cache_flush: true }, ttl: 120, created: 1744632490721, expires: 1744632506156, refresh: 1744632586721, new_name: None }, alias: "Android_A0WCYA4K.local." }
[2025-04-14T12:08:26.158Z DEBUG mdns_sd::service_daemon] notify_service_removal: sent ServiceRemoved to listener of _adb-tls-pairing._tcp.local.: adb-43081FDAS000VS-qZDEu8._adb-tls-pairing._tcp.local.
At 29.41300875s: ServiceRemoved("_adb-tls-pairing._tcp.local.", "adb-43081FDAS000VS-qZDEu8._adb-tls-pairing._tcp.local.")
Sorry about my late response. Thank you for the trace file! It's really helpful. I think I found the bug and submitted a PR #350 for fixing it. Would you be able to try out the PR branch to see if it helps? Thanks!
I tried with the PR and this patch is fixing the problem.
Could you help me understand what is happening? I would like to report the issue to the Android Wifi team. I tested several mDNS clients (mdns-sd, avahi, Bonjour, and openscreen) and found that several frameworks are affected by this issue.
Is the problem that the second publishing is using a different SRV name?
I tried with the PR and this patch is fixing the problem.
👍 thanks for verifying!
Could you help me understand what is happening?
In our case, what happened is:
- When SRV refreshes with a different host (see in the log), we expire the old SRV record (per RFC, expire in 1 second), and add a new SRV record under the same service instance name (i.e. the SRV name stays the same).
- The problem is: when the old SRV expires at the end of 1 second, we are removing not only the old SRV, but the list of all SRV records of the service instance, including the new SRV record. This caused the service instance itself to be removed.
Hence, the fix is that we only remove the old SRV record and keep the rest of SRV list for a given name.
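To make the difference concrete, here is a minimal sketch of the two eviction behaviors. It is not the actual mdns-sd code: `SrvRecord`, `SrvCache`, `evict_expired_buggy`, `evict_expired_fixed`, `sample_cache` and the instance name `example._adb-tls-pairing._tcp.local.` are hypothetical names for illustration. It only assumes, as described above, a map keyed by service instance name whose value is a list of SRV records.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a cached SRV record (hypothetical, not the mdns-sd type).
#[derive(Debug, Clone, PartialEq)]
struct SrvRecord {
    host: String,
    port: u16,
    expires_ms: u64, // absolute expiry time in milliseconds
}

/// Cache keyed by service instance name; each name maps to a list of SRV records.
type SrvCache = HashMap<String, Vec<SrvRecord>>;

/// Buggy eviction: if any record under a name has expired, drop the whole list.
/// Returns the instance names that no longer have any SRV record.
fn evict_expired_buggy(cache: &mut SrvCache, now_ms: u64) -> Vec<String> {
    let mut removed = Vec::new();
    cache.retain(|name, records| {
        let any_expired = records.iter().any(|r| r.expires_ms <= now_ms);
        if any_expired {
            removed.push(name.clone());
        }
        !any_expired
    });
    removed
}

/// Fixed eviction: drop only the expired records. A name counts as removed
/// only when its list becomes empty.
fn evict_expired_fixed(cache: &mut SrvCache, now_ms: u64) -> Vec<String> {
    let mut removed = Vec::new();
    cache.retain(|name, records| {
        records.retain(|r| r.expires_ms > now_ms);
        if records.is_empty() {
            removed.push(name.clone());
        }
        !records.is_empty()
    });
    removed
}

/// Builds the situation from this thread: an old SRV flushed to expire at
/// 1000 ms and a new SRV (different host) with a full 120 s TTL, same name.
fn sample_cache() -> SrvCache {
    let mut cache = SrvCache::new();
    cache.insert(
        "example._adb-tls-pairing._tcp.local.".to_string(),
        vec![
            SrvRecord { host: "Android_M83JPE5T.local.".into(), port: 33763, expires_ms: 1_000 },
            SrvRecord { host: "Android_A0WCYA4K.local.".into(), port: 39649, expires_ms: 120_000 },
        ],
    );
    cache
}
```

With the buggy variant, evicting at the 1 second mark reports the instance as removed even though a valid SRV record is still in the list. With the fixed variant, the new record survives and no removal is reported.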
Is the problem that the second publishing is using a different SRV name?
The SRV name is still the same (it is the key of the SRV hashmap), but the host is different.
Thank you for being specific about the nomenclature :) !
per RFC, expire in 1 second
For future reference, that is RFC 6762, "10.1. Goodbye Packets"
The problem is: when the old SRV expires at the end of 1 second, we are removing not only the old SRV, but the list of all SRV records of the service instance, including the new SRV record. This triggered the service instance itself got removed.
I don't understand this explanation. The repro steps mention publishing (P1), expiring (E), waiting 5s (sorry I was not more specific about how long to wait in my initial report), then publishing again (P2). If P2 and P1 have the same RR, that means something was not properly cleaned during E, right? I don't see how the 1s expire can affect P2.
I don't understand this explanation. The repro steps mentions publishing (P1), expiring (E), wait 5s (sorry I was not more specific about how long to wait in my initial report), then publish again (P2). If P2 and P1 have the same RR, that means something was not properly cleaned during E right? I don't see how the 1s expire can affect P2.
P2 and P1 do not have the same RR in this case. In P2, the host changed from Android_M83JPE5T.local. to Android_A0WCYA4K.local.. This update flushes the old record by setting it to expire in 1 second, and also adds a new SRV record to the same list, because the service instance name is the same.
When the 1 second timer fires, the old SRV record is removed. The bug was that the new SRV record was also removed from the list, again because it shares the same service instance name. Hence ServiceRemoved was triggered.
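The update step described here can be sketched as follows. This is an illustration only, not the mdns-sd implementation: `Srv`, `insert_with_cache_flush` and `live_hosts` are hypothetical names, the timestamps are made up, and the sketch assumes the old SRV is still cached when P2 arrives. It also simplifies the RFC rule by ignoring the exemption for records received within the last second.

```rust
/// Simplified cached SRV record for one service instance name (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Srv {
    host: String,
    expires_ms: u64, // absolute expiry time in milliseconds
}

/// Inserts an SRV for `host` into the list kept for one service instance name.
/// With the cache-flush bit set, records that differ from the incoming one are
/// not deleted at once; they are marked to expire one second from `now_ms`.
fn insert_with_cache_flush(
    records: &mut Vec<Srv>,
    host: &str,
    ttl_secs: u64,
    cache_flush: bool,
    now_ms: u64,
) {
    // Same data as an existing record: just refresh its expiry.
    if let Some(existing) = records.iter_mut().find(|r| r.host == host) {
        existing.expires_ms = now_ms + ttl_secs * 1_000;
        return;
    }
    if cache_flush {
        for old in records.iter_mut() {
            // Simplification: a real cache would skip records received
            // less than one second ago.
            old.expires_ms = old.expires_ms.min(now_ms + 1_000);
        }
    }
    records.push(Srv {
        host: host.to_string(),
        expires_ms: now_ms + ttl_secs * 1_000,
    });
}

/// Hosts whose SRV record is still valid at `now_ms`.
fn live_hosts(records: &[Srv], now_ms: u64) -> Vec<&str> {
    records
        .iter()
        .filter(|r| r.expires_ms > now_ms)
        .map(|r| r.host.as_str())
        .collect()
}
```

For example, inserting Android_M83JPE5T.local. at 0 ms and Android_A0WCYA4K.local. at 5000 ms (both with TTL 120 and cache flush set) leaves both hosts live until 6000 ms, after which only the new host remains. The service instance should stay discoverable throughout, since the list never becomes empty.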
This issue has been fixed. If you run into any related problems, please feel free to reopen this issue or open a new one. Thanks!