mdns-sd `ServiceEvent::ServiceRemoved` doesn't work

Hi,

I noticed that ServiceEvent::ServiceRemoved never fires. I tried shutting down the service daemon, I tried stop_browse on the service daemon, but the other computer never sees that my computer stopped listening/browsing/broadcasting.

Both computers are Windows.

Am I understanding correctly that this event should fire when a device goes away?

Aug 09 '22 18:08 WilliamVenner

Can you please try calling `unregister`? If that doesn't work, please let me know.

Aug 10 '22 06:08 keepsimple1

Hi, that does work!

Should the event be firing when a peer exits ungracefully though?

I want to show to the user on the UI that a peer has gone away, even if ungracefully.

Aug 10 '22 08:08 WilliamVenner

While we're here... is there any way to uniquely identify a service/peer? What's stopping me from advertising as a fake device or something? Am I misunderstanding the mDNS protocol, or is there a way to get the responder's IP address (not the one they specify in the advertisement, but the actual IP address from the socket)?

Aug 10 '22 09:08 WilliamVenner

I don't think our current implementation detects a peer going away ungracefully, even though we use defined TTL values. I can look into this a bit.

I'm not sure about the identity of a service/peer other than its endpoint (instance name, IP:port), but for mDNS there is a spec for unique resource records, which are identified by name, rrtype, and rrclass. A resource record can be unique or shared. See this RFC section. We haven't implemented the Conflict Resolution mentioned in the RFC yet.

Regarding the IP address, we distinguish between the responder's IP and the announced service IP. The mDNS protocol specifies the service IP via A records (for IPv4). Currently we don't log or keep the responder's IP, one reason being that the socket2 0.4 branch does not have a safe recv_from API. We could make an effort to get the responder IP, but in the general mDNS sense it is not useful. At least, I'm not aware of any use cases.

Aug 11 '22 04:08 keepsimple1

> I don't think our current implementation detects peer going away ungracefully, even though we use defined TTL values. I can look into this a bit.

For my application, I want a list of discovered devices, which removes devices as they go away, just like a Wi-Fi list or something. It would definitely be nice UX. The other issue with not being able to detect when a device goes away is that, if a router gives the same IP address to a different device, and that new device then tries to broadcast itself with mdns-sd, the state doesn't seem to update. The browser won't actually post a ServiceFound event because it thinks the device has already spoken to it, when in fact it's a different device, just with the same IP address. (Honestly that might be a bug in my code, but it would be worth observing.)
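A minimal sketch of the "Wi-Fi list" idea described above, using only the standard library. `DiscoveryEvent`, `Device`, and `DeviceList` are hypothetical names made up for illustration; they are not mdns-sd types, and an application would have to map the crate's real events onto them. Keying the list by full instance name rather than by IP address means a different device that inherits the same address still shows up as a separate entry.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Hypothetical event type for illustration only; NOT the mdns-sd
/// `ServiceEvent` enum.
#[derive(Debug, Clone)]
pub enum DiscoveryEvent {
    Resolved { fullname: String, addr: IpAddr, port: u16 },
    Removed { fullname: String },
}

/// What the UI shows for one discovered peer (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Device {
    pub addr: IpAddr,
    pub port: u16,
}

/// Device list keyed by full service instance name, not by IP address.
#[derive(Default)]
pub struct DeviceList {
    devices: HashMap<String, Device>,
}

impl DeviceList {
    /// Apply one event. Returns true if the visible list changed,
    /// so the caller knows when to redraw.
    pub fn apply(&mut self, event: DiscoveryEvent) -> bool {
        match event {
            DiscoveryEvent::Resolved { fullname, addr, port } => {
                let device = Device { addr, port };
                // `insert` returns the previous entry, if any.
                self.devices.insert(fullname, device.clone()) != Some(device)
            }
            DiscoveryEvent::Removed { fullname } => {
                self.devices.remove(&fullname).is_some()
            }
        }
    }

    pub fn get(&self, fullname: &str) -> Option<&Device> {
        self.devices.get(fullname)
    }

    pub fn len(&self) -> usize {
        self.devices.len()
    }
}
```

The list is only as accurate as the removal events it receives, which is why detecting ungraceful exits matters for this use case.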

> Regarding the IP address, we distinguish between the responder's IP and the announced service IP.

I think it's important for security to not trust the advertised IP address(es) at all, at least for my application. In a worst case scenario, my application can grant an attacker (virtually) RAT access to the machine if they pull some dirty tricks with the advertised IP address. Maybe I'm being paranoid, but it definitely would be nice to have the actual socket IP address of the advertising service.
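To illustrate the difference between the advertised address and the socket-level source address, here is a small standard-library sketch. It does not use mdns-sd or socket2; `recv_with_source` and `source_is_advertised` are hypothetical helper names, and the policy shown (only accept an advertisement if the sender's own IP is among the addresses it announces) is an assumption about what such an application might want, not something the crate does.

```rust
use std::io;
use std::net::{IpAddr, SocketAddr, UdpSocket};

/// Receive one datagram and return its payload together with the
/// address it was actually sent from, as reported by the OS.
pub fn recv_with_source(socket: &UdpSocket) -> io::Result<(Vec<u8>, SocketAddr)> {
    let mut buf = [0u8; 9000];
    let (len, source) = socket.recv_from(&mut buf)?;
    Ok((buf[..len].to_vec(), source))
}

/// Hypothetical policy check: is the sender's real IP one of the
/// addresses carried in its advertisement?
pub fn source_is_advertised(advertised: &[IpAddr], source: SocketAddr) -> bool {
    advertised.contains(&source.ip())
}
```

Note that a UDP source address can itself be spoofed on a local network, so a check like this narrows the attack surface but is not authentication.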

Aug 11 '22 11:08 WilliamVenner

> For my application, I want a list of discovered devices, which removes devices as they go away, just like a Wi-Fi list or something. It would definitely be nice UX.

How quickly would you like to detect the devices going away in this use case?

Sep 01 '22 17:09 keepsimple1

10 seconds would probably make sense from a UI perspective. It would be nice if that was configurable, of course.

Sep 02 '22 16:09 WilliamVenner

The RFCs do have some specs on TTL values. Let's see. https://datatracker.ietf.org/doc/html/rfc6762#section-10

As a general rule, the recommended TTL value for Multicast DNS resource records with a host name as the resource record's name (e.g., A, AAAA, HINFO) or a host name contained within the resource record's rdata (e.g., SRV, reverse mapping PTR record) SHOULD be 120 seconds.

The recommended TTL value for other Multicast DNS resource records is 75 minutes.

2 minutes sure seems like a long time (and 75 minutes even more so!), so it's always best to send out those "going away" messages (can't remember what they are called right now).
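As a rough illustration of why the TTL matters, here is a standard-library sketch of a cache that forgets records once their TTL has elapsed. `TtlCache` and its methods are hypothetical names, not mdns-sd internals. RFC 6762 calls the "going away" messages goodbye packets: records re-sent with a TTL of 0. The sketch drops such a record immediately, which is a simplification; section 10.1 of the RFC describes keeping it for one more second.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical record cache: record name -> time at which it expires.
pub struct TtlCache {
    expires_at: HashMap<String, Instant>,
}

impl TtlCache {
    pub fn new() -> Self {
        TtlCache { expires_at: HashMap::new() }
    }

    /// Handle a received record. A TTL of 0 is treated as a goodbye and
    /// removes the record (simplified: the RFC keeps it one more second).
    /// Returns true if the record is cached after the call.
    pub fn on_record(&mut self, name: &str, ttl_secs: u32, now: Instant) -> bool {
        if ttl_secs == 0 {
            self.expires_at.remove(name);
            return false;
        }
        let expiry = now + Duration::from_secs(u64::from(ttl_secs));
        self.expires_at.insert(name.to_string(), expiry);
        true
    }

    /// Remove every record whose TTL has elapsed at `now` and return
    /// the removed names in sorted order.
    pub fn evict_expired(&mut self, now: Instant) -> Vec<String> {
        let mut gone = Vec::new();
        self.expires_at.retain(|name, expiry| {
            if *expiry <= now {
                gone.push(name.clone());
                false
            } else {
                true
            }
        });
        gone.sort();
        gone
    }
}
```

With the recommended 120-second TTL, a peer that crashes without sending a goodbye stays in such a cache for up to two minutes, which is the delay being discussed here.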

Apr 18 '23 20:04 dalesmith3

Doh! Sorry about repeated info on the TTL values. Have to read more carefully.

Apr 18 '23 22:04 dalepsmith

No worries. Btw, the same spec also talks about the difficulty / trade-off of detecting stale service / rdata due to the long TTL:

> If Multicast DNS resource records follow the recommendation and have a TTL of 75 minutes, that means that stale data could persist in the system for a little over an hour. Making the default RR TTL significantly lower would reduce the lifetime of stale data, but would produce too much extra traffic on the network. Various techniques are available to minimize the impact of such stale data, outlined in the five subsections below.

In addition to the "goodbye" going away packet, the following method seems interesting to me too:

https://datatracker.ietf.org/doc/html/rfc6762#section-10.4

Sometimes a cache record can be determined to be stale when a client
   attempts to use the rdata it contains, and the client finds that
   rdata to be incorrect.
<snip>
 The software implementing the Multicast DNS resource record cache
   should provide a mechanism so that clients detecting stale rdata can
   inform the cache.

We can probably look into adding support for such a technique.

Apr 19 '23 05:04 keepsimple1

https://datatracker.ietf.org/doc/html/rfc6762#section-10.4 Cache Flush on Failure Indication

 The software implementing the Multicast DNS resource record cache
   should provide a mechanism so that clients detecting stale rdata can
   inform the cache.

   When the cache receives this hint that it should reconfirm some
   record, it MUST issue two or more queries for the resource record in
   dispute.  If no response is received within ten seconds, then, even
   though its TTL may indicate that it is not yet due to expire, that
   record SHOULD be promptly flushed from the cache.

Maybe we can support a new method in ServiceDaemon that allows the client to ask the daemon to verify / check a given SRV record, say daemon.verify(full_service_instance_name).

Such an approach lets the user (i.e. the application) decide the timing of the checks, which avoids a hard-coded / rigid time setting inside this library. And such an API does not add any overhead for other applications that don't need such periodic checks.
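A rough standard-library sketch of what such a check could look like, following the RFC text quoted above: send two queries, then flush the record if nothing answers within the timeout. Everything here is hypothetical: the free function `verify`, the `Verdict` enum, the `send_query` callback, and the `responses` channel are stand-ins invented for illustration, not the proposed `daemon.verify` method or any existing mdns-sd API.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Outcome of a reconfirmation attempt (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    /// The instance answered in time; keep the cached record.
    Alive,
    /// No answer before the deadline; flush the cached record.
    Flush,
}

/// Reconfirm `instance`. `send_query` stands in for "multicast a query
/// for this name"; `responses` carries the names of instances that
/// answered. Both are assumptions of this sketch.
pub fn verify<F: FnMut(&str)>(
    instance: &str,
    mut send_query: F,
    responses: &Receiver<String>,
    timeout: Duration,
) -> Verdict {
    // RFC 6762 section 10.4 asks for two or more queries.
    send_query(instance);
    send_query(instance);

    let deadline = Instant::now() + timeout;
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match responses.recv_timeout(remaining) {
            Ok(name) if name == instance => return Verdict::Alive,
            // An answer for some other instance: keep waiting.
            Ok(_) => continue,
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => {
                return Verdict::Flush
            }
        }
    }
}
```

Passing the timeout as a parameter keeps the ten-second figure from the RFC (or the UI-driven value mentioned earlier in the thread) as the caller's decision rather than a constant inside the library.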

May 06 '23 18:05 keepsimple1
