`ServiceEvent::ServiceRemoved` doesn't work
Hi,
I noticed that `ServiceEvent::ServiceRemoved` never fires. I tried shutting down the service daemon, and I tried `stop_browse` on the service daemon, but the other computer never sees that my computer stopped listening/browsing/broadcasting.
Both computers are Windows.
Am I understanding correctly that this event should fire when a device goes away?
Could you please try calling `unregister`? If that doesn't work, please let me know.
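For reference, a minimal sketch along the lines of the crate's README example (the service type, names, address, and TXT properties below are placeholders):

```rust
use mdns_sd::{ServiceDaemon, ServiceInfo};

fn main() {
    // Create a daemon and register a service (placeholder values).
    let mdns = ServiceDaemon::new().expect("failed to create daemon");
    let my_service = ServiceInfo::new(
        "_my-service._udp.local.",   // service type
        "my_instance",               // instance name
        "my_host.local.",            // host name
        "192.168.1.12",              // announced IP (published as an A record)
        5200,                        // port
        &[("status", "online")][..], // TXT properties
    )
    .expect("valid service info");
    let fullname = my_service.get_fullname().to_string();
    mdns.register(my_service).expect("failed to register");

    // ... run the service ...

    // On shutdown, unregister so that browsing peers receive
    // ServiceEvent::ServiceRemoved via the "goodbye" announcements.
    let receiver = mdns.unregister(&fullname).expect("failed to unregister");
    let status = receiver.recv().expect("daemon did not respond");
    println!("unregister status: {:?}", status);
}
```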
Hi, that does work!
Should the event be firing when a peer exits ungracefully, though?
I want to show to the user on the UI that a peer has gone away, even if ungracefully.
While we're here... is there any way to uniquely identify a service/peer? What's stopping me from advertising as a fake device or something? Am I misunderstanding the mDNS protocol, or is there a way to get the responder's IP address (not the one they specify in the advertisement, but the actual IP address from the socket)?
I don't think our current implementation detects a peer going away ungracefully, even though we use the defined TTL values. I can look into this a bit.
I'm not sure about the identity of a service/peer other than its endpoint (instance name, IP:port), but for mDNS, there is a spec for unique resource records, identified by their name, rrtype, and rrclass. A resource record can be unique or shared; see this RFC section. Currently we haven't implemented the Conflict Resolution described in the RFC yet.
Regarding the IP address, we distinguish between the responder's IP and the announced service IP. The mDNS protocol specifies the service IP via A records (for IPv4). Currently we don't log or keep the responder's IP, one reason being that the socket2 0.4 branch does not have a safe `recv_from` API. We could make an effort to get the responder IP, but in the general mDNS sense it is not useful; at least, I'm not aware of use cases.
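To make the distinction concrete, here is a rough sketch of what a browsing client sees today (the service type is a placeholder); the printed addresses come from the announced A records, not from the packet's source address:

```rust
use mdns_sd::{ServiceDaemon, ServiceEvent};

fn main() {
    let mdns = ServiceDaemon::new().expect("failed to create daemon");
    let receiver = mdns.browse("_my-service._udp.local.").expect("failed to browse");

    while let Ok(event) = receiver.recv() {
        if let ServiceEvent::ServiceResolved(info) = event {
            // These addresses come from the A records in the announcement;
            // the responder's actual socket address is not exposed.
            println!("{} advertises {:?}", info.get_fullname(), info.get_addresses());
        }
    }
}
```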
> I don't think our current implementation detects a peer going away ungracefully, even though we use the defined TTL values. I can look into this a bit.
For my application, I want a list of discovered devices, which removes devices as they go away, just like a Wi-Fi list or something. It would definitely be nice UX. The other issue with not being able to detect when a device goes away: if a router gives the IP address to a different device, and that device then tries to broadcast itself with mdns-sd, the state doesn't seem to update. It won't actually post a ServiceFound event, because it thinks the device has already spoken to it, when in fact it's a different device with the same IP address. (Honestly, that might be a bug in my code, but it would be worth checking.)
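For context, my device list is driven by roughly this kind of loop (simplified, with a placeholder service type), which is why missing removal events leave stale entries:

```rust
use std::collections::HashMap;
use mdns_sd::{ServiceDaemon, ServiceEvent, ServiceInfo};

fn main() {
    let mdns = ServiceDaemon::new().expect("failed to create daemon");
    let receiver = mdns.browse("_my-service._udp.local.").expect("failed to browse");

    // UI model: fullname -> latest resolved info.
    let mut devices: HashMap<String, ServiceInfo> = HashMap::new();

    while let Ok(event) = receiver.recv() {
        match event {
            ServiceEvent::ServiceResolved(info) => {
                devices.insert(info.get_fullname().to_string(), info);
            }
            // Only fires on a graceful goodbye today, so ungraceful
            // exits leave stale entries in the list.
            ServiceEvent::ServiceRemoved(_ty, fullname) => {
                devices.remove(&fullname);
            }
            _ => {}
        }
    }
}
```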
> Regarding the IP address, we distinguish between the responder's IP and the announced service IP.
I think it's important for security not to trust the advertised IP address(es) at all, at least for my application. In a worst-case scenario, my application could grant an attacker (virtually) RAT access to the machine if they pull some dirty tricks with the advertised IP address. Maybe I'm being paranoid, but it definitely would be nice to have the actual socket IP address of the advertising service.
> For my application, I want a list of discovered devices, which removes devices as they go away, just like a Wi-Fi list or something. It would definitely be nice UX.
How quickly would you like to detect the devices going away in this use case?
10 seconds would probably make sense from a UI perspective. It would be nice if that was configurable, of course.
The RFCs do have some specs on TTL values. Let's see. https://datatracker.ietf.org/doc/html/rfc6762#section-10
> As a general rule, the recommended TTL value for Multicast DNS resource records with a host name as the resource record's name (e.g., A, AAAA, HINFO) or a host name contained within the resource record's rdata (e.g., SRV, reverse mapping PTR record) SHOULD be 120 seconds.
>
> The recommended TTL value for other Multicast DNS resource records is 75 minutes.
2 minutes sure seems like a long time (and 75 even more so!), so it's always best to send out those "going away" messages (can't remember what they are called right now).
Doh! Sorry about repeated info on the TTL values. Have to read more carefully.
No worries. Btw, the same spec also talks about the difficulty / trade-off of detecting stale service / rdata due to the long TTL:
> If Multicast DNS resource records follow the recommendation and have a TTL of 75 minutes, that means that stale data could persist in the system for a little over an hour. Making the default RR TTL significantly lower would reduce the lifetime of stale data, but would produce too much extra traffic on the network. Various techniques are available to minimize the impact of such stale data, outlined in the five subsections below.
In addition to the "goodbye" going-away packet, the following method seems interesting to me too:
https://datatracker.ietf.org/doc/html/rfc6762#section-10.4
> Sometimes a cache record can be determined to be stale when a client attempts to use the rdata it contains, and the client finds that rdata to be incorrect.
>
> <snip>
>
> The software implementing the Multicast DNS resource record cache should provide a mechanism so that clients detecting stale rdata can inform the cache.
We can probably look into adding support for such a technique.
https://datatracker.ietf.org/doc/html/rfc6762#section-10.4 Cache Flush on Failure Indication

> The software implementing the Multicast DNS resource record cache should provide a mechanism so that clients detecting stale rdata can inform the cache.
>
> When the cache receives this hint that it should reconfirm some record, it MUST issue two or more queries for the resource record in dispute. If no response is received within ten seconds, then, even though its TTL may indicate that it is not yet due to expire, that record SHOULD be promptly flushed from the cache.
Maybe we can support a new method in ServiceDaemon that lets the client ask the daemon to verify/check a given SRV record, say `daemon.verify(full_service_instance_name)`.
Such an approach lets the user (i.e. the application) decide the timing of the checks, which avoids a hard-coded, rigid time setting inside this library. And such an API does not add any overhead for applications that don't need such periodic checks.
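Usage might then look something like this (purely hypothetical; `verify` is just the proposed name and doesn't exist in the crate yet):

```rust
use mdns_sd::ServiceDaemon;

// Hypothetical sketch of the proposed API; `verify` does not exist yet.
fn on_connect_failure(daemon: &ServiceDaemon, fullname: &str) {
    // The application hit a failure indication (e.g. a refused TCP connect
    // to a resolved address), so it asks the daemon to reconfirm the record
    // per RFC 6762 section 10.4. The daemon would issue two or more queries
    // and, if nothing answers within ten seconds, flush the record and emit
    // ServiceEvent::ServiceRemoved.
    daemon.verify(fullname.to_string()).expect("verify request failed");
}
```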