connectedhomeip
connectedhomeip copied to clipboard
Set short TTLs on commissionable DNS-SD advertisements
Problem
Right now we can have commissionable advertisements with quite long TTLs. The minimal_mdns one seems to be 75 minutes if I read the code right. I don't know what the platform impls do (well, I think Darwin uses platform defaults).
Proposed Solution
Set short TTLs for commissionable DNS-SD advertisements, so we don't have stale ones lying around.
@andreilitvin Do we have a way to set the TTL with minmdns?
@kkasperczyk-no do you know what the OpenThread/DnssdImpl.cpp code ends up doing for TTLs?
@bzbarsky-apple OpenThread implementation uses default service lease intervals compatible with the SRP RFC, so 2 hours.
There is an API that allows to set different lease value: https://github.com/project-chip/connectedhomeip/blob/master/src/include/platform/ThreadStackManager.h#L287 (currently it's set to 0, what means use the default). It is worth to mention that in SRP the server (Thread Border Router) is responsible for marking the services, as expired, while we set this lease time for a SRP client. The SRP server may accept lease value requested by the client or not depending on its configuration. By default the range accepted by the OpenThread SRP server is from 30 minutes to 2 hours, but again, it can be changed from defaults by the user.
Btw. I think the only case that we would end with stale data (at least for Thread) is disconnecting the device. In case of re-opening the commissioning window and generating new commissionable service, the old one is removed: https://github.com/project-chip/connectedhomeip/blob/master/src/app/server/Dnssd.cpp#L439 and the same situation applies to the device reboot.
The scenario described in #19484 by @kpark-apple is in fact "user removes the node, power cycles it and attempt to commission the node again"....
It sounds like setting a shorter TTL on our end would not really help that scenario, if the SRP server will not accept values less than 30 minutes anyway.
We more likely want remove advertisements in minmdns.
TTL seems unclear as we generally do not actually know how long our IP will be valid. While in theory it sounds great to say "know how long this address is valid for" the realiity is that devices have no idea and only know "gone now".
@bzbarsky-apple I think the problem described in the issue should appear only in case of factory reset of a device. If removing the node means removing the fabric configuration on a node, the device will still have Thread networking configured, so it will be able to send SRP requests. For Thread, after reboot device refreshes all SRP services, what in particular means removing all of them and adding once again only those that are still up to date. IMO that is exactly what should happen in this case. Unless we are talking about factory reset. In such case there is not much we can do, as device will loose SRP key and in a sense will be completely new device, so it cannot influence on old services anymore.
Sounds like a question for @kpark-apple as to what "removes" means in his setup....
I meant factory reset on the removed node side and removing node info from the peer (controller) side.
I would also say factory reset is likely not a frequent operation is typical use, however it is a very frequent operation in test cases.
My concern is to not optimize for certificate test at the expense of normal operation. If we set very short TTLs, it will be great for frequent 'factory reset' since devices go away fast, but it will increase chatter and discovery on real device installations.
Do we expect that devices will provide a way to enter commissioning mode on network without factory reset when user faces odd issues and wanted to remove and add back devices?
I believe if factory reset is user initiated, code seems to run so I believe device should have some attempt of "unregister from dnssd/srp" during that process.
This does not fix force-flash the device while it is actively running (this is what causes issues for me in my devel setup) however that is not something a regular user does.
created #19692 to see if that improves the story for platforms using minmdns. Other platforms should presumably remove their records on ::Shutdown being called.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
By default the range accepted by the OpenThread SRP server is from 30 minutes to 2 hours, but again, it can be changed from defaults by the user.
The default range for SRP lease time was since changed to [30 seconds, 27 hours]: https://github.com/openthread/openthread/blob/f575021cdb9e1a7c2df058c1fd7103c730ae21f5/src/core/net/srp_server.hpp#L920-L921
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.