connectedhomeip icon indicating copy to clipboard operation
connectedhomeip copied to clipboard

Set short TTLs on commissionable DNS-SD advertisements

Open bzbarsky-apple opened this issue 3 years ago • 13 comments

Problem

Right now we can have commissionable advertisements with quite long TTLs. The minimal_mdns one seems to be 75 minutes if I read the code right. I don't know what the platform impls do (well, I think Darwin uses platform defaults).

Proposed Solution

Set short TTLs for commissionable DNS-SD advertisements, so we don't have stale ones lying around.

bzbarsky-apple avatar Mar 30 '22 22:03 bzbarsky-apple

@andreilitvin Do we have a way to set the TTL with minmdns?

@kkasperczyk-no do you know what the OpenThread/DnssdImpl.cpp code ends up doing for TTLs?

bzbarsky-apple avatar Jun 11 '22 04:06 bzbarsky-apple

@bzbarsky-apple OpenThread implementation uses default service lease intervals compatible with the SRP RFC, so 2 hours.

There is an API that allows to set different lease value: https://github.com/project-chip/connectedhomeip/blob/master/src/include/platform/ThreadStackManager.h#L287 (currently it's set to 0, what means use the default). It is worth to mention that in SRP the server (Thread Border Router) is responsible for marking the services, as expired, while we set this lease time for a SRP client. The SRP server may accept lease value requested by the client or not depending on its configuration. By default the range accepted by the OpenThread SRP server is from 30 minutes to 2 hours, but again, it can be changed from defaults by the user.

kkasperczyk-no avatar Jun 13 '22 06:06 kkasperczyk-no

Btw. I think the only case that we would end with stale data (at least for Thread) is disconnecting the device. In case of re-opening the commissioning window and generating new commissionable service, the old one is removed: https://github.com/project-chip/connectedhomeip/blob/master/src/app/server/Dnssd.cpp#L439 and the same situation applies to the device reboot.

kkasperczyk-no avatar Jun 13 '22 06:06 kkasperczyk-no

The scenario described in #19484 by @kpark-apple is in fact "user removes the node, power cycles it and attempt to commission the node again"....

It sounds like setting a shorter TTL on our end would not really help that scenario, if the SRP server will not accept values less than 30 minutes anyway.

bzbarsky-apple avatar Jun 14 '22 06:06 bzbarsky-apple

We more likely want remove advertisements in minmdns.

TTL seems unclear as we generally do not actually know how long our IP will be valid. While in theory it sounds great to say "know how long this address is valid for" the realiity is that devices have no idea and only know "gone now".

andy31415 avatar Jun 14 '22 13:06 andy31415

@bzbarsky-apple I think the problem described in the issue should appear only in case of factory reset of a device. If removing the node means removing the fabric configuration on a node, the device will still have Thread networking configured, so it will be able to send SRP requests. For Thread, after reboot device refreshes all SRP services, what in particular means removing all of them and adding once again only those that are still up to date. IMO that is exactly what should happen in this case. Unless we are talking about factory reset. In such case there is not much we can do, as device will loose SRP key and in a sense will be completely new device, so it cannot influence on old services anymore.

kkasperczyk-no avatar Jun 15 '22 05:06 kkasperczyk-no

Sounds like a question for @kpark-apple as to what "removes" means in his setup....

bzbarsky-apple avatar Jun 16 '22 06:06 bzbarsky-apple

I meant factory reset on the removed node side and removing node info from the peer (controller) side.

kpark-apple avatar Jun 16 '22 12:06 kpark-apple

I would also say factory reset is likely not a frequent operation is typical use, however it is a very frequent operation in test cases.

My concern is to not optimize for certificate test at the expense of normal operation. If we set very short TTLs, it will be great for frequent 'factory reset' since devices go away fast, but it will increase chatter and discovery on real device installations.

andy31415 avatar Jun 16 '22 12:06 andy31415

Do we expect that devices will provide a way to enter commissioning mode on network without factory reset when user faces odd issues and wanted to remove and add back devices?

kpark-apple avatar Jun 16 '22 12:06 kpark-apple

I believe if factory reset is user initiated, code seems to run so I believe device should have some attempt of "unregister from dnssd/srp" during that process.

This does not fix force-flash the device while it is actively running (this is what causes issues for me in my devel setup) however that is not something a regular user does.

andy31415 avatar Jun 16 '22 13:06 andy31415

created #19692 to see if that improves the story for platforms using minmdns. Other platforms should presumably remove their records on ::Shutdown being called.

andy31415 avatar Jun 16 '22 19:06 andy31415

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot] avatar Dec 14 '22 06:12 stale[bot]

By default the range accepted by the OpenThread SRP server is from 30 minutes to 2 hours, but again, it can be changed from defaults by the user.

The default range for SRP lease time was since changed to [30 seconds, 27 hours]: https://github.com/openthread/openthread/blob/f575021cdb9e1a7c2df058c1fd7103c730ae21f5/src/core/net/srp_server.hpp#L920-L921

jwhui avatar Feb 08 '23 07:02 jwhui

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot] avatar Aug 11 '23 21:08 stale[bot]