connectedhomeip icon indicating copy to clipboard operation
connectedhomeip copied to clipboard

dns-sd server startup is pretty confusing

Open bzbarsky-apple opened this issue 3 years ago • 8 comments

Problem

Stepping through Server::Init, the following things happen in order:

  1. If no fabrics are commissioned yet and CHIP_DEVICE_CONFIG_ENABLE_PAIRING_AUTOSTART is defined, mCommissioningWindowManager.OpenBasicCommissioningWindow() is called, which ends up doing DnssdServer::Instance().StartServer(CommissioningMode::kEnabledBasic).
  2. #if CHIP_DEVICE_CONFIG_ENABLE_DNSSD && !CHIP_DEVICE_LAYER_TARGET_ESP32 && !CHIP_DEVICE_LAYER_TARGET_MBED &&                        \
    (!CHIP_DEVICE_LAYER_TARGET_AMEBA || !CHIP_DEVICE_CONFIG_ENABLE_CHIPOBLE)
    
    we call DnssdServer::Instance().StartServer() with no arg. If no fabrics are commissioned, this ends up being equivalent to StartServer(CommissioningMode::kEnabledBasic).
  3. The first call to StartServer() that happens, whever it comes from, calls ServiceAdvertiser::Instance().Init(). In the platform-dns-sd case this calls DiscoveryImplPlatform::InitImpl, which calls ChipDnssdInit(HandleDnssdInit, ....). When HandleDnssdInit runs (which could be async, in theory) it posts a kDnssdPlatformInitialized event. When that event is processed, by OnPlatformEvent in Dnssd.cpp, it calls DnssdServer::Instance().StartServer() with no arg.

Note that if something else happens to call StartServer with a different arg between step 2 and step 3 (e.g. to turn off commissioning advertisement), step 3 will clobber that setting.

I can't figure out whether step 3 has a minimal-mdns equivalent or not.

The upshot is that it's very hard to reason about what our state is or will be as we start up. Is the DnssdServer responsible for starting up the DnssdPlatform, or vice versa? Right now they "start" each other, with platform dns-sd.

Proposed Solution

Figure out what the actual sequencing here should be and implement it. And ideally not have queued things that will clobber state.

@andy31415 @kkasperczyk-no @Damian-Nordic

bzbarsky-apple avatar Jan 22 '22 01:01 bzbarsky-apple

And in particular, ideally the only thing that kicks off "advertise me as commissionable" would be the actual code that starts listening for PASE connections, which is very much not where we are right now.

bzbarsky-apple avatar Jan 22 '22 01:01 bzbarsky-apple

Yeah, i'm also confused about the procedure of mDNS startup, we have not unified them. BTW, i saw some vendors will start the mDNS server after gettting the ip address.

jamesluo11 avatar Jan 22 '22 07:01 jamesluo11

When it comes to #3, it's necessary because some DNS-SD implementations require async initialization. For example, OpenThread SRP implementation requires that the device first communicates with the SRP server and removes stale service before accepting new ones. I think the platform functions for publishing services might return some CHIP_ERROR_NOT_READY_YET to make it clear to the DnssdServer that it should repeat the service advertisement once the platform component is initialized.

Damian-Nordic avatar Jan 24 '22 08:01 Damian-Nordic

It sounds like what we want is separate functions for:

  1. Ensure the server will start sometime.
  2. Enqueue some records to advertise once the server is started (which might be now).

The async init will then process the enqueued records. But clearing the set of records when the async init finishes, which is what we have right now, is just wrong....

bzbarsky-apple avatar Jan 25 '22 07:01 bzbarsky-apple

I agree enqueuing the records would be cleaner, but the current approach that rebuilds all services probably consumes less RAM which is getting quite precious resource and the only DNS-SD state which changes is the fabric table and the commissioning mode (which I believe we should store somewhere so that the async init works properly). Also, note that e.g. OpenThread implementation doesn't remove services on ChipDnssdRemoveServices - it just marks them for removal upon ChipDnssdFinalizeServiceUpdate if a given service is not re-added in between of those, so the platform code is not forced to actually remove any services.

Perhaps, there is a better way, but just wanted to notice that the current approach has some pros, too :)

Damian-Nordic avatar Jan 25 '22 08:01 Damian-Nordic

Yes, if we just stored the commissioning mode that would solve the problem....

bzbarsky-apple avatar Jan 27 '22 08:01 bzbarsky-apple

We get the commissioning mode from the commissioning window manager after #15890, so the main problem here is fixed....

bzbarsky-apple avatar Jun 13 '22 19:06 bzbarsky-apple

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot] avatar Dec 11 '22 18:12 stale[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot] avatar Jun 24 '23 09:06 stale[bot]