connectedhomeip
connectedhomeip copied to clipboard
dns-sd server startup is pretty confusing
Problem
Stepping through Server::Init
, the following things happen in order:
- If no fabrics are commissioned yet and
CHIP_DEVICE_CONFIG_ENABLE_PAIRING_AUTOSTART
is defined,mCommissioningWindowManager.OpenBasicCommissioningWindow()
is called, which ends up doingDnssdServer::Instance().StartServer(CommissioningMode::kEnabledBasic)
. -
we call#if CHIP_DEVICE_CONFIG_ENABLE_DNSSD && !CHIP_DEVICE_LAYER_TARGET_ESP32 && !CHIP_DEVICE_LAYER_TARGET_MBED && \ (!CHIP_DEVICE_LAYER_TARGET_AMEBA || !CHIP_DEVICE_CONFIG_ENABLE_CHIPOBLE)
DnssdServer::Instance().StartServer()
with no arg. If no fabrics are commissioned, this ends up being equivalent toStartServer(CommissioningMode::kEnabledBasic)
. - The first call to
StartServer()
that happens, whever it comes from, callsServiceAdvertiser::Instance().Init()
. In the platform-dns-sd case this callsDiscoveryImplPlatform::InitImpl
, which callsChipDnssdInit(HandleDnssdInit, ....)
. WhenHandleDnssdInit
runs (which could be async, in theory) it posts akDnssdPlatformInitialized
event. When that event is processed, byOnPlatformEvent
in Dnssd.cpp, it callsDnssdServer::Instance().StartServer()
with no arg.
Note that if something else happens to call StartServer
with a different arg between step 2 and step 3 (e.g. to turn off commissioning advertisement), step 3 will clobber that setting.
I can't figure out whether step 3 has a minimal-mdns equivalent or not.
The upshot is that it's very hard to reason about what our state is or will be as we start up. Is the DnssdServer responsible for starting up the DnssdPlatform, or vice versa? Right now they "start" each other, with platform dns-sd.
Proposed Solution
Figure out what the actual sequencing here should be and implement it. And ideally not have queued things that will clobber state.
@andy31415 @kkasperczyk-no @Damian-Nordic
And in particular, ideally the only thing that kicks off "advertise me as commissionable" would be the actual code that starts listening for PASE connections, which is very much not where we are right now.
Yeah, i'm also confused about the procedure of mDNS startup, we have not unified them. BTW, i saw some vendors will start the mDNS server after gettting the ip address.
When it comes to #3
, it's necessary because some DNS-SD implementations require async initialization. For example, OpenThread SRP implementation requires that the device first communicates with the SRP server and removes stale service before accepting new ones. I think the platform functions for publishing services might return some CHIP_ERROR_NOT_READY_YET
to make it clear to the DnssdServer
that it should repeat the service advertisement once the platform component is initialized.
It sounds like what we want is separate functions for:
- Ensure the server will start sometime.
- Enqueue some records to advertise once the server is started (which might be now).
The async init will then process the enqueued records. But clearing the set of records when the async init finishes, which is what we have right now, is just wrong....
I agree enqueuing the records would be cleaner, but the current approach that rebuilds all services probably consumes less RAM which is getting quite precious resource and the only DNS-SD state which changes is the fabric table and the commissioning mode (which I believe we should store somewhere so that the async init works properly). Also, note that e.g. OpenThread implementation doesn't remove services on ChipDnssdRemoveServices
- it just marks them for removal upon ChipDnssdFinalizeServiceUpdate
if a given service is not re-added in between of those, so the platform code is not forced to actually remove any services.
Perhaps, there is a better way, but just wanted to notice that the current approach has some pros, too :)
Yes, if we just stored the commissioning mode that would solve the problem....
We get the commissioning mode from the commissioning window manager after #15890, so the main problem here is fixed....
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.