connectedhomeip
connectedhomeip copied to clipboard
DNSServiceResolve leaks on Darwin platform
Problem
There was a report that DNSServiceResolve entities were leaking. It seems that logic to call DNSServiceRefDealloc() is missing when operational node address resolution expires, such as the case with the following error message:
2022-05-10 03:00:23.549-0700 homed [473]: (CHIP) [com.zigbee.chip:all] 0xdf61 🔴 [1652176823501] [473:57185] CHIP: [DIS] Operational discovery failed for 0x0000000000000001: ../../../../CHIPFramework/connectedhomeip/src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:174: CHIP Error 0x00000032: Timeout
Proposed Solution
N/A
So what we have is:
void MdnsContexts::Delete(GenericContext * context)
{
if (context->serviceRef != nullptr)
{
DNSServiceRefDeallocate(context->serviceRef);
}
but in this case it sounds like Resolver::LookupNode kicks off a resolve, then once enough time passes it starts ignoring results from the relevant resolve call, but never stops the actual resolve. And there's nothing in the platform impl that obviously times it out that I can see.
@andy31415 @vivien-apple this seems like another impedance mismatch between minmdns (which does not track separate resolves separately, so does not allocate per-resolve resources) and platform mdns (which can and does, and has to on some platforms).
What we probably need is that when a resolve is dropped by Resolver::HandleAction we need to notify the underlying impl that that resolve is no longer relevant, so it can do the right cleanup.
NodeLookupHandle is supposed to represent an active lookup. While it is active, the lookup is expected to processed and once inactive (removed from internal lists) the dnssd lookups are not needed anymore.
We need some form of notification for done looking up. Currently I see no cancel active resolve in the Resolver interface - I believe the interfaces themselves were not build compatible with both minmdns and platform:
- MinMDNS
lookup/browseis equivalent to "broadcast a query" and uses a global callback for any packet received - PlatformDNS
lookup/browseinitiates a stateful lookup/browse and the callbacks are specific for the one resolution call made.
We are suffering from this distinction. We could add some form of Handle to be returned on lookup and browse requests that can be cancelled after usage is done.
Darwin Review: We believe this affects all platforms, removing Darwin specific. We can come up with a Darwin specific work around, but probably not desired.
Fixed by #24010