connectedhomeip icon indicating copy to clipboard operation
connectedhomeip copied to clipboard

DNSServiceResolve leaks on Darwin platform

Open kpark-apple opened this issue 3 years ago • 3 comments

Problem

There was a report that DNSServiceResolve entities were leaking. It seems that logic to call DNSServiceRefDealloc() is missing when operational node address resolution expires, such as the case with the following error message:

2022-05-10 03:00:23.549-0700 homed [473]: (CHIP) [com.zigbee.chip:all] 0xdf61    🔴 [1652176823501] [473:57185] CHIP: [DIS] Operational discovery failed for 0x0000000000000001: ../../../../CHIPFramework/connectedhomeip/src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:174: CHIP Error 0x00000032: Timeout

Proposed Solution

N/A

kpark-apple avatar May 31 '22 19:05 kpark-apple

So what we have is:

void MdnsContexts::Delete(GenericContext * context)
{
    if (context->serviceRef != nullptr)
    {
        DNSServiceRefDeallocate(context->serviceRef);
    }

but in this case it sounds like Resolver::LookupNode kicks off a resolve, then once enough time passes it starts ignoring results from the relevant resolve call, but never stops the actual resolve. And there's nothing in the platform impl that obviously times it out that I can see.

@andy31415 @vivien-apple this seems like another impedance mismatch between minmdns (which does not track separate resolves separately, so does not allocate per-resolve resources) and platform mdns (which can and does, and has to on some platforms).

What we probably need is that when a resolve is dropped by Resolver::HandleAction we need to notify the underlying impl that that resolve is no longer relevant, so it can do the right cleanup.

bzbarsky-apple avatar Jun 03 '22 07:06 bzbarsky-apple

NodeLookupHandle is supposed to represent an active lookup. While it is active, the lookup is expected to processed and once inactive (removed from internal lists) the dnssd lookups are not needed anymore.

We need some form of notification for done looking up. Currently I see no cancel active resolve in the Resolver interface - I believe the interfaces themselves were not build compatible with both minmdns and platform:

  • MinMDNS lookup/browse is equivalent to "broadcast a query" and uses a global callback for any packet received
  • PlatformDNS lookup/browse initiates a stateful lookup/browse and the callbacks are specific for the one resolution call made.

We are suffering from this distinction. We could add some form of Handle to be returned on lookup and browse requests that can be cancelled after usage is done.

andy31415 avatar Jun 03 '22 13:06 andy31415

Darwin Review: We believe this affects all platforms, removing Darwin specific. We can come up with a Darwin specific work around, but probably not desired.

woody-apple avatar Jun 14 '22 18:06 woody-apple

Fixed by #24010

bzbarsky-apple avatar Dec 16 '22 21:12 bzbarsky-apple