c-ares icon indicating copy to clipboard operation
c-ares copied to clipboard

c-ares triggers Windows Server kernel object leak

Open pgroke-dt opened this issue 4 years ago • 2 comments

There seems to be a bug in current Windows Server versions which leads to leaking the kernel's "process object", and it's triggered by how c-ares uses UDP sockets, namely using connect + send on a UDP socket.

To reproduce the bug with c-ares it is sufficient to make two calls to ares_gethostbyname without any "process" call in between.

I've also written a small reproducer with plain WinSock functions:

#define _WIN32_WINNT  0x0501
#define WIN32_LEAN_AND_MEAN

#include <Windows.h>
#include <WinSock2.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#pragma comment(lib, "Ws2_32.lib")

int main() {
    WSADATA wsaData;
    (void)WSAStartup(MAKEWORD(2, 2), &wsaData);

    struct sockaddr_in sa ;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.S_un.S_un_b.s_b1 = 127; // IP doesn't seem to matter as long as it's valid
    sa.sin_addr.S_un.S_un_b.s_b2 = 0;
    sa.sin_addr.S_un.S_un_b.s_b3 = 0;
    sa.sin_addr.S_un.S_un_b.s_b4 = 1;
    sa.sin_port = htons(8080); // port doesn't seem to matter as long as it's valid

    int const s = socket(AF_INET, SOCK_DGRAM, 0); // create AF_INET/SOCK_DGRAM socket = UDP
        // "connect" this UDP socket. of course this doesn't really establish a connection,
        // but it allows the application to use `send` i.e. send without specifying the destination address + port.
        // necessary for the leak because the leak only occurs when we call `send` at least once.
    int c = connect(s, (struct sockaddr const*)(&sa), sizeof(sa));

    char const data[] = { 0, 0 }; // size & contents doesn't seem to matter
        // the leak does NOT happen with just one `send` or `sendto` call
        // the leak does NOT happen if we replace at least the 2nd `send` call with `sendto` - even if we still `connect`
        // the leak DOES still happen if we only replace the 1st `send` call with `sendto`
        // and of course the leak also happens with the 2x `send` that we use here
    int p;
    p = send(s, data, sizeof(data), 0);
    printf("send: %d\n", p);
    p = send(s, data, sizeof(data), 0);
    printf("send: %d\n", p);

    closesocket(s); // not necessary for the leak but doesn't help either
    WSACleanup();   // not necessary for the leak but doesn't help either
    return 0;
}

If an application that does this is executed repeatedly, the leaked process objects will accumulate and consume a lot of memory, until the machine eventually locks up. You can watch it by monitoring the performance counter "\Objects\Processes", either by using the PowerShell command

Get-Counter -Counter "\Objects\Processes"

or with the "Performance Monitor" GUI application.

We've been able to reproduce this on Windows Server versions 1809 (unpatched) through 1909 (fully patched). We were not able to reproduce it with Windows 10.


We also found out that it's possible to work around this Windows bug by calling sendto instead of send. It's important to specify the target address when calling sendto though -- calling it without a target address behaves just like send and also triggers the leak.


We're currently in the process of testing a local c-ares patch that uses the following function instead of send to work around the bug:

int workaround_send(SOCKET s, const char* buf, int len, int flags)
{
  /* Try to get the socket's type and remote address/port via getsockopt(SO_BSP_STATE). */
  struct addr_buffer
  {
    CSADDR_INFO addr_info;
    /* Extra space for the actual address data. CSADDR_INFO only contains pointer/length pairs for the addresses,
     * the actual address data in the provided buffer after the CSADDR_INFO struct.
     * In theory 2x sizeof(SOCKADDR_STORAGE) should be enough, but providing some extra space can't hurt -> use 4x. */
    char extra[4 * sizeof(SOCKADDR_STORAGE)];
  } addr_buf;
  int opt_len = sizeof(addr_buf);
  const int gso = getsockopt(s, SOL_SOCKET, SO_BSP_STATE, (char*)(&addr_buf), &opt_len);

  /* Check if everything worked, the socket is type SOCK_DGRAM and connected... */
  if (gso == 0
      && opt_len >= sizeof(CSADDR_INFO)
      && addr_buf.addr_info.iSocketType == SOCK_DGRAM
      && addr_buf.addr_info.RemoteAddr.lpSockaddr
      && addr_buf.addr_info.RemoteAddr.iSockaddrLength)
  {
    /* Call sendto() with the "connected" remote address instead of send() to avoid the kernel object leak.
     * (Note that calling sendto() with a NULL remote address also causes a leak) */
    return sendto(s, buf, len, flags, addr_buf.addr_info.RemoteAddr.lpSockaddr, addr_buf.addr_info.RemoteAddr.iSockaddrLength);
  }

  /* Something didn't work as expected or the socket isn't connected or not type SOCK_DGRAM.
   * In any case, the safest thing is to just fall back to calling send(). */
  return send(s, buf, len, flags);
}

Which seems to work fine so far.


So I know this is not a bug in c-ares and that our patch is far from pretty. But I still thought you should know, which is why I created this ticket.

ps: If you decide to try to reproduce the bug please let me know. I'd be interested to hear whether it reproduced on your system or not.

pgroke-dt avatar Feb 28 '20 13:02 pgroke-dt

Does using WSASend() instead of send() but still passing a non-blocking, but also non-overlapped buffer trigger the same leak? That might be a quicker, less-invasive change. Just looking at the code, trying to properly tool in sendto() would be pretty difficult.

bradh352 avatar Feb 29 '20 12:02 bradh352

Ah, good, idea, I didn't think of that. But unfortunately WSASend seems to behave identically.

I've tried my above example with the following modifications...

    int const s = socket(AF_INET, SOCK_DGRAM, 0); // create AF_INET/SOCK_DGRAM socket = UDP
    printf("socket: %d\n", s);
    u_long nbio = 1;
    int const ioctl = ioctlsocket(s, FIONBIO, &nbio);
    printf("ioctlsocket: %d\n", ioctl);
        // "connect" this UDP socket. of course this doesn't really establish a connection,
        // but it allows the application to use `send` i.e. send without specifying the destination address + port.
        // necessary for the leak because the leak only occurs when we call `send` at least once.
    int const c = connect(s, (struct sockaddr const*)(&sa), sizeof(sa));
    printf("connect: %d\n", c);

    char data[] = { 0, 0 }; // size & contents doesn't seem to matter
    int p;
    WSABUF wsabuf1;
    DWORD bytesSent = 0;
    wsabuf1.buf = data;
    wsabuf1.len = sizeof(data);
    p = WSASend(s, &wsabuf1, 1, &bytesSent, 0, NULL, NULL);
    printf("send: %d, %u\n", p, bytesSent);

    WSABUF wsabuf2;
    wsabuf2.buf = data;
    wsabuf2.len = sizeof(data);
    p = WSASend(s, &wsabuf2, 1, &bytesSent, 0, NULL, NULL);
    printf("send: %d, %u\n", p, bytesSent);

...and it also triggers the leak.

Just looking at the code, trying to properly tool in sendto() would be pretty difficult.

Yes, I know. I had a look at the code myself and I guess it would be quite some task. Fortunately our application doesn't do a ton of DNS requests so the overhead of calling getsockopt is acceptable for us. But if I was maintaining c-ares, I would also think twice about adding such a patch to the library.

pgroke-dt avatar Mar 02 '20 09:03 pgroke-dt

Do we know if this still happens on the latest Windows server and desktop variants ?

bradh352 avatar Oct 01 '23 13:10 bradh352

Unfortunately I'm no longer able to reproduce this - even with the same Windows Server versions that I have used previously, e.g. 1809 (unpatched). (I don't have the VMs anymore that I had used for testing originally - I've re-created all my testing VMs since then.)

My guess is that there's some additional required factor that I'm not aware of. Maybe some system setting or a particular driver is needed, IDK.

Nevertheless, I did also re-test it with a WS 2022 VM (last updated 2022-09-28 because that's the state of the snapshot that I have) - and I could not reproduce it. But since I also can't reproduce it with an unpatched Windows Server 1809 anymore, and I know that it used to happen with that version, I don't think we can take that as confirmation that it has been fixed.

pgroke-dt avatar Oct 02 '23 08:10 pgroke-dt

thanks, going to close this then since it can no longer be reproduced.

bradh352 avatar Oct 30 '23 15:10 bradh352