snmalloc icon indicating copy to clipboard operation
snmalloc copied to clipboard

Detours implementation of overriding CRT

Open mjp41 opened this issue 6 months ago • 7 comments

This commit adds an override to Windows for replacing the CRT malloc/free etc routines.

~~This is stacked on top of #774.~~

mjp41 avatar Jun 27 '25 15:06 mjp41

@NeilMonday I wonder if you had any thoughts on this PR?

mjp41 avatar Jun 30 '25 13:06 mjp41

I wonder if we should adjust the behavior of malloc to interact with the alloc failure hooks (aka new handlers) on windows. Otherwise, after detour is installed, new handlers may not be invoked as excepted.

The UCRT way of doing this:

// This function implements the logic of malloc().  It is called directly by the
// malloc() function in the Release CRT and is called by the debug heap in the
// Debug CRT.
//
// This function must be marked noinline, otherwise malloc and
// _malloc_base will have identical COMDATs, and the linker will fold
// them when calling one from the CRT. This is necessary because malloc
// needs to support users patching in custom implementations.
extern "C" __declspec(noinline) _CRTRESTRICT void* __cdecl _malloc_base(size_t const size)
{
    // Ensure that the requested size is not too large:
    _VALIDATE_RETURN_NOEXC(_HEAP_MAXREQ >= size, ENOMEM, nullptr);

    // Ensure we request an allocation of at least one byte:
    size_t const actual_size = size == 0 ? 1 : size;

    for (;;)
    {
        void* const block = HeapAlloc(__acrt_heap, 0, actual_size);
        if (block)
            return block;

        // Otherwise, see if we need to call the new handler, and if so call it.
        // If the new handler fails, just return nullptr:
        if (_query_new_mode() == 0 || !_callnewh(actual_size))
        {
            errno = ENOMEM;
            return nullptr;
        }

        // The new handler was successful; try to allocate again...
    }
}

SchrodingerZhu avatar Jul 01 '25 14:07 SchrodingerZhu

That is a great observation. I have a few thoughts:

  1. I am not happy to put it at the top level. That will cause a branch on the fast path, which is really bad. And I think will make the codegen much worse as it will stop it being tail calls everywhere.
  2. We need to determine if we were called by malloc or new as they have different behaviours, this probably means we should template in the failure call, so we can codegen new and malloc differently
  3. There are currently two places where failure turns into a nullptr:
    • https://github.com/microsoft/snmalloc/blob/012138e29f8802157ee430b68a1c64f0ef3fbe0b/src/snmalloc/mem/corealloc.h#L804-L807
    • https://github.com/microsoft/snmalloc/blob/012138e29f8802157ee430b68a1c64f0ef3fbe0b/src/snmalloc/mem/corealloc.h#L668-L697 The second can return nullptr, but it is less obvious. These correspond to failing to allocate a large and small object respectively. I think if we can thread a template to these points that by default sets errno and returns nullptr. And in the case of the Windows Detours version does the check of the set_new_handler stuff.

I think with these changes we will be able to not impact performance, and meet the spec.

Does that make sense to you @schrodingerzhu?

mjp41 avatar Jul 01 '25 19:07 mjp41

The plan sounds solid to me. Just a minor comment:

We need to determine if we were called by malloc or new as they have different behaviours

I think UCRT is calling the new handler regardless of whether the toplevel function is using C-API or C++-API. As demonstrated in the code comment of UCRT _malloc_base function, this function is exactly the implementation of malloc, so malloc also has the behavior.

SchrodingerZhu avatar Jul 02 '25 03:07 SchrodingerZhu

I thought it would only call it in the case of malloc if _query_new_mode() was set? But would always call the handler for new?

We also really need to implement _recalloc, but that requires accurate object sizes:

char* a = (char*)malloc(23);
char* b = _recalloc(a, 32);
assert(b[26] == 0);

This is hard to achieve with a sizeclass allocator as we don't have the accurate size information around.

So I would like to leave _recalloc to a future PR. We could use the custom Meta data feature to implement accurate sizes.

mjp41 avatar Jul 02 '25 05:07 mjp41

I thought it would only call it in the case of malloc if _query_new_mode() was set? But would always call the handler for new?

I see. it makes sense then.

SchrodingerZhu avatar Jul 03 '25 16:07 SchrodingerZhu

#791 means this can now be extended to correctly cover the Windows CRT _set_new_handler behaviour. So moving to draft until that is done.

mjp41 avatar Jul 15 '25 09:07 mjp41