rpmalloc
Compatibility with Microsoft runtime overloading
Hi, I am trying to use rpmalloc in a VS2019 C++ Win32 project (x64). I added rpmalloc.vcxproj to my solution and set ENABLE_OVERRIDE=1 and ENABLE_PRELOAD=1 in both the rpmalloc lib project and my exe project. In the exe project I include rpmalloc.h in stdafx.h (no MFC).
First gotcha is that I can't use rpmalloc with the debug build, because VS C++ uses _free_dbg in global delete. I therefore use rpmalloc only in the release build.
After a very short time (less than a second) I get a crash in rpmalloc when delete/free is called. With rpmalloc 1.4.0 the span pointer isn't correct, which leads to an access violation. With the current development branch (commit 9adf4e0aed0a60b22c9a4f10d20d674ca55a9f8c) I get a crash because the heap member in the span struct is zero when it is dereferenced.
What can I do to find the culprit (I already set ENABLE_ASSERTS=1, which didn't help)?
The application uses boost::asio and a lot of threads/executors and works well with the VS2019 built-in heap library. Sometimes during long-term tests (24h+) I see delays when the PC has not been used for hours (in the sense that nobody moved the mouse or clicked anything) and a user then, for example, opens a folder in Windows Explorer. I wanted to find out whether these delays (roughly <= 50 ms) could be due to a heap lock, because the process makes heavy use of dynamic heap data (new and delete).
Thank you and best regards.
Most likely a pointer allocated by the VC++ runtime is being passed to rpmalloc delete/free.
The "uses _free_dbg in global delete" part makes me think the override isn't correctly linked in; otherwise global delete would have been replaced by the rpmalloc implementation.
Perhaps a DLL you use allocates memory inside the DLL that you later free in your executable? That will not work, as the DLL allocates from the standard runtime, which cannot be passed to rpmalloc.
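To illustrate why a pointer from one allocator must not reach another allocator's free, here is a minimal sketch in portable C++. `TrackingAllocator` is a hypothetical toy class, not part of rpmalloc or the Microsoft runtime: it records every block it hands out and refuses to free anything it does not own. It is not thread-safe and is only meant for hunting down a mismatched pointer in a small repro.

```cpp
#include <cstddef>
#include <cstdlib>
#include <unordered_set>

// Hypothetical toy allocator (an assumption for illustration only):
// it remembers which blocks it handed out so foreign pointers can be detected.
class TrackingAllocator {
public:
    // Allocate a block and record it as owned by this allocator.
    void* allocate(std::size_t size) {
        void* p = std::malloc(size);
        if (p != nullptr) {
            owned_.insert(p);
        }
        return p;
    }

    // True only for blocks that came from allocate() and are still live.
    bool owns(const void* p) const {
        return owned_.count(const_cast<void*>(p)) != 0;
    }

    // Frees owned blocks; returns false and does nothing for foreign pointers,
    // which is the case that crashes when a real allocator trusts the pointer.
    bool deallocate(void* p) {
        if (p == nullptr) {
            return true;  // freeing null is a no-op, as with free()
        }
        if (owned_.erase(p) == 0) {
            return false; // allocated elsewhere, e.g. by another runtime heap
        }
        std::free(p);
        return true;
    }

private:
    std::unordered_set<void*> owned_;
};
```

A real allocator such as rpmalloc does not keep such a table; it derives its bookkeeping from the pointer itself, which is why a foreign pointer shows up as a bad span or a null heap member rather than as a clean error.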
Hi Mattias,
thank you for your prompt reply. I only found it today (somehow I didn't notice the email from GitHub).
Regarding the debug build: Deleting a std::string finally calls this function in xmemory:
template <size_t _Align, enable_if_t<(!_HAS_ALIGNED_NEW || _Align <= __STDCPP_DEFAULT_NEW_ALIGNMENT__), int> = 0>
void _Deallocate(void* _Ptr, size_t _Bytes) noexcept {
    // deallocate storage allocated by _Allocate when !_HAS_ALIGNED_NEW || _Align <= __STDCPP_DEFAULT_NEW_ALIGNMENT__
#if defined(_M_IX86) || defined(_M_X64)
    if (_Bytes >= _Big_allocation_threshold) { // boost the alignment of big allocations to help autovectorization
        _Adjust_manually_vector_aligned(_Ptr, _Bytes);
    }
#endif // defined(_M_IX86) || defined(_M_X64)
    ::operator delete(_Ptr, _Bytes);
}
And here delete in vcruntime_new will be called, ending up in static void __cdecl free_dbg_nolock(void* const block, int const block_use) throw() in debug_heap.cpp.
So you're right that overriding delete didn't work correctly. But I don't understand enough about how this overriding works with rpmalloc. Can you shed some light on this?
I guess it doesn't make sense looking at the crashes until overriding new and delete works correctly, right?
Looks like the sized delete operator is not overloaded correctly, looking into this
I added missing C++ operator overrides for sized deletes, please try again and see if it works better now
On the develop branch, that is
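For readers unfamiliar with what "sized delete" refers to, the following is a minimal sketch of replacing the global C++ allocation operators, including the sized forms added in C++14. It forwards to malloc/free and counts calls so the replacement can be observed; the counters and the `counted_alloc`/`counted_free` helpers are hypothetical names for illustration, and this is not the code from rpmalloc's rpnew.h. The aligned (C++17) and nothrow overloads are left out for brevity.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counters, present only so the replacement can be observed.
static std::atomic<std::size_t> g_new_calls{0};
static std::atomic<std::size_t> g_delete_calls{0};

// Shared allocation path; a real override would call the custom allocator here.
static void* counted_alloc(std::size_t size) {
    g_new_calls.fetch_add(1, std::memory_order_relaxed);
    void* p = std::malloc(size != 0 ? size : 1);
    if (p == nullptr) {
        throw std::bad_alloc();
    }
    return p;
}

// Shared deallocation path; null pointers are ignored and not counted.
static void counted_free(void* p) noexcept {
    if (p == nullptr) {
        return;
    }
    g_delete_calls.fetch_add(1, std::memory_order_relaxed);
    std::free(p);
}

// Unsized forms.
void* operator new(std::size_t size) { return counted_alloc(size); }
void* operator new[](std::size_t size) { return counted_alloc(size); }
void operator delete(void* p) noexcept { counted_free(p); }
void operator delete[](void* p) noexcept { counted_free(p); }

// Sized forms (C++14). If these are missing, a call such as
// ::operator delete(ptr, bytes) binds to the runtime's own implementation.
void operator delete(void* p, std::size_t) noexcept { counted_free(p); }
void operator delete[](void* p, std::size_t) noexcept { counted_free(p); }
```

The point of the sketch is that every form the standard library may call has to be replaced together; replacing only some of them leaves the remaining ones pointing at the runtime heap.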
I tried again, but as far as I can see the sized delete operator is unfortunately still taken from MS delete_scalar_size.cpp or delete_scalar.cpp, which finally calls _free_dbg (debug build) in debug_heap.cpp (I don't have access to delete_scalar_size.cpp and delete_scalar.cpp).
I set breakpoints on all the wrapper functions in rpmalloc/malloc.c and can see that operator new in new_scalar.cpp (which I don't have access to) finally calls malloc in rpmalloc/malloc.c. For delete this still isn't the case.
Please note that all this is taking place when initialising static variables (i.e. boost::exception, etc.).
Alright, I will take a deeper look at this tonight and see if I can reproduce it locally
To me it looks like neither operator new nor operator delete is used from rpmalloc/malloc.c, while malloc and free are, because when debugging int *i = new int; the only breakpoint hit is the one in malloc in rpmalloc/malloc.c (new is taken from new_scalar.cpp).
Seems I made a mistake with the C++ name mangling on Windows targets. Can you try the latest develop branch and make sure to include the header I added in a single source file in your project? That is, #include <rpnew.h> in one source file.
Much better, but there is still a problem in <xlocale>: a struct is defined there that overloads new and delete if _DEBUG is defined, where new uses _malloc_dbg and delete uses free, which leads to using MS malloc and rpmalloc free. Not good.
I think it would be good to better understand the MS _dbg mimic (there's also _free_dbg, _calloc_dbg, etc.).
I tried to add these functions to rpmalloc/malloc.c but I got multiply defined symbols (BTW: how do you get around this with malloc, free, etc.?).
So I tried with the release build, where _malloc_dbg is malloc (and _free_dbg is free, etc.), and now it's getting interesting, not to say weird.
The basic_streambuf is using <xloctime>, and there the function _Locinfo::_Getdays() calls ::_Getdays(), which returns a const char* that has to be freed afterwards. ::_Getdays() is not using malloc, but the CRT-internal _malloc_base (in malloc_base.cpp), which is indeed MS' malloc implementation and in turn uses Win32 HeapAlloc.
Now returning from ::_Getdays() the function _Locinfo::_Getdays() is freeing the const char* but not by calling CRT's free implementation _free_base (in free_base.cpp), but normal free. And hence disaster happens, because free will call rpfree.
For the moment I have the impression that replacing the memory library on Windows is like opening a can of worms :-(
From what I saw of the MS CRT code I'm really disappointed. Both the new and delete operators in xlocale (using _malloc_dbg in new and free in delete) and _Locinfo::_Getdays() (using ::_Getdays(), which uses _malloc_base, and freeing the result with free) are not symmetric. One should use either _malloc_dbg together with _free_dbg, or malloc together with free. The same applies to _malloc_base and _free_base versus malloc and free. Very sad.
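As a sketch of what a symmetric pairing looks like, the struct below overloads class-level operator new and operator delete so that both go through the same heap. `debug_heap_alloc`, `debug_heap_free`, `g_debug_live_blocks` and `SymmetricFacet` are hypothetical stand-ins chosen for this example; they are not the Microsoft runtime functions, which are only mirrored here in spirit.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-ins for a matched "debug heap" pair
// (in the spirit of _malloc_dbg/_free_dbg, but not those functions).
static long g_debug_live_blocks = 0;

static void* debug_heap_alloc(std::size_t size) {
    void* p = std::malloc(size);
    if (p != nullptr) {
        ++g_debug_live_blocks; // block is now tracked by the "debug heap"
    }
    return p;
}

static void debug_heap_free(void* p) {
    if (p == nullptr) {
        return;
    }
    --g_debug_live_blocks;     // block leaves the same heap it came from
    std::free(p);
}

// Hypothetical type whose class-level operators are symmetric:
// whatever heap serves operator new also serves operator delete.
struct SymmetricFacet {
    int value = 0;

    static void* operator new(std::size_t size) {
        void* p = debug_heap_alloc(size);
        if (p == nullptr) {
            throw std::bad_alloc();
        }
        return p;
    }

    static void operator delete(void* p) noexcept {
        debug_heap_free(p);
    }
};
```

With a pairing like this, replacing malloc and free globally cannot split one object's lifetime across two heaps, because neither operator calls them directly.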
Could you make a small repro case that triggers this that I can use to debug it?
Will try to do tomorrow.
I have made a simple VS2019 solution consisting of a console application that reproduces the issue. Just add malloc.c, rpmalloc.h, rpmalloc.c and rpnew.h to the same directory. RpMallocTest.zip
Thank you, will have a look
The problem is that the Microsoft runtime is not symmetrical. The code you provided ends up using the Microsoft runtime-specific _malloc_dbg function in debug builds (_DEBUG is defined), but during cleanup it calls free, not _free_dbg, to match the debug-specific heap allocation done earlier through _malloc_dbg. Thus it ends up in rpmalloc free trying to deallocate a memory block which was allocated by the runtime debug heap.
I will have a look at overriding the Microsoft specific debug functions as well.
I tried to do that but always get linker errors due to multiply defined symbols (already defined in libucrtd.lib (debug_heap.obj)).
While googling I found this old thread: https://social.msdn.microsoft.com/Forums/vstudio/en-US/d2dd7fcf-c1cf-4b8e-8fc6-cc6101a99c12/how-to-replace-mallocfreenewdelete-with-own-code?forum=vcgeneral In the last post, someone lists all the functions he thinks need to be overridden.
There's also a link to a VS2017 issue that still does not seem to be fixed, not in VS2019 either. I tried the demo project provided.
Seems it's not possible as you say, and using /FORCE:MULTIPLE to use redefined versions of the _dbg functions seems unreliable.
Guess the only way is to do a runtime hotpatching like mimalloc does in its closed source redirect DLL, or like the clang ASAN does in its interception implementation. It's a rather intrusive and arcane way though, not something I'm very keen on adding to rpmalloc repo as it would be more additional code than the entire allocator...
Hm, that's really a pity, because in the end this means giving up Windows as a platform. Not for dedicated use of rpmalloc, but for replacing the existing memory management.
I could do without malloc/free replacement, because I mainly need it for C++, but as we saw there's C++ runtime code that calls malloc and free directly.
Reading the MS C++/C runtime code I got the impression that (at least once upon a time) the idea was to make it possible to override it (comment in malloc.cpp: "// This function supports patching and therefore must be marked noinline."), but I don't have any personal experience in reporting bugs to Microsoft's C/C++ team.
Do you?
Do you think it's worth trying to make MS change their code to make overriding easier?
Should be possible to use Detours (https://github.com/microsoft/detours) but it's too big of a project for something I personally won't use, so closing this for now.