Access violations with external dlls
Hello,
we have issues when calling functions from some external dlls.
For example: Exception thrown at 0x00007FFDF8CD31BF (mimalloc-override.dll) in xxx.exe: 0xC0000005: Access violation reading location 0x000003AF00000001.
So far I can only reproduce these exceptions with release builds of mimalloc-override.dll.
The modules are loaded on demand with LoadLibrary / FreeLibrary.
I am not sure whether the dlls are causing heap corruption. But as far as I know they have worked for years without such problems (using normal allocations).
We use mimalloc 2.1.2 (VS 2019; x64).
Any ideas?
Best regards!
Ouch -- that is not good :-(. Can you repro the problem so I can take a look?
In 2.1.2 (and v1.8.2) I simplified the code by removing an intermediate cache and extending the arena functionality with better purging. Perhaps that introduced a bug? (e.g. purging something that is later still accessed from the Dll -- that might be a mimalloc bug (purging too much), or a dangling pointer bug in the Dll that now exposes itself).
- Can you try to run with MIMALLOC_PURGE_DECOMMITS=0 -- that would MEM_RESET the memory instead of decommit.
- Can you try to run with MIMALLOC_PURGE_DELAY=-1 -- that would not purge at all.
- Can you try if you see the same when using v2.1.1 (or v1.8.1 or v1.8.2)? That would narrow the search.
Thank you for your fast reply!
OK, I will try your suggestions.
In an older build of our software I have here, we used 2.0.9. When using the functions that run the dlls, the sw also crashes.
I can try to make a small project to simulate the calls to one of the dlls and give it to you.
OK, I have sent the link to a little project to you. ;)
Using MIMALLOC_PURGE_DECOMMITS=0 or MIMALLOC_PURGE_DELAY=-1 as environment variables does not work.
Today I tried using mimalloc 1.8.1 (and 1.7.2 to give it a try).
They also do not work in release build with my dlls.
The access violation mostly happens in "_mi_page_malloc" (alloc.c).
I think the "page->free" member is corrupted before entering the function. When the access violation happens, the "page->free.next" member has very suspicious values (when interpreted as a pointer). But since I can only use the release builds, the VS debugger can sometimes show wrong values (because of optimization).
I tried a lot of things these days to find the reason for the crashes.
With gflags I didn't find anything (exe linked without mimalloc).
I also tried to watch the "free" block list of the main threads default heap "pages_free_direct" array with a separate thread I start on DLL_PROCESS_ATTACH (with enabled SEH-exceptions to be able to catch access violations). Now I know when one of the "mi_block_s::next" members becomes an invalid value. As far as I can see it happens inside the external dll, so it could be a heap corruption. Or the corrupted block in the free list is not really free and still belongs to the allocated memory of the dll?
I go on researching...
Edit: I was able to find the corruption of the "mi_block_s::next" member by setting a data breakpoint. It is triggered from the callstack of the external dll... So the dll overwrites some data in the linked "free" list of the default heap.
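The watcher idea described above can be illustrated with a small, self-contained sketch. It is not mimalloc code: `Page`, `Block`, `page_alloc` and `find_corrupt_link` are hypothetical, simplified stand-ins for a page whose free blocks are chained through a `next` pointer stored at the start of each block. The check walks the free list and reports the first link that does not point at a block boundary inside the page, which is what a write past the end of a neighbouring allocation would produce.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>

// Hypothetical, simplified stand-ins; these are NOT mimalloc's real types.
constexpr std::size_t kBlockSize = 32;
constexpr std::size_t kBlockCount = 8;

struct Block { Block* next; };

struct Page {
    alignas(16) unsigned char storage[kBlockSize * kBlockCount];
    Block* free = nullptr;
};

// Thread all blocks into a free list in address order.
void page_init(Page& page) {
    page.free = nullptr;
    for (std::size_t i = kBlockCount; i-- > 0;) {
        page.free = new (page.storage + i * kBlockSize) Block{page.free};
    }
}

// Pop the first free block (no size check, like a fixed-size page).
void* page_alloc(Page& page) {
    Block* block = page.free;
    if (block != nullptr) page.free = block->next;
    return block;
}

// A link is plausible only if it points at a block boundary inside the page.
bool is_valid_block(const Page& page, std::uintptr_t addr) {
    const auto begin = reinterpret_cast<std::uintptr_t>(page.storage);
    const auto end = begin + sizeof(page.storage);
    return addr >= begin && addr < end && (addr - begin) % kBlockSize == 0;
}

// Returns the position of the first bad link in the free list, or -1 if the
// list looks intact. Position 0 is the list head itself.
int find_corrupt_link(const Page& page) {
    auto addr = reinterpret_cast<std::uintptr_t>(page.free);
    for (std::size_t i = 0; addr != 0; ++i) {
        if (i >= kBlockCount || !is_valid_block(page, addr)) {
            return static_cast<int>(i);
        }
        // Read the stored next pointer bytewise; it may hold garbage.
        std::memcpy(&addr, reinterpret_cast<const void*>(addr), sizeof(addr));
    }
    return -1;
}

// Simulates the suspected bug: a write that runs past an allocated block and
// into the next pointer of the free block that follows it.
int demo_overflow_detection() {
    Page page;
    page_init(page);
    if (find_corrupt_link(page) != -1) return -2;  // list must start intact
    void* p = page_alloc(page);                    // block 0; head is now block 1
    std::memset(p, 0xAB, kBlockSize + sizeof(void*));  // overflow by one pointer
    return find_corrupt_link(page);
}
```

Calling `demo_overflow_detection()` from `main` should report the link stored in the first free block as corrupt. A real watcher would have to run against the allocator's own page layout; a data breakpoint on the affected `next` field then shows which code performs the write.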
Any progress on this? As I understand, the error happens with 2.1.1, 1.8.1 and 1.7.2 ? If I read this, it seems to me that there is a heap block overflow in one of the external dll's where one writes beyond the allocated size.
You should build a debug or secure version of mimalloc -- the secure version can be built in "release" mode if that is required for you. Either one will throw an error if the free list is corrupted. If that happens, we know at least the bug is not in mimalloc itself.
Hi Daan,
yes, I wrote a watcher for the free blocks myself to be able to set a data breakpoint on a mi_block_s.next member to see when a write to it happens. I gave the project to the developers of the dll, and they can repro it. Now they try to fix it. 😊
Best regards, Dennis
Hi Daan,
finally they found the issue.
They had a class with a virtual destructor which they deleted by mistake with the array form of delete (delete []).
I tried it myself, and in that case the offset of the pointer provided to operator delete [] is -8 bytes:
Like this:
struct T_VirtualClass { virtual ~T_VirtualClass() = default; };

int main()
{
    auto&& Ptr{new T_VirtualClass{}}; // address of "Ptr" = 0x0000036b48110018
    delete [] Ptr; // Deallocate storage space of array by mistake...
    return 0;
}
...
_CRT_SECURITYCRITICAL_ATTRIBUTE void __CRTDECL operator delete[](void* const block, size_t const) noexcept
{
    operator delete[](block); // address of "block": 0x0000036b48110010
}
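Why the pointer moves can be shown without invoking undefined behavior. The following is a minimal sketch, assuming a typical implementation (MSVC x64 or the Itanium C++ ABI) that stores an element count in front of arrays whose elements have a non-trivial destructor. `Tracked`, `CookieInfo` and `measure_array_cookie` are hypothetical names used only for this illustration. The sketch uses a correctly matched `new[]` / `delete[]` pair and records the raw addresses seen by the allocation functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical demo globals: addresses seen by the raw allocation functions.
static std::uintptr_t g_raw_allocated = 0;
static std::uintptr_t g_raw_released = 0;

struct Tracked {
    // Non-trivial destructor: implementations typically store the element
    // count in front of the array so delete[] knows how many to destroy.
    virtual ~Tracked() = default;

    static void* operator new[](std::size_t size) {
        void* p = std::malloc(size);
        if (p == nullptr) throw std::bad_alloc{};
        g_raw_allocated = reinterpret_cast<std::uintptr_t>(p);
        return p;
    }

    static void operator delete[](void* p) noexcept {
        g_raw_released = reinterpret_cast<std::uintptr_t>(p);
        std::free(p);
    }
};

struct CookieInfo {
    std::ptrdiff_t offset_bytes;       // first element minus raw allocation
    bool delete_received_raw_pointer;  // delete[] undid the same offset
};

CookieInfo measure_array_cookie() {
    Tracked* elements = new Tracked[3];
    const auto first = reinterpret_cast<std::uintptr_t>(elements);
    assert(g_raw_allocated != 0 && first >= g_raw_allocated);
    const auto offset = static_cast<std::ptrdiff_t>(first - g_raw_allocated);
    delete[] elements;  // matched form: adjusts back to the raw pointer
    return CookieInfo{offset, g_raw_released == g_raw_allocated};
}
```

On the implementations assumed above, `offset_bytes` is typically 8 on x64. A `delete []` expression applies that adjustment unconditionally, so when it is given a pointer that came from a plain `new`, the deallocation function receives an address 8 bytes before the real block, which is consistent with the 0x...18 versus 0x...10 addresses shown in the example.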
Thank you for the support!
Best regards, Dennis