Cannot get app verifier VirtualAlloc clean despite no outstanding allocations
We are attempting to swap in Mimalloc for the CRT heap allocator in our Windows driver. Unfortunately, I cannot get the code App Verifier clean despite experimenting with calling mi_collect(true), setting mi_option_destroy_on_exit, making direct calls to mi_win_main() from our DllMain, and adding a _mi_heap_collect_abandon(heap) call in _mi_process_done(). I printed mi_stats() at the end of our DllMain() DLL_PROCESS_DETACH handling; the output is listed below. If I understand it correctly, there are no outstanding allocations. At this point we have already made our CRT_INIT() (de-init) call, so this is truly the end. There are still running threads, but those were not created from our module. I think the concerning value in the stats below is the 1 GiB of reserved memory.
The usage scenario here is complicated: a third-party application loads a Microsoft DLL, which then loads us. All three code owners create threads, though not all of those threads enter our driver's code, so I am unsure whether Mimalloc's process and thread hooking is having issues here. Additionally, our driver is loaded and unloaded multiple times in the process. We fail at every unload, including the first, but I wanted to mention it because of issue #288. I have built Mimalloc both as a static lib linked into the driver and with its source built directly into our DLL.
Does anyone have advice on how I can get Mimalloc to unreserve all of its pages?
heap stats: peak total freed current unit count
normal 6: 14.3 KiB 587.9 KiB 587.9 KiB 0 48 B 12.4 K ok
normal 8: 695.1 KiB 52.3 MiB 52.3 MiB 0 64 B 858.4 K ok
normal 9: 511.6 KiB 13.1 MiB 13.1 MiB 0 80 B 172.7 K ok
normal 10: 395.2 KiB 4.1 MiB 4.1 MiB 0 96 B 45.2 K ok
normal 11: 371.7 KiB 7.7 MiB 7.7 MiB 0 112 B 72.9 K ok
normal 12: 877.9 KiB 6.5 MiB 6.5 MiB 0 128 B 53.7 K ok
normal 13: 2.6 MiB 130.4 MiB 130.4 MiB 0 160 B 855.1 K ok
normal 14: 1.3 MiB 13.0 MiB 13.0 MiB 0 192 B 71.1 K ok
normal 15: 937.5 KiB 640.8 MiB 640.8 MiB 0 224 B 2.9 M ok
normal 16: 706.0 KiB 20.9 MiB 20.9 MiB 0 256 B 85.6 K ok
normal 17: 975.6 KiB 8.1 MiB 8.1 MiB 0 320 B 26.8 K ok
normal 18: 101.6 KiB 26.4 MiB 26.4 MiB 0 384 B 72.2 K ok
normal 19: 407.5 KiB 33.5 MiB 33.5 MiB 0 448 B 78.5 K ok
normal 20: 53.2 KiB 1.1 MiB 1.1 MiB 0 512 B 2.3 K ok
normal 21: 1.1 MiB 71.4 MiB 71.4 MiB 0 640 B 116.9 K ok
normal 22: 852.3 KiB 14.2 MiB 14.2 MiB 0 768 B 19.4 K ok
normal 23: 399.6 KiB 1.4 MiB 1.4 MiB 0 896 B 1.6 K ok
normal 24: 87.3 KiB 27.7 MiB 27.7 MiB 0 1.0 KiB 28.3 K ok
normal 25: 887.2 KiB 80.2 MiB 80.2 MiB 0 1.2 KiB 65.7 K ok
normal 26: 1.0 MiB 36.3 MiB 36.3 MiB 0 1.5 KiB 24.7 K ok
normal 27: 2.3 MiB 15.8 MiB 15.8 MiB 0 1.7 KiB 9.2 K ok
normal 28: 1.0 MiB 17.1 MiB 17.1 MiB 0 2.0 KiB 8.7 K ok
normal 29: 5.0 MiB 68.8 MiB 68.8 MiB 0 2.5 KiB 28.1 K ok
normal 30: 328.2 KiB 3.9 MiB 3.9 MiB 0 3.0 KiB 1.3 K ok
normal 31: 713.2 KiB 7.8 MiB 7.8 MiB 0 3.5 KiB 2.3 K ok
normal 32: 437.7 KiB 1.7 MiB 1.7 MiB 0 4.0 KiB 457 ok
normal 33: 707.7 KiB 21.1 MiB 21.1 MiB 0 5.0 KiB 4.3 K ok
normal 34: 1.1 MiB 17.0 MiB 17.0 MiB 0 6.0 KiB 2.9 K ok
normal 35: 1.9 MiB 4.9 MiB 4.9 MiB 0 7.0 KiB 721 ok
normal 36: 642.5 KiB 20.4 MiB 20.4 MiB 0 8.0 KiB 2.6 K ok
normal 37: 883.4 KiB 25.5 MiB 25.5 MiB 0 10.0 KiB 2.6 K ok
normal 38: 698.7 KiB 11.2 MiB 11.2 MiB 0 12.0 KiB 958 ok
normal 39: 4.1 MiB 17.4 MiB 17.4 MiB 0 14.0 KiB 1.2 K ok
normal 40: 835.2 KiB 2.0 MiB 2.0 MiB 0 16.0 KiB 134 ok
normal 41: 1.8 MiB 51.3 MiB 51.3 MiB 0 20.0 KiB 2.6 K ok
normal 42: 867.3 KiB 4.9 MiB 4.9 MiB 0 24.0 KiB 212 ok
normal 43: 1.4 MiB 31.6 MiB 31.6 MiB 0 28.1 KiB 1.1 K ok
normal 44: 514.0 KiB 1.4 MiB 1.4 MiB 0 32.1 KiB 45 ok
normal 45: 2.5 MiB 86.7 MiB 86.7 MiB 0 40.1 KiB 2.2 K ok
normal 46: 1.0 MiB 4.2 MiB 4.2 MiB 0 48.1 KiB 90 ok
normal 47: 1.3 MiB 46.0 MiB 46.0 MiB 0 56.2 KiB 842 ok
normal 48: 1.3 MiB 6.4 MiB 6.4 MiB 0 64.2 KiB 103 ok
normal 49: 3.1 MiB 119.1 MiB 119.1 MiB 0 80.3 KiB 1.5 K ok
normal 50: 1.6 MiB 5.1 MiB 5.1 MiB 0 96.3 KiB 55 ok
normal 51: 2.9 MiB 98.5 MiB 98.5 MiB 0 112.4 KiB 901 ok
normal 52: 514.0 KiB 642.5 KiB 642.5 KiB 0 128.5 KiB 5 ok
heap stats: peak total freed current unit count
normal: 38.7 Mi 1.7 Gi 1.7 Gi 0 335 B 5.7 M ok
large: 38.4 Mi 137.2 Mi 137.2 Mi 0 232.8 KiB 606 ok
huge: 0 0 0 0 ok
total: 77.2 MiB 1.9 GiB 1.9 GiB 0 ok
malloc req: 69.4 MiB 1.7 GiB 1.7 GiB 0 ok
reserved: 1.0 GiB 1.0 GiB 0 1.0 GiB
committed: 138.9 MiB 358.8 MiB 358.7 MiB 112.4 KiB
reset: 0
purged: 783.3 MiB
touched: 99.0 MiB 293.9 MiB 293.9 MiB 0 ok
segments: 25 37 37 0 ok
-abandoned: 13 21 21 0 ok
-cached: 0 0 0 0 ok
pages: 1000 2.1 Ki 2.1 Ki 0 ok
-abandoned: 451 857 857 0 ok
-extended: 7.7 Ki
-noretire: 216.9 Ki
arenas: 1
-crossover: 0
-rollback: 0
mmaps: 15
commits: 1.4 Ki
resets: 0
purges: 658
guarded: 0
threads: 14 22 22 0 ok
searches: 1.0 avg
numa nodes: 1
elapsed: 3.921 s
process: user: 19.015 s, system: 8.156 s, faults: 1102785, rss: 500.4 MiB, commit: 1.3 GiB
The App Verifier message is below. One of the leaked reservations is the first VirtualAlloc() at 0x20000000000, which I believe is the default heap.
VERIFIER STOP 0000000000000903: pid 0x20A8: A virtual reservation was leaked.
0000020000000000 : Leaked reservation address.
00000162C2562450 : Address to the allocation stack trace. Run dps <address> to view the allocation stack.
00000162D0426FE4 : Address of the owner dll name. Run du <address> to read the dll name.
00007FF875210000 : Base of the owner dll. Run .reload <dll_name> = <address> to reload the owner dll. Use 'lm' to get more information about the loaded and unloaded modules.
Interesting -- I need to ponder it a bit more, but have you tried the MIMALLOC_DESTROY_ON_EXIT=1 option (you can set it in the program as well)? This will (I think) destroy all arenas as well, which is the final allocation you see. Of course, you had better be sure no one is holding on to any memory in there!
I have previously tried running mi_option_set(mi_option_destroy_on_exit, true) as the first operation of our DllMain, before CRT_INIT, and I tested just now doing the same by setting the default value in options.c, but neither fixed the issue. I also walked through the code and saw that _mi_process_done() does execute the if (mi_option_is_enabled(mi_option_destroy_on_exit)) branch. I also tried a suggestion from issue #288 and added _mi_heap_collect_abandon(heap) to _mi_process_done(). Nothing has worked so far.
But there is a major caveat to the above: we get nullptr dereferences in Mimalloc code if I allow it to do its own hooking into DLL loading and unloading. The mi_tls_callback_pre code, which is wildly interesting, causes Mimalloc to always run before our own DllMain. That is no good, because Mimalloc initializes before our CRT_INIT but then also de-initializes before our CRT_INIT (de-init) call. To fix this I removed all of the code related to TLS callbacks and call mi_win_main() from our own DllMain() function. For DLL_PROCESS_ATTACH and DLL_THREAD_ATTACH we call mi_win_main() before all of our code, and for the DETACH reasons we call mi_win_main() after all of our code. The intent was to fully wrap Mimalloc around our own code, and this does fix the nullptr dereference in _mi_page_malloc_zero(). Perhaps I have gone about solving this in a terrible way without realizing there is a better option.
We don't hand memory ownership over to callers, so when our driver is unloading it would be an error to have outstanding allocations that have not been released back to the driver. We do track this condition: when the caller doesn't release outstanding handles back to our driver, we have to disable our memory leak tracking. So I am confident we have no outstanding allocations outside our DLL.
Part of the question here, and knowing the answer would help me debug this further, is whether my interpretation of the stats printout above is correct. Everything is deallocated in terms of outstanding malloc requests to Mimalloc, but Mimalloc itself still holds 112.4 KiB of committed VirtualAlloc pages and 1.0 GiB of reserved pages, which should not happen. Or perhaps my understanding is not correct.
Yikes -- that sounds quite tricky.
- with the default TLS callback code, the mimalloc process done should be called after the DllMain detach event (I think?). The TLS callbacks are called by the DLL loader to (un)initialize C++ static constructors, for example.
- you should definitely set mi_option_destroy_on_exit as it destroys the arenas, which is what you see. Of course, it goes wrong if process_done is called before all pointers are freed.
- maybe try to (1) revert your custom changes and the calls to mi_win_main, and (2) set -DMI_WIN_USE_FLS=ON with cmake to use the FLS mechanism and see if that works?
I have always been setting mi_option_destroy_on_exit; I have tried both turning it on by default in options.c and setting it with mi_option_set(). It does get the stats above to show that all memory is freed, but it does not cause Mimalloc to unreserve the outstanding pages.
I went back and double-checked that the TLS callback code causes Mimalloc to always get called before our own DllMain: Mimalloc runs its mi_process_init before our DllMain and also runs its mi_process_done before our DllMain. I tried an additional setup that leaves both mechanisms in place: Mimalloc's TLS callbacks and my manual calls to mi_win_main() from our own DllMain. Here are the two callstacks, in order, that arrive at mi_process_done(). nvwgf2umx.dll is our driver DLL in this case.
The above is confusing, so here is the call order when both Mimalloc's TLS callbacks and the mi_win_main() calls from our driver's DllMain are enabled.
1. Process start
2. Driver DLL is loaded via LoadLibrary()
3. mi_win_main_attach() via Mimalloc TLS callback
4. Driver calls mi_win_main() at the beginning of its DllMain
5. Driver calls CRT_INIT()
6. Driver DLL is unloaded via FreeLibrary()
7. mi_win_main_detach() via Mimalloc TLS callback
8. Driver calls CRT_INIT() (de-init)
9. Driver calls mi_win_main() at the end of its DllMain
10. Process end
The issue in the above is Mimalloc's TLS callbacks always go before the driver's DllMain, meaning it won't wrap around CRT_INIT.
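The difference between the two orderings can be made concrete with a toy model. The sketch below is not driver or Mimalloc code: every function in it is a hypothetical stand-in that only records its name, so the two sequences can be compared side by side.

```c
#include <assert.h>
#include <string.h>

static char order_log[256];

/* Append one step name to the log, separated by " > ". */
static void log_step(const char *step) {
    assert(step != NULL);
    if (order_log[0] != '\0')
        strncat(order_log, " > ", sizeof order_log - strlen(order_log) - 1);
    strncat(order_log, step, sizeof order_log - strlen(order_log) - 1);
}

/* Hypothetical stand-ins for the real init and de-init work. */
static void allocator_attach(void) { log_step("alloc-init"); }
static void allocator_detach(void) { log_step("alloc-done"); }
static void crt_init(void)         { log_step("crt-init"); }
static void crt_deinit(void)       { log_step("crt-done"); }
static void driver_work(void)      { log_step("driver"); }

enum reason { REASON_ATTACH, REASON_DETACH };

/* Entry point that wraps the allocator around the CRT (the manual workaround). */
static void wrapped_entry(enum reason r) {
    if (r == REASON_ATTACH) { allocator_attach(); crt_init(); }
    else                    { crt_deinit(); allocator_detach(); }
}

/* Loader-driven callback that runs before the entry point for both reasons. */
static void tls_callback(enum reason r) {
    if (r == REASON_ATTACH) allocator_attach();
    else                    allocator_detach();
}

/* Entry point that only handles the CRT. */
static void plain_entry(enum reason r) {
    if (r == REASON_ATTACH) crt_init();
    else                    crt_deinit();
}

/* Allocator is outermost: first in, last out. */
const char *run_wrapped(void) {
    order_log[0] = '\0';
    wrapped_entry(REASON_ATTACH);
    driver_work();
    wrapped_entry(REASON_DETACH);
    return order_log;
}

/* Callback always first: the allocator shuts down before the CRT does. */
const char *run_tls_first(void) {
    order_log[0] = '\0';
    tls_callback(REASON_ATTACH);
    plain_entry(REASON_ATTACH);
    driver_work();
    tls_callback(REASON_DETACH);
    plain_entry(REASON_DETACH);
    return order_log;
}
```

In the second sequence the CRT de-init step runs after the allocator has already been torn down, which is the situation the manual mi_win_main() calls were meant to avoid.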
// This is the callstack for bullet 7 above
nvwgf2umx.dll!_mi_process_done() Line 682 C++
nvwgf2umx.dll!mi_win_main(void * module, unsigned long reason, void * reserved) Line 635 C++
nvwgf2umx.dll!mi_win_main_detach(void * module, unsigned long reason, void * reserved) Line 669 C++
ntdll.dll!LdrpCallInitRoutine() Unknown
ntdll.dll!LdrpCallTlsInitializers() Unknown
ntdll.dll!LdrpProcessDetachNode() Unknown
ntdll.dll!LdrpUnloadNode() Unknown
ntdll.dll!LdrpDecrementModuleLoadCountEx() Unknown
ntdll.dll!LdrUnloadDll() Unknown
KernelBase.dll!FreeLibrary() Unknown
// This is the callstack for bullet 9 above
nvwgf2umx.dll!_mi_process_done() Line 682 C++
nvwgf2umx.dll!mi_win_main(void * module, unsigned long reason, void * reserved) Line 635 C++
nvwgf2umx.dll!UnloadDriverComponents(HINSTANCE__ * hMod, unsigned long dwReason, void * lpvReserved) Line 526 C++
nvwgf2umx.dll!DllEntryPoint(HINSTANCE__ * hMod, unsigned long dwReason, void * lpvReserved) Line 566 C++
verifier.dll!AVrfpStandardDllEntryPointRoutine() Unknown
vrfcore.dll!00007ff9f532b704() Unknown
vfbasics.dll!AVrfpStandardDllEntryPointRoutine() Unknown
ntdll.dll!LdrpCallInitRoutine() Unknown
ntdll.dll!LdrpProcessDetachNode() Unknown
ntdll.dll!LdrpUnloadNode() Unknown
ntdll.dll!LdrpDecrementModuleLoadCountEx() Unknown
ntdll.dll!LdrUnloadDll() Unknown
KernelBase.dll!FreeLibrary() Unknown
Tested with MI_WIN_USE_FLS, no change in behavior.
Thanks -- hmm, I think 7 should come last; i.e. the intended sequence is:
- TLS init
- DllMain -- process attach
- CRT init
- main
- CRT done
- DllMain -- process detach
- TLS done
I'll investigate further.
Thanks daanx, we get the above by specifying /ENTRY:DllEntryPoint.
The TLS ordering not working as intended is interesting and worth fixing, but we have already worked around it by manually calling mi_win_main() to get the intended effect. The question still pending is: why are there outstanding MEM_RESERVE pages at DLL unload?
Hi -- I tried to replicate the scenario by creating a DLL that is linked statically with mimalloc, and then loading and unloading that DLL. I did not change things with TLS etc. but used the default behaviour. The order called is:
LoadLibrary:
- tls init (mi_win_main_attach)
- Dll main (attach)
then:
- call mi_option_enable(mi_option_destroy_on_exit);
- allocate/free etc.
FreeLibrary:
- tls done (mi_win_main_detach)
- Dll main (detach)
This is a bit unexpected, I thought tls done would be run after Dll main, but it is fine anyway. In my case all arenas are destroyed and no memory is retained (you can see this in subproc_main with a zero arena count).
I tried to re-read your scenario but with the changes you made it is hard to see what goes wrong -- can you break on _mi_arenas_unsafe_destroy_all and see if they all get destroyed? (and if so, maybe break on arena_alloc to check it won't get recreated by some later allocation?).
(this was on dev2 and dev3)
I believe I have tried this with both dev2 and dev3, if my understanding of Mimalloc's branches is correct: I have tried 2.1.9 and 3.0.1. While debugging the TLS issues above I have lost the ability to even run 3.0.1, which I last tried a few weeks ago. I'm open to whatever plan you would propose for correctly attaching Mimalloc around the CRT_INIT calls.
Anyhow, to answer your question about breaking into _mi_arenas_unsafe_destroy_all, the following is from 2.1.9. I think you are correct that there is a failure to free: mi_os_prim_free() takes an early out because size==0 and never calls _mi_prim_free(). I removed the size==0 condition, and that fixed the App Verifier report of the leaked virtual reservation at 0x20000000000. Unfortunately, other pages are still leaking.
The _mi_arenas_unsafe_destroy_all path identified one outstanding arena and VirtualFree'ed it, and then mi_arenas_collect finds there are no outstanding arenas:
const size_t max_arena = mi_atomic_load_acquire(&mi_arena_count);
if (max_arena == 0) return;
I believe this means the leaking reservations are not arenas and are for some other use, though _mi_segment_map_unsafe_destroy() did not find any outstanding segments either.
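The effect of the size==0 early-out described above can be reproduced in a toy model. This is a minimal sketch and not the Mimalloc source: the toy_* names and the toy_memid struct are invented for the example, and malloc/free stand in for VirtualAlloc/VirtualFree. The one real-world fact it leans on is that VirtualFree with MEM_RELEASE takes a size of 0, so the OS itself does not need a recorded size to release a region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Count of live OS-level reservations in this toy model. */
static int live_reservations = 0;

/* Stand-in for reserving a region from the OS. */
static void *toy_os_reserve(size_t size) {
    void *p = malloc(size);
    if (p != NULL) live_reservations++;
    return p;
}

/* Stand-in for releasing a region; note that no size is required. */
static void toy_os_release(void *addr) {
    free(addr);
    live_reservations--;
}

/* Hypothetical metadata an allocator might keep about a region. */
typedef struct { void *base; size_t size; } toy_memid;

/* Free path with the early-out: a zero recorded size skips the release. */
static void toy_free_with_early_out(toy_memid id) {
    if (id.base == NULL || id.size == 0) return;
    toy_os_release(id.base);
}

/* Free path without the size check. */
static void toy_free_always(toy_memid id) {
    if (id.base == NULL) return;
    toy_os_release(id.base);
}

/* Returns how many reservations are left behind by the early-out path. */
int leaked_with_early_out(void) {
    live_reservations = 0;
    toy_memid id = { toy_os_reserve(4096), 0 }; /* recorded size is 0 */
    assert(id.base != NULL);
    toy_free_with_early_out(id);
    int leaked = live_reservations;
    toy_os_release(id.base); /* tidy up so the demo itself does not leak */
    return leaked;
}

/* Returns how many reservations are left behind without the size check. */
int leaked_without_early_out(void) {
    live_reservations = 0;
    toy_memid id = { toy_os_reserve(4096), 0 };
    assert(id.base != NULL);
    toy_free_always(id);
    return live_reservations;
}
```

Whether removing the check is safe in the real allocator depends on why the recorded size can be 0 in the first place, which this model does not address.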
I got the callstack for the leaking reservation. It appears to be related to _mi_thread_heap_init(). I also traced through the mi_heap_collect() code, which appears to work correctly.
nvwgf2umx.dll!win_virtual_alloc_prim_once(void * addr, unsigned __int64 size, unsigned __int64 try_alignment, unsigned long flags) Line 218 C++
nvwgf2umx.dll!win_virtual_alloc_prim(void * addr, unsigned __int64 size, unsigned __int64 try_alignment, unsigned long flags) Line 238 C++
nvwgf2umx.dll!win_virtual_alloc(void * addr, unsigned __int64 size, unsigned __int64 try_alignment, unsigned long flags, bool large_only, bool allow_large, bool * is_large) Line 289 C++
nvwgf2umx.dll!_mi_prim_alloc(void * hint_addr, unsigned __int64 size, unsigned __int64 try_alignment, bool commit, bool allow_large, bool * is_large, bool * is_zero, void * * addr) Line 302 C++
nvwgf2umx.dll!mi_os_prim_alloc_at(void * hint_addr, unsigned __int64 size, unsigned __int64 try_alignment, bool commit, bool allow_large, bool * is_large, bool * is_zero) Line 219 C++
nvwgf2umx.dll!mi_os_prim_alloc(unsigned __int64 size, unsigned __int64 try_alignment, bool commit, bool allow_large, bool * is_large, bool * is_zero) Line 243 C++
nvwgf2umx.dll!_mi_os_alloc(unsigned __int64 size, mi_memid_s * memid) Line 325 C++
nvwgf2umx.dll!mi_thread_data_zalloc() Line 340 C++
nvwgf2umx.dll!_mi_thread_heap_init() Line 401 C++
nvwgf2umx.dll!mi_thread_init() Line 520 C++
nvwgf2umx.dll!mi_heap_get_default() Line 198 C++
nvwgf2umx.dll!_mi_malloc_generic(mi_heap_s * heap, unsigned __int64 size, bool zero, unsigned __int64 huge_alignment) Line 985 C++
nvwgf2umx.dll!_mi_heap_malloc_zero_ex(mi_heap_s * heap, unsigned __int64 size, bool zero, unsigned __int64 huge_alignment) Line 188 C++
nvwgf2umx.dll!_mi_heap_malloc_zero(mi_heap_s * heap, unsigned __int64 size, bool zero) Line 208 C++
nvwgf2umx.dll!mi_heap_malloc(mi_heap_s * heap, unsigned __int64 size) Line 212 C++
nvwgf2umx.dll!mi_malloc(unsigned __int64 size) Line 216 C++
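The callstack above shows the reservation being created lazily, on a thread's first mi_malloc. One hypothesis, which nobody in this thread has confirmed, is that such a per-thread block is only released when its owning thread runs a detach hook, so threads that are still alive at DLL unload keep theirs. The toy model below illustrates that idea only. It is not Mimalloc code, every name in it is invented for the example, and plain malloc stands in for the OS reservation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_THREADS 8

/* Hypothetical per-thread metadata blocks, indexed by a small thread id. */
static void *thread_block[MAX_THREADS];

/* Number of per-thread blocks that are currently allocated. */
static int live_blocks(void) {
    int n = 0;
    for (int i = 0; i < MAX_THREADS; i++)
        if (thread_block[i] != NULL) n++;
    return n;
}

/* The first allocation on a thread lazily creates its metadata block. */
static void toy_malloc_on_thread(int tid) {
    assert(tid >= 0 && tid < MAX_THREADS);
    if (thread_block[tid] == NULL) thread_block[tid] = malloc(64);
}

/* Thread-exit hook: releases only the exiting thread's block. */
static void toy_thread_detach(int tid) {
    assert(tid >= 0 && tid < MAX_THREADS);
    free(thread_block[tid]);
    thread_block[tid] = NULL;
}

/* Unload handler that only cleans up the thread doing the unload. */
static void toy_detach_current_only(int current_tid) {
    toy_thread_detach(current_tid);
}

/* Unload handler that sweeps every registered block. This is only safe
   if no other thread will touch the allocator again afterwards. */
static void toy_detach_sweep_all(void) {
    for (int i = 0; i < MAX_THREADS; i++) toy_thread_detach(i);
}

/* Returns the number of blocks still live after the simulated unload. */
int blocks_left_after_unload(int sweep_all) {
    toy_malloc_on_thread(0); /* thread that performs the unload */
    toy_malloc_on_thread(1); /* foreign thread that exits before the unload */
    toy_malloc_on_thread(2); /* foreign thread still running at the unload */
    toy_thread_detach(1);

    if (sweep_all) toy_detach_sweep_all();
    else           toy_detach_current_only(0);

    int leaked = live_blocks();
    toy_detach_sweep_all();  /* tidy up so the demo itself does not leak */
    return leaked;
}
```

If the hypothesis holds, one mitigation to explore would be a sweep of this kind at process detach, with the stated caveat that it assumes no surviving thread uses the allocator afterwards.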
@daanx, I wanted to summarize the findings here to see if you had any final thoughts.
Two VirtualAlloc() allocations are leaking:
- One you identified as having something to do with mi_os_prim_free() taking an early out on the size==0 condition. I was able to fix this by removing that condition, though I don't know the original reason for it or whether it was more than a speed optimization to avoid a WinAPI call.
- The second leak remains unsolved. It is the heap allocated by _mi_thread_heap_init(); the full callstack of its origin is in the previous post.
Do you have any thoughts on how to mitigate the second leak?