Cannot get app verifier VirtualAlloc clean despite no outstanding allocations

Open adamzl opened this issue 9 months ago • 11 comments

We are attempting to swap in Mimalloc for the CRT heap allocator in our Windows driver. Unfortunately, I cannot get the code App Verifier clean, despite experimenting with calling mi_collect(true), setting mi_option_destroy_on_exit, making direct calls to mi_win_main() from our DllMain, and adding a _mi_heap_collect_abandon(heap) call in _mi_process_done(). I printed mi_stats() at the end of our DllMain() DLL_PROCESS_DETACH handling; the output is listed below. If I am understanding it correctly, there are no outstanding allocations. At this point we have already made our CRT_INIT() (de-init) call, so this is truly the end for our module, although there are still running threads that were not created from our module. I think the concerning value in the stats below is the 1.0 GiB of reserved memory.

The usage scenario here is complicated: a third-party application loads a Microsoft DLL, which then loads us. All three code owners create threads, though not all of those threads enter our driver's code, so I am unsure whether Mimalloc's process and thread hooking is having issues here. Additionally, our driver is loaded and unloaded multiple times in the process. We fail at every unload, including the first, but I wanted to mention it because of issue #288. I have built Mimalloc both as a static lib linked into the driver and with its source built directly into our DLL.

Does anyone have advice on how I can get Mimalloc to unreserve all of its pages?

heap stats:     peak       total       freed     current        unit       count   
normal   6:    14.3 KiB   587.9 KiB   587.9 KiB     0          48   B      12.4 K    ok
normal   8:   695.1 KiB    52.3 MiB    52.3 MiB     0          64   B     858.4 K    ok
normal   9:   511.6 KiB    13.1 MiB    13.1 MiB     0          80   B     172.7 K    ok
normal  10:   395.2 KiB     4.1 MiB     4.1 MiB     0          96   B      45.2 K    ok
normal  11:   371.7 KiB     7.7 MiB     7.7 MiB     0         112   B      72.9 K    ok
normal  12:   877.9 KiB     6.5 MiB     6.5 MiB     0         128   B      53.7 K    ok
normal  13:     2.6 MiB   130.4 MiB   130.4 MiB     0         160   B     855.1 K    ok
normal  14:     1.3 MiB    13.0 MiB    13.0 MiB     0         192   B      71.1 K    ok
normal  15:   937.5 KiB   640.8 MiB   640.8 MiB     0         224   B       2.9 M    ok
normal  16:   706.0 KiB    20.9 MiB    20.9 MiB     0         256   B      85.6 K    ok
normal  17:   975.6 KiB     8.1 MiB     8.1 MiB     0         320   B      26.8 K    ok
normal  18:   101.6 KiB    26.4 MiB    26.4 MiB     0         384   B      72.2 K    ok
normal  19:   407.5 KiB    33.5 MiB    33.5 MiB     0         448   B      78.5 K    ok
normal  20:    53.2 KiB     1.1 MiB     1.1 MiB     0         512   B       2.3 K    ok
normal  21:     1.1 MiB    71.4 MiB    71.4 MiB     0         640   B     116.9 K    ok
normal  22:   852.3 KiB    14.2 MiB    14.2 MiB     0         768   B      19.4 K    ok
normal  23:   399.6 KiB     1.4 MiB     1.4 MiB     0         896   B       1.6 K    ok
normal  24:    87.3 KiB    27.7 MiB    27.7 MiB     0           1.0 KiB    28.3 K    ok
normal  25:   887.2 KiB    80.2 MiB    80.2 MiB     0           1.2 KiB    65.7 K    ok
normal  26:     1.0 MiB    36.3 MiB    36.3 MiB     0           1.5 KiB    24.7 K    ok
normal  27:     2.3 MiB    15.8 MiB    15.8 MiB     0           1.7 KiB     9.2 K    ok
normal  28:     1.0 MiB    17.1 MiB    17.1 MiB     0           2.0 KiB     8.7 K    ok
normal  29:     5.0 MiB    68.8 MiB    68.8 MiB     0           2.5 KiB    28.1 K    ok
normal  30:   328.2 KiB     3.9 MiB     3.9 MiB     0           3.0 KiB     1.3 K    ok
normal  31:   713.2 KiB     7.8 MiB     7.8 MiB     0           3.5 KiB     2.3 K    ok
normal  32:   437.7 KiB     1.7 MiB     1.7 MiB     0           4.0 KiB   457        ok
normal  33:   707.7 KiB    21.1 MiB    21.1 MiB     0           5.0 KiB     4.3 K    ok
normal  34:     1.1 MiB    17.0 MiB    17.0 MiB     0           6.0 KiB     2.9 K    ok
normal  35:     1.9 MiB     4.9 MiB     4.9 MiB     0           7.0 KiB   721        ok
normal  36:   642.5 KiB    20.4 MiB    20.4 MiB     0           8.0 KiB     2.6 K    ok
normal  37:   883.4 KiB    25.5 MiB    25.5 MiB     0          10.0 KiB     2.6 K    ok
normal  38:   698.7 KiB    11.2 MiB    11.2 MiB     0          12.0 KiB   958        ok
normal  39:     4.1 MiB    17.4 MiB    17.4 MiB     0          14.0 KiB     1.2 K    ok
normal  40:   835.2 KiB     2.0 MiB     2.0 MiB     0          16.0 KiB   134        ok
normal  41:     1.8 MiB    51.3 MiB    51.3 MiB     0          20.0 KiB     2.6 K    ok
normal  42:   867.3 KiB     4.9 MiB     4.9 MiB     0          24.0 KiB   212        ok
normal  43:     1.4 MiB    31.6 MiB    31.6 MiB     0          28.1 KiB     1.1 K    ok
normal  44:   514.0 KiB     1.4 MiB     1.4 MiB     0          32.1 KiB    45        ok
normal  45:     2.5 MiB    86.7 MiB    86.7 MiB     0          40.1 KiB     2.2 K    ok
normal  46:     1.0 MiB     4.2 MiB     4.2 MiB     0          48.1 KiB    90        ok
normal  47:     1.3 MiB    46.0 MiB    46.0 MiB     0          56.2 KiB   842        ok
normal  48:     1.3 MiB     6.4 MiB     6.4 MiB     0          64.2 KiB   103        ok
normal  49:     3.1 MiB   119.1 MiB   119.1 MiB     0          80.3 KiB     1.5 K    ok
normal  50:     1.6 MiB     5.1 MiB     5.1 MiB     0          96.3 KiB    55        ok
normal  51:     2.9 MiB    98.5 MiB    98.5 MiB     0         112.4 KiB   901        ok
normal  52:   514.0 KiB   642.5 KiB   642.5 KiB     0         128.5 KiB     5        ok

heap stats:     peak       total       freed     current        unit       count   
    normal:    38.7 Mi      1.7 Gi      1.7 Gi      0         335   B       5.7 M    ok
     large:    38.4 Mi    137.2 Mi    137.2 Mi      0         232.8 KiB   606        ok
      huge:     0           0           0           0                                ok
     total:    77.2 MiB     1.9 GiB     1.9 GiB     0                                ok
malloc req:    69.4 MiB     1.7 GiB     1.7 GiB     0                                ok

  reserved:     1.0 GiB     1.0 GiB     0           1.0 GiB                          
 committed:   138.9 MiB   358.8 MiB   358.7 MiB   112.4 KiB                          
     reset:     0      
    purged:   783.3 MiB
   touched:    99.0 MiB   293.9 MiB   293.9 MiB     0                                ok
  segments:    25          37          37           0                                ok
-abandoned:    13          21          21           0                                ok
   -cached:     0           0           0           0                                ok
     pages:  1000           2.1 Ki      2.1 Ki      0                                ok
-abandoned:   451         857         857           0                                ok
 -extended:     7.7 Ki 
 -noretire:   216.9 Ki 
    arenas:     1      
-crossover:     0      
 -rollback:     0      
     mmaps:    15      
   commits:     1.4 Ki 
    resets:     0      
    purges:   658      
   guarded:     0      
   threads:    14          22          22           0                                ok
  searches:     1.0 avg
numa nodes:     1
   elapsed:     3.921 s
   process: user: 19.015 s, system: 8.156 s, faults: 1102785, rss: 500.4 MiB, commit: 1.3 GiB

The App Verifier message is below. One of the leaked reservations is the first VirtualAlloc() at 0x20000000000, which I believe is the default heap.

VERIFIER STOP 0000000000000903: pid 0x20A8: A virtual reservation was leaked. 

	0000020000000000 : Leaked reservation address.
	00000162C2562450 : Address to the allocation stack trace. Run dps <address> to view the allocation stack.
	00000162D0426FE4 : Address of the owner dll name. Run du <address> to read the dll name.
	00007FF875210000 : Base of the owner dll. Run .reload <dll_name> = <address> to reload the owner dll. Use 'lm' to get more information about the loaded and unloaded modules.

adamzl avatar Mar 21 '25 00:03 adamzl

Interesting -- I need to ponder it a bit more, but have you tried the MIMALLOC_DESTROY_ON_EXIT=1 option (you can set it in the program as well)? This will (I think) destroy all arenas as well, which is the final allocation you see. Of course, you had better be sure no one is holding on to any memory in there!

daanx avatar Mar 21 '25 02:03 daanx

I have previously tried running mi_option_set(mi_option_destroy_on_exit, true) as the first operation of our DllMain, before CRT_INIT, and I tested just now doing the same by setting the default value in options.c, but neither fixed the issue. I have also walked through the code and saw that _mi_process_done() does execute the if (mi_option_is_enabled(mi_option_destroy_on_exit)) branch. I also tried a suggestion from issue #288 and added _mi_heap_collect_abandon(heap) to _mi_process_done(). Nothing has worked so far.

But there is a major caveat to the above: we get nullptr dereferences in Mimalloc code if I allow it to do its own hooking into DLL loading and unloading. The mi_tls_callback_pre code, which is wildly interesting, causes Mimalloc to always run before our own DllMain. That is no good, because Mimalloc inits before our CRT_INIT but then also de-inits before our CRT_INIT (de-init). To fix this I removed all of the code related to TLS callbacks and call mi_win_main() from our own DllMain() function. For DLL_PROCESS_ATTACH and DLL_THREAD_ATTACH we call mi_win_main() before all of our code; for the DETACH reasons, mi_win_main() is called after all of our code. The intent was to fully wrap our own code, and this does fix the nullptr dereference in _mi_page_malloc_zero(). Perhaps I have gone about solving this in a terrible way without realizing there is a better option.

We don't hand memory ownership over to callers, so when our driver is unloading it would be an error to have outstanding allocations that have not been released back to the driver. We do track this condition: when the caller doesn't release outstanding handles back to our driver, we have to disable our memory leak tracking in those cases. So I am confident we have no outstanding allocations outside our DLL.

Part of the question here, the answer to which would help me debug this further, is whether my interpretation of the stats printout above is correct. Everything is deallocated in terms of outstanding malloc requests to Mimalloc, but Mimalloc on its own still has 112.4 KiB of committed VirtualAlloc pages and 1.0 GiB of reserved pages, which should not happen? Or perhaps my understanding is not correct.
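
To make that reading concrete, here is a minimal model of the three counters involved. It is not Mimalloc's bookkeeping: `vm_model` and the `vm_*` functions are hypothetical names used only for illustration. It shows how the bytes handed out to malloc callers can drop to zero while a reservation stays outstanding until it is explicitly released.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an allocator's OS-level accounting.
   Not mimalloc code: names and structure are illustrative only. */
typedef struct vm_model {
    size_t reserved;   /* address space reserved from the OS      */
    size_t committed;  /* part of the reservation backed by memory */
    size_t in_use;     /* bytes currently handed out to callers    */
} vm_model;

/* Reserve address space (comparable to VirtualAlloc with MEM_RESERVE). */
void vm_reserve(vm_model *m, size_t n) { m->reserved += n; }

/* Commit part of the reservation (comparable to MEM_COMMIT). */
void vm_commit(vm_model *m, size_t n) {
    assert(m->committed + n <= m->reserved);
    m->committed += n;
}

/* Give committed memory back without dropping the reservation. */
void vm_decommit(vm_model *m, size_t n) {
    assert(n <= m->committed);
    m->committed -= n;
}

/* Hand out / take back bytes at the malloc level. */
void vm_malloc(vm_model *m, size_t n) {
    assert(m->in_use + n <= m->committed);
    m->in_use += n;
}
void vm_free(vm_model *m, size_t n) {
    assert(n <= m->in_use);
    m->in_use -= n;
}

/* Drop the reservation itself (comparable to VirtualFree with MEM_RELEASE). */
void vm_release(vm_model *m, size_t n) {
    assert(n <= m->reserved);
    m->reserved -= n;
}

/* Verifier-style check: clean only when no reservation remains. */
int vm_is_clean(const vm_model *m) { return m->reserved == 0; }
```

In this model, reserving 1 GiB, committing part of it, and then freeing every allocation leaves `in_use` at 0 but `vm_is_clean()` false, which matches the shape of the stats above: `current` is 0 for malloc requests while `reserved` still shows 1.0 GiB.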

adamzl avatar Mar 21 '25 04:03 adamzl

Yikes -- that sounds quite tricky.

  • with the default TLS callback code, the mimalloc process done should be called after the DllMain detach event (I think?). The TLS callbacks are called by the DLL loader to (un)initialize C++ static constructors, for example.
  • you should definitely set mi_option_destroy_on_exit, as it destroys the arenas, which is what you see. Of course, it goes wrong if the process_done is called before all pointers are freed.
  • maybe try to (1) revert your custom changes and the calls to mi_win_main, and (2) set -DMI_WIN_USE_FLS=ON with cmake to use the FLS mechanism and see if that works?

daanx avatar Mar 21 '25 23:03 daanx

I have always been setting mi_option_destroy_on_exit; I have tried both setting it to default on in options.c and setting it with mi_option_set(). It does get the stats above to show all memory as freed, but it does not cause Mimalloc to unreserve some outstanding pages.

I went back and double-checked that the TLS callback code causes Mimalloc to always get called before our own DllMain. Mimalloc runs its mi_process_init before our DllMain and runs its mi_process_done before our DllMain. I tried an additional setup by leaving both mechanisms in place: Mimalloc's TLS callbacks and my manual calls to mi_win_main() from our own DllMain. The two callstacks that arrive at mi_process_done(), in order, are shown further below. nvwgf2umx.dll is our driver DLL in this case.

The above is confusing, so here is the call order when both Mimalloc's TLS callbacks and our driver's DllMain calls to mi_win_main() are enabled.

  1. Process Start
  2. Driver dll is loaded via LoadLibrary()
  3. mi_win_attach() via Mimalloc TLS callback
  4. Driver calls mi_win_main() at the beginning of its DllMain
  5. Driver calls CRT_INIT()
  6. Driver dll is unloaded via FreeLibrary()
  7. mi_win_detach() via Mimalloc TLS callback
  8. Driver calls CRT_INIT() (de-init)
  9. Driver calls mi_win_main() at the end of its DllMain
  10. Process End

The issue in the above is that Mimalloc's TLS callbacks always run before the driver's DllMain, meaning Mimalloc does not wrap around CRT_INIT.
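
The wrapping the driver is aiming for can be sketched as a small portable simulation. The stubs below only record the order in which they run: `alloc_attach`, `alloc_detach`, `crt_init`, `crt_deinit`, and `driver_entry` are hypothetical stand-ins for mi_win_main(), CRT_INIT(), and the driver's DllMain, and nothing here calls the real Windows loader or Mimalloc.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reason codes with the same numeric values as the Windows
   DLL_PROCESS_DETACH (0) and DLL_PROCESS_ATTACH (1) constants. */
enum { REASON_PROCESS_DETACH = 0, REASON_PROCESS_ATTACH = 1 };

/* Event log so the call order can be inspected afterwards. */
char event_log[128];

/* Append "name;" to the log if it still fits. */
void log_event(const char *name) {
    assert(name != NULL);
    if (strlen(event_log) + strlen(name) + 2 <= sizeof(event_log)) {
        strcat(event_log, name);
        strcat(event_log, ";");
    }
}

/* Hypothetical stand-ins: the allocator hooks and the CRT hooks. */
void alloc_attach(void) { log_event("alloc_attach"); }
void alloc_detach(void) { log_event("alloc_detach"); }
void crt_init(void)     { log_event("crt_init"); }
void crt_deinit(void)   { log_event("crt_deinit"); }

/* Entry point sketch: allocator first on attach, last on detach,
   so the allocator's lifetime fully encloses the CRT's. */
int driver_entry(int reason) {
    if (reason == REASON_PROCESS_ATTACH) {
        alloc_attach();
        crt_init();
    } else if (reason == REASON_PROCESS_DETACH) {
        crt_deinit();
        alloc_detach();
    }
    return 1;
}
```

Calling `driver_entry(REASON_PROCESS_ATTACH)` followed by `driver_entry(REASON_PROCESS_DETACH)` leaves the log as `alloc_attach;crt_init;crt_deinit;alloc_detach;`. With the TLS callbacks enabled, the observed order instead has the allocator detach running before the CRT de-init, which is the ordering problem described above.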

// This is the callstack for step 7 above
	nvwgf2umx.dll!_mi_process_done() Line 682	C++
 	nvwgf2umx.dll!mi_win_main(void * module, unsigned long reason, void * reserved) Line 635	C++
 	nvwgf2umx.dll!mi_win_main_detach(void * module, unsigned long reason, void * reserved) Line 669	C++
 	ntdll.dll!LdrpCallInitRoutine()	Unknown
 	ntdll.dll!LdrpCallTlsInitializers()	Unknown
 	ntdll.dll!LdrpProcessDetachNode()	Unknown
 	ntdll.dll!LdrpUnloadNode()	Unknown
 	ntdll.dll!LdrpDecrementModuleLoadCountEx()	Unknown
 	ntdll.dll!LdrUnloadDll()	Unknown
 	KernelBase.dll!FreeLibrary()	Unknown

// This is the callstack for step 9 above
	nvwgf2umx.dll!_mi_process_done() Line 682	C++
 	nvwgf2umx.dll!mi_win_main(void * module, unsigned long reason, void * reserved) Line 635	C++
 	nvwgf2umx.dll!UnloadDriverComponents(HINSTANCE__ * hMod, unsigned long dwReason, void * lpvReserved) Line 526	C++
 	nvwgf2umx.dll!DllEntryPoint(HINSTANCE__ * hMod, unsigned long dwReason, void * lpvReserved) Line 566	C++
 	verifier.dll!AVrfpStandardDllEntryPointRoutine()	Unknown
 	vrfcore.dll!00007ff9f532b704()	Unknown
 	vfbasics.dll!AVrfpStandardDllEntryPointRoutine()	Unknown
 	ntdll.dll!LdrpCallInitRoutine()	Unknown
 	ntdll.dll!LdrpProcessDetachNode()	Unknown
 	ntdll.dll!LdrpUnloadNode()	Unknown
 	ntdll.dll!LdrpDecrementModuleLoadCountEx()	Unknown
 	ntdll.dll!LdrUnloadDll()	Unknown
 	KernelBase.dll!FreeLibrary()	Unknown

adamzl avatar Mar 24 '25 15:03 adamzl

Tested with MI_WIN_USE_FLS, no change in behavior.

adamzl avatar Mar 24 '25 20:03 adamzl

Thanks -- hmm, I think 7 should come last; i.e. the intended sequence is:

  1. TLS init
  2. DllMain -- process attach
  3. CRT init
  4. main
  5. CRT done
  6. DllMain -- process detach
  7. TLS done

I'll investigate further..

daanx avatar Mar 25 '25 23:03 daanx

Thanks daanx, we get the above by specifying /ENTRY:DllEntryPoint.

The TLS behavior above not working as intended is interesting and worth fixing. But since we have already worked around it by manually calling mi_win_main() to get the intended effect, the question still pending is: why are there outstanding MEM_RESERVE pages at DLL unload?

adamzl avatar Mar 25 '25 23:03 adamzl

Hi -- I tried to replicate the scenario by creating a DLL that is linked statically with mimalloc, and then loading and unloading that DLL. I did not change things with TLS etc. but used the default behaviour. The order called is:

LoadLibrary:

  • tls init (mi_win_main_attach)
  • Dll main (attach), then:
    • call mi_option_enable(mi_option_destroy_on_exit);
    • allocate/free etc.

FreeLibrary:

  • tls done (mi_win_main_detach)
  • Dll main (detach)

This is a bit unexpected; I thought tls done would be run after Dll main, but it is fine anyway. In my case all arenas are destroyed and no memory is retained (you can see this in subproc_main with a zero arena count). I tried to re-read your scenario, but with the changes you made it is hard to see what goes wrong -- can you break on _mi_arenas_unsafe_destroy_all and see if they all get destroyed? (And if so, maybe break on arena_alloc to check that one doesn't get recreated by some later allocation?)

(this was on dev2 and dev3)

daanx avatar Apr 01 '25 02:04 daanx

I believe I have tried this with both dev2 and dev3, if my understanding of Mimalloc's branches is correct: I have tried 2.1.9 and 3.0.1. Over the course of debugging the TLS issues above I have lost the ability to even run 3.0.1, which I last tried a few weeks ago. I'm open to whatever plan you would propose for correctly attaching Mimalloc around the CRT_INIT calls.

Anyhow, to answer your question about breaking into _mi_arenas_unsafe_destroy_all, the following is from 2.1.9. I think you are correct that there is a failure to free: mi_os_prim_free() is taking an early out because size==0 and is not calling _mi_prim_free(). I removed the size==0 condition, which fixed the App Verifier report of the leaked virtual reservation at 0x20000000000. Unfortunately, other pages are still leaking.
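
The effect of that early-out can be illustrated with a small model. This is not Mimalloc's source: `fake_os`, `os_free_with_early_out`, and `os_free_always` are hypothetical names. The model only assumes, as on Windows, that releasing a reservation needs its base address rather than its size (VirtualFree with MEM_RELEASE takes a size of 0).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REGIONS 8

/* Hypothetical stand-in for the OS: live reservations by base address. */
typedef struct fake_os {
    void  *base[MAX_REGIONS];
    size_t count;
} fake_os;

/* Record a reservation; returns 0 if the table is full. */
int os_reserve(fake_os *os, void *addr) {
    assert(os != NULL);
    if (os->count == MAX_REGIONS) return 0;
    os->base[os->count++] = addr;
    return 1;
}

/* Release by base address only, the way MEM_RELEASE works.
   Returns 1 if the region was found and removed. */
int os_release(fake_os *os, void *addr) {
    for (size_t i = 0; i < os->count; i++) {
        if (os->base[i] == addr) {
            os->base[i] = os->base[--os->count];  /* swap-remove */
            return 1;
        }
    }
    return 0;
}

/* Variant with the early-out: a recorded size of 0 skips the release,
   so the reservation stays live. */
void os_free_with_early_out(fake_os *os, void *addr, size_t size) {
    if (addr == NULL || size == 0) return;
    os_release(os, addr);
}

/* Variant without the size check: the region is always released. */
void os_free_always(fake_os *os, void *addr, size_t size) {
    (void)size;  /* size is not needed to release by base address */
    if (addr == NULL) return;
    os_release(os, addr);
}
```

In this model a region whose recorded size is 0 survives `os_free_with_early_out` and is only dropped by `os_free_always`. Whether the size check in the real code guards against something other than a wasted call is not something this sketch can answer.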

The _mi_arenas_unsafe_destroy_all path identified one outstanding arena and VirtualFree'd it; mi_arenas_collect then finds there are no outstanding arenas: const size_t max_arena = mi_atomic_load_acquire(&mi_arena_count); if (max_arena == 0) return;

I believe this means the leaking reservations are not arenas and are for some other use, though _mi_segment_map_unsafe_destroy() did not find any outstanding segments either.

adamzl avatar Apr 01 '25 22:04 adamzl

I got the callstack for the leaking reservation. It appears to be related to _mi_thread_heap_init(). I also traced through the mi_heap_collect() code, which appears to work correctly.

	nvwgf2umx.dll!win_virtual_alloc_prim_once(void * addr, unsigned __int64 size, unsigned __int64 try_alignment, unsigned long flags) Line 218	C++
 	nvwgf2umx.dll!win_virtual_alloc_prim(void * addr, unsigned __int64 size, unsigned __int64 try_alignment, unsigned long flags) Line 238	C++
 	nvwgf2umx.dll!win_virtual_alloc(void * addr, unsigned __int64 size, unsigned __int64 try_alignment, unsigned long flags, bool large_only, bool allow_large, bool * is_large) Line 289	C++
 	nvwgf2umx.dll!_mi_prim_alloc(void * hint_addr, unsigned __int64 size, unsigned __int64 try_alignment, bool commit, bool allow_large, bool * is_large, bool * is_zero, void * * addr) Line 302	C++
 	nvwgf2umx.dll!mi_os_prim_alloc_at(void * hint_addr, unsigned __int64 size, unsigned __int64 try_alignment, bool commit, bool allow_large, bool * is_large, bool * is_zero) Line 219	C++
 	nvwgf2umx.dll!mi_os_prim_alloc(unsigned __int64 size, unsigned __int64 try_alignment, bool commit, bool allow_large, bool * is_large, bool * is_zero) Line 243	C++
 	nvwgf2umx.dll!_mi_os_alloc(unsigned __int64 size, mi_memid_s * memid) Line 325	C++
 	nvwgf2umx.dll!mi_thread_data_zalloc() Line 340	C++
 	nvwgf2umx.dll!_mi_thread_heap_init() Line 401	C++
 	nvwgf2umx.dll!mi_thread_init() Line 520	C++
 	nvwgf2umx.dll!mi_heap_get_default() Line 198	C++
 	nvwgf2umx.dll!_mi_malloc_generic(mi_heap_s * heap, unsigned __int64 size, bool zero, unsigned __int64 huge_alignment) Line 985	C++
 	nvwgf2umx.dll!_mi_heap_malloc_zero_ex(mi_heap_s * heap, unsigned __int64 size, bool zero, unsigned __int64 huge_alignment) Line 188	C++
 	nvwgf2umx.dll!_mi_heap_malloc_zero(mi_heap_s * heap, unsigned __int64 size, bool zero) Line 208	C++
 	nvwgf2umx.dll!mi_heap_malloc(mi_heap_s * heap, unsigned __int64 size) Line 212	C++
 	nvwgf2umx.dll!mi_malloc(unsigned __int64 size) Line 216	C++

adamzl avatar Apr 01 '25 22:04 adamzl

@daanx, I wanted to summarize the findings here to see if you had any final thoughts.

Two VirtualAlloc() allocations are leaking:

  • One you identified as having something to do with mi_os_prim_free() taking an early out because of the size==0 condition. I was able to fix this by removing that condition, though I don't know the original reason for it or whether it was more than a speed optimization to avoid a WinAPI call.
  • The second leak remains unsolved and is the heap allocated by _mi_thread_heap_init(); the full callstack of its origin is in the previous post.

Do you have any thoughts on how to mitigate the second leak?
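
One direction I could imagine, sketched here as a single-threaded model rather than as a patch, is to keep a registry of the per-thread blocks so that the final unload can release any that never saw a thread detach. This rests on an unconfirmed assumption: that the leaked block belongs to a thread that is still running when the DLL unloads. `thread_data`, `td_alloc`, `td_free`, and `td_release_all` are hypothetical names, `calloc` stands in for the OS allocation, and a real version would need a lock around the list.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-thread metadata block, linked into a registry. */
typedef struct thread_data {
    struct thread_data *next;
    int thread_id;
} thread_data;

/* Blocks that are still owned by some thread. */
thread_data *td_list = NULL;

/* Thread init: allocate a block and register it. */
thread_data *td_alloc(int thread_id) {
    thread_data *td = calloc(1, sizeof *td);
    if (td == NULL) return NULL;
    td->thread_id = thread_id;
    td->next = td_list;
    td_list = td;
    return td;
}

/* Thread detach: unregister and free this thread's block. */
void td_free(thread_data *td) {
    assert(td != NULL);
    for (thread_data **p = &td_list; *p != NULL; p = &(*p)->next) {
        if (*p == td) {
            *p = td->next;
            free(td);
            return;
        }
    }
}

/* Process detach: free every block whose thread never detached.
   Returns how many blocks were released. */
size_t td_release_all(void) {
    size_t released = 0;
    while (td_list != NULL) {
        thread_data *next = td_list->next;
        free(td_list);
        td_list = next;
        released++;
    }
    return released;
}
```

The same caveat as for mi_option_destroy_on_exit would apply: releasing a block while its thread is still alive is only safe if that thread never re-enters the allocator afterwards.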

adamzl avatar Apr 15 '25 22:04 adamzl