[Feature] Leak memory on exit
The memory allocator is declared static, so its destructor frees all memory when the process exits gracefully:
https://github.com/ml-explore/mlx/blob/9814a2ae120385bc903d059a317233fa1be3bcef/mlx/backend/metal/allocator.cpp#L244-L247
This can take more than 30 seconds after running inference with an LLM:
Call graph:
16392 Thread_6280670 DispatchQueue_1: com.apple.main-thread (serial)
+ 16392 start (in dyld) + 2436 [0x18da7512c]
+ 16392 dyld4::LibSystemHelpers::exit(int) const (in libdyld.dylib) + 20 [0x18de0eb80]
+ 16392 exit (in libsystem_c.dylib) + 44 [0x18dcb1d20]
+ 16392 __cxa_finalize_ranges (in libsystem_c.dylib) + 476 [0x18dcb1f98]
+ 16392 mlx::core::metal::MetalAllocator::~MetalAllocator() (in mlx.node) + 48 [0x11bee2908]
+ 16354 mlx::core::metal::(anonymous namespace)::BufferCache::~BufferCache() (in mlx.node) + 112 [0x11bee1c2c]
+ ! 16318 -[AGXG15XFamilyBuffer dealloc] (in AGXMetalG15X_B0) + 76 [0x1f9bb6b98]
+ ! : 16317 -[AGXBuffer dealloc] (in AGXMetalG15X_B0) + 44 [0x1f9b8d4e4]
+ ! : | 16272 -[IOGPUMetalBuffer dealloc] (in IOGPU) + 320 [0x1ac73976c]
+ ! : | + 16076 -[IOGPUMetalResource dealloc] (in IOGPU) + 248 [0x1ac748128]
+ ! : | + ! 16032 _CFRelease (in CoreFoundation) + 292 [0x18dfa5b90]
+ ! : | + ! : 16023 ioGPUResourceFinalize (in IOGPU) + 104 [0x1ac74e6a0]
+ ! : | + ! : | 16023 iokit_user_client_trap (in IOKit) + 8 [0x19155626c]
+ ! : | + ! : 9 ioGPUResourceFinalize (in IOGPU) + 140 [0x1ac74e6c4]
+ ! : | + ! : 9 CFRelease (in CoreFoundation) + 60 [0x18de609f4]
+ ! : | + ! : 9 CF_IS_OBJC (in CoreFoundation) + 256 [0x18dfa5520]
+ ! : | + ! 27 CFRelease (in CoreFoundation) + 60 [0x18de609f4]
+ ! : | + ! : 27 CF_IS_OBJC (in CoreFoundation) + 256,196 [0x18dfa5520,0x18dfa54e4]
+ ! : | + ! 11 _CFRelease (in CoreFoundation) + 1216 [0x18dfa5f2c]
+ ! : | + ! : 6 _nanov2_free (in libsystem_malloc.dylib) + 0,188,... [0x18dc292a8,0x18dc29364,...]
+ ! : | + ! : 2 CFAllocatorDeallocate (in CoreFoundation) + 0,180 [0x18de5d14c,0x18de5d200]
+ ! : | + ! : 1 __CFAllocatorSystemDeallocate (in CoreFoundation) + 40 [0x18de5f140]
+ ! : | + ! : | 1 malloc_default_zone (in libsystem_malloc.dylib) + 0 [0x18dc0d2e4]
+ ! : | + ! : 1 malloc_zone_free (in libsystem_malloc.dylib) + 12 [0x18dc117fc]
+ ! : | + ! : 1 nanov2_free (in libsystem_malloc.dylib) + 0 [0x18dc0e5b0]
+ ! : | + ! 3 _CFRelease (in CoreFoundation) + 88,1192,... [0x18dfa5ac4,0x18dfa5f14,...]
+ ! : | + ! 2 _CFRelease (in CoreFoundation) + 1228 [0x18dfa5f38]
+ ! : | + ! : 1 CFRelease (in CoreFoundation) + 60 [0x18de609f4]
+ ! : | + ! : | 1 CF_IS_OBJC (in CoreFoundation) + 252 [0x18dfa551c]
+ ! : | + ! : 1 _CFRelease (in CoreFoundation) + 136 [0x18dfa5af4]
+ ! : | + ! 1 CFGetAllocator (in CoreFoundation) + 68 [0x18de5d0ec]
+ ! : | + 121 -[IOGPUMetalResource dealloc] (in IOGPU) + 304 [0x1ac748160]
+ ! : | + ! 119 -[_MTLObjectWithLabel dealloc] (in Metal) + 64 [0x197f8ca1c]
+ ! : | + ! : 55 _objc_rootDealloc (in libobjc.A.dylib) + 80 [0x18da2ae24]
+ ! : | + ! : | 34 objc_destructInstance (in libobjc.A.dylib) + 144 [0x18da2aebc]
+ ! : | + ! : | + 33 objc_object::clearDeallocating_slow() (in libobjc.A.dylib) + 108 [0x18da360b4]
+ ! : | + ! : | + ! 30 weak_clear_no_lock (in libobjc.A.dylib) + 48 [0x18da36184]
+ ! : | + ! : | + ! : 30 weak_entry_for_referent(weak_table_t*, objc_object*) (in libobjc.A.dylib) + 92,76,... [0x18da2e42c,0x18da2e41c,...]
+ ! : | + ! : | + ! 3 weak_clear_no_lock (in libobjc.A.dylib) + 28,48 [0x18da36170,0x18da36184]
+ ! : | + ! : | + 1 objc_object::clearDeallocating_slow() (in libobjc.A.dylib) + 56 [0x18da36080]
+ ! : | + ! : | 21 objc_destructInstance (in libobjc.A.dylib) + 80 [0x18da2ae7c]
+ ! : | + ! : | 10 object_cxxDestructFromClass(objc_object*, objc_class*) (in libobjc.A.dylib) + 116 [0x18da335f8]
+ ! : | + ! : | ! 8 -[IOGPUMetalResource .cxx_destruct] (in IOGPU) + 36 [0x1ac748a50]
+ ! : | + ! : | ! : 8 objc_destroyWeak (in libobjc.A.dylib) + 152 [0x18da345f4]
+ ! : | + ! : | ! : 7 weak_unregister_no_lock (in libobjc.A.dylib) + 128,44,... [0x18da4024c,0x18da401f8,...]
+ ! : | + ! : | ! : 1 weak_unregister_no_lock (in libobjc.A.dylib) + 40 [0x18da401f4]
+ ! : | + ! : | ! : 1 weak_entry_for_referent(weak_table_t*, objc_object*) (in libobjc.A.dylib) + 80 [0x18da2e420]
+ ! : | + ! : | ! 2 objc_destroyWeak (in libobjc.A.dylib) + 32,184 [0x18da3457c,0x18da34614]
+ ! : | + ! : | 5 object_cxxDestructFromClass(objc_object*, objc_class*) (in libobjc.A.dylib) + 56,116,... [0x18da335bc,0x18da335f8,...]
+ ! : | + ! : | 3 -[IOGPUMetalResource .cxx_destruct] (in IOGPU) + 52,56,... [0x1ac748a60,0x1ac748a64,...]
+ ! : | + ! : | 3 object_cxxDestructFromClass(objc_object*, objc_class*) (in libobjc.A.dylib) + 76 [0x18da335d0]
+ ! : | + ! : | 3 lookupMethodInClassAndLoadCache (in libobjc.A.dylib) + 52 [0x18da33488]
+ ! : | + ! : | 3 cache_getImp (in libobjc.A.dylib) + 8,48,... [0x18da29868,0x18da29890,...]
+ ! : | + ! : 30 free_tiny (in libsystem_malloc.dylib) + 100,512,... [0x18dc0fd28,0x18dc0fec4,...]
+ ! : | + ! : 23 free_tiny (in libsystem_malloc.dylib) + 496 [0x18dc0feb4]
+ ! : | + ! : | 11 tiny_free_no_lock (in libsystem_malloc.dylib) + 356,1856,... [0x18dc1019c,0x18dc10778,...]
+ ! : | + ! : | 8 tiny_free_no_lock (in libsystem_malloc.dylib) + 644 [0x18dc102bc]
+ ! : | + ! : | + 8 tiny_free_list_add_ptr (in libsystem_malloc.dylib) + 568,552,... [0x18dc10ae8,0x18dc10ad8,...]
+ ! : | + ! : | 3 tiny_free_no_lock (in libsystem_malloc.dylib) + 1376 [0x18dc10598]
+ ! : | + ! : | + 2 tiny_free_reattach_region (in libsystem_malloc.dylib) + 160,240 [0x18dc171e8,0x18dc17238]
+ ! : | + ! : | + 1 tiny_free_reattach_region (in libsystem_malloc.dylib) + 368 [0x18dc172b8]
+ ! : | + ! : | + 1 tiny_free_list_add_ptr (in libsystem_malloc.dylib) + 144 [0x18dc10940]
+ ! : | + ! : | 1 tiny_free_no_lock (in libsystem_malloc.dylib) + 436 [0x18dc101ec]
+ ! : | + ! : | 1 tiny_free_list_remove_ptr (in libsystem_malloc.dylib) + 288 [0x18dc10c44]
+ ! : | + ! : 3 _szone_free (in libsystem_malloc.dylib) + 164 [0x18dc2159c]
+ ! : | + ! : 3 nanov2_try_free_default (in libsystem_malloc.dylib) + 0 [0x18dc29604]
+ ! : | + ! : 2 _objc_rootDealloc (in libobjc.A.dylib) + 8,76 [0x18da2addc,0x18da2ae20]
+ ! : | + ! : 2 free (in libsystem_malloc.dylib) + 92,124 [0x18dc0bd1c,0x18dc0bd3c]
+ ! : | + ! : 1 _nanov2_free (in libsystem_malloc.dylib) + 368 [0x18dc29418]
+ ! : | + ! 1 -[_MTLObjectWithLabel dealloc] (in Metal) + 32 [0x197f8c9fc]
+ ! : | + ! : 1 objc_retain_x28 (in libobjc.A.dylib) + 68 [0x18da27fd0]
+ ! : | + ! 1 free_tiny (in libsystem_malloc.dylib) + 532 [0x18dc0fed8]
+ ! : | + 42 -[IOGPUMetalResource dealloc] (in IOGPU) + 216 [0x1ac748108]
+ ! : | + ! 34 objc_storeWeak (in libobjc.A.dylib) + 452 [0x18da2e184]
+ ! : | + ! : 23 weak_unregister_no_lock (in libobjc.A.dylib) + 316 [0x18da40308]
+ ! : | + ! : 11 weak_unregister_no_lock (in libobjc.A.dylib) + 40 [0x18da401f4]
+ ! : | + ! : 11 weak_entry_for_referent(weak_table_t*, objc_object*) (in libobjc.A.dylib) + 92,128 [0x18da2e42c,0x18da2e450]
+ ! : | + ! 6 objc_storeWeak (in libobjc.A.dylib) + 148 [0x18da2e054]
+ ! : | + ! : 6 locker_mixin<:lock_mixin>>::lockWith(lockdebug::lock_mixin&) (in libobjc.A.dylib) + 132,32,... [0x18da5a294,0x18da5a230,...]
+ ! : | + ! 2 objc_storeWeak (in libobjc.A.dylib) + 76,116 [0x18da2e00c,0x18da2e034]
+ ! : | + 15 -[IOGPUMetalResource dealloc] (in IOGPU) + 232 [0x1ac748118]
+ ! : | + ! 6 objc_autorelease (in libobjc.A.dylib) + 256 [0x18da2d920]
+ ! : | + ! : 6 AutoreleasePoolPage::add(objc_object*) (in libobjc.A.dylib) + 156,236,... [0x18da5b954,0x18da5b9a4,...]
+ ! : | + ! 4 objc_autorelease (in libobjc.A.dylib) + 316,308 [0x18da2d95c,0x18da2d954]
+ ! : | + ! 4 objc_loadWeak (in libobjc.A.dylib) + 24 [0x18da2e8ec]
+ ! : | + ! : 4 objc_loadWeakRetained (in libobjc.A.dylib) + 468,24 [0x18da2eae8,0x18da2e92c]
+ ! : | + ! 1 AutoreleasePoolPage::add(objc_object*) (in libobjc.A.dylib) + 164 [0x18da5b95c]
+ ! : | + 5 -[IOGPUMetalResource dealloc] (in IOGPU) + 272 [0x1ac748140]
+ ! : | + ! 3 objc_release (in libobjc.A.dylib) + 0 [0x18da27fe0]
+ ! : | + ! 2 objc_retain_x28 (in libobjc.A.dylib) + 68 [0x18da27fd0]
+ ! : | + 4 -[IOGPUMetalResource dealloc] (in IOGPU) + 260 [0x1ac748134]
+ ! : | + ! 4 objc_release (in libobjc.A.dylib) + 0,28,... [0x18da27fe0,0x18da27ffc,...]
+ ! : | + 4 -[IOGPUMetalResource dealloc] (in IOGPU) + 0,260 [0x1ac748030,0x1ac748134]
+ ! : | + 3 -[IOGPUMetalResource dealloc] (in IOGPU) + 196 [0x1ac7480f4]
+ ! : | + ! 3 -[IOGPUMemoryInfo removeResourceFromList:] (in IOGPU) + 32 [0x1ac7462d0]
+ ! : | + ! 3 os_unfair_lock_lock (in libsystem_platform.dylib) + 16,0 [0x18de22f00,0x18de22ef0]
+ ! : | + 1 -[IOGPUMetalResource dealloc] (in IOGPU) + 240 [0x1ac748120]
+ ! : | + ! 1 objc_msgSend$_removeResource: (in IOGPU) + 16 [0x1ac76f770]
+ ! : | + 1 objc_loadWeak (in libobjc.A.dylib) + 28 [0x18da2e8f0]
+ ! : | 41 -[IOGPUMetalBuffer dealloc] (in IOGPU) + 172 [0x1ac7396d8]
+ ! : | + 41 -[IOGPUMetalDevice deallocBufferSubData:heapIndex:bufferIndex:bufferOffset:length:] (in IOGPU) + 56 [0x1ac7414f8]
+ ! : | + 17 IOGPUMetalSuballocatorFree (in IOGPU) + 408,304,... [0x1ac7508ec,0x1ac750884,...]
+ ! : | + 9 IOGPUMetalSuballocatorFree (in IOGPU) + 564 [0x1ac750988]
+ ! : | + ! 9 -[AGXG15XFamilyBuffer dealloc] (in AGXMetalG15X_B0) + 76 [0x1f9bb6b98]
+ ! : | + ! 9 -[AGXBuffer dealloc] (in AGXMetalG15X_B0) + 44 [0x1f9b8d4e4]
+ ! : | + ! 9 -[IOGPUMetalBuffer dealloc] (in IOGPU) + 320 [0x1ac73976c]
+ ! : | + ! 6 -[IOGPUMetalResource dealloc] (in IOGPU) + 248 [0x1ac748128]
+ ! : | + ! : 6 _CFRelease (in CoreFoundation) + 292 [0x18dfa5b90]
+ ! : | + ! : 6 ioGPUResourceFinalize (in IOGPU) + 104 [0x1ac74e6a0]
+ ! : | + ! : 6 iokit_user_client_trap (in IOKit) + 8 [0x19155626c]
+ ! : | + ! 2 -[IOGPUMetalResource dealloc] (in IOGPU) + 216 [0x1ac748108]
+ ! : | + ! : 2 objc_storeWeak (in libobjc.A.dylib) + 452 [0x18da2e184]
+ ! : | + ! : 1 weak_unregister_no_lock (in libobjc.A.dylib) + 40 [0x18da401f4]
+ ! : | + ! : | 1 weak_entry_for_referent(weak_table_t*, objc_object*) (in libobjc.A.dylib) + 128 [0x18da2e450]
+ ! : | + ! : 1 weak_unregister_no_lock (in libobjc.A.dylib) + 16 [0x18da401dc]
+ ! : | + ! 1 -[IOGPUMetalResource dealloc] (in IOGPU) + 304 [0x1ac748160]
+ ! : | + ! 1 -[_MTLObjectWithLabel dealloc] (in Metal) + 64 [0x197f8ca1c]
+ ! : | + ! 1 _objc_rootDealloc (in libobjc.A.dylib) + 80 [0x18da2ae24]
+ ! : | + ! 1 objc_destructInstance (in libobjc.A.dylib) + 144 [0x18da2aebc]
+ ! : | + ! 1 objc_object::clearDeallocating_slow() (in libobjc.A.dylib) + 108 [0x18da360b4]
+ ! : | + ! 1 weak_clear_no_lock (in libobjc.A.dylib) + 48 [0x18da36184]
+ ! : | + ! 1 weak_entry_for_referent(weak_table_t*, objc_object*) (in libobjc.A.dylib) + 76 [0x18da2e41c]
+ ! : | + 7 IOGPUMetalSuballocatorFree (in IOGPU) + 180 [0x1ac750808]
+ ! : | + ! 7 MTLRangeAllocatorDeallocate (in Metal) + 84,112,... [0x197fc51d8,0x197fc51f4,...]
+ ! : | + 3 IOGPUMetalSuballocatorFree (in IOGPU) + 532 [0x1ac750968]
+ ! : | + ! 2 std::__tree<:__value_type int short>, std::__map_value_compare, true>, IOGPUMetalSuballocatorHeap::Allocator<:__value_type int short>>>::__emplace_multi<:pair int short>>(std::pair&&) (in IOGPU) + 76,112 [0x1ac751174,0x1ac751198]
+ ! : | + ! 1 std::__tree<:__value_type int short>, std::__map_value_compare, true>, IOGPUMetalSuballocatorHeap::Allocator<:__value_type int short>>>::__emplace_multi<:pair int short>>(std::pair&&) (in IOGPU) + 44 [0x1ac751154]
+ ! : | + ! 1 IOGPUMetalSuballocatorHeap::Allocator<:__tree_node int short>, void*>>::allocate(unsigned long) (in IOGPU) + 40 [0x1ac751248]
+ ! : | + ! 1 posix_memalign (in libsystem_malloc.dylib) + 40 [0x18dc13b68]
+ ! : | + ! 1 _malloc_zone_memalign (in libsystem_malloc.dylib) + 16 [0x18dc30bbc]
+ ! : | + 2 IOGPUMetalSuballocatorFree (in IOGPU) + 160 [0x1ac7507f4]
+ ! : | + ! 1 -[IOGPUMetalResource gpuAddress] (in IOGPU) + 0 [0x1ac7481a0]
+ ! : | + ! 1 objc_msgSend$gpuAddress (in IOGPU) + 0 [0x1ac770520]
+ ! : | + 2 IOGPUMetalSuballocatorFree (in IOGPU) + 472 [0x1ac75092c]
+ ! : | + ! 2 std::__tree<:__value_type int short>, std::__map_value_compare, true>, IOGPUMetalSuballocatorHeap::Allocator<:__value_type int short>>>::__remove_node_pointer(std::__tree_node<:__value_type int short>, void*>*) (in IOGPU) + 100 [0x1ac750d94]
+ ! : | + ! 2 std::__tree_remove[abi:v160006]<:__tree_node_base>*>(std::__tree_node_base*, std::__tree_node_base*) (in IOGPU) + 68,88 [0x1ac750de8,0x1ac750dfc]
+ ! : | + 1 IOGPUMetalSuballocatorFree (in IOGPU) + 556 [0x1ac750980]
+ ! : | + 1 free_small (in libsystem_malloc.dylib) + 876 [0x18dc0ee60]
+ ! : | + 1 small_free_list_add_ptr (in libsystem_malloc.dylib) + 156 [0x18dc10db8]
+ ! : | 1 -[IOGPUMetalBuffer dealloc] (in IOGPU) + 104 [0x1ac739694]
+ ! : | + 1 AutoreleasePoolPage::push() (in libobjc.A.dylib) + 124 [0x18da5ca1c]
+ ! : | + 1 AutoreleasePoolPage::add(objc_object*) (in libobjc.A.dylib) + 0 [0x18da5b8b8]
+ ! : | 1 -[IOGPUMetalBuffer dealloc] (in IOGPU) + 16 [0x1ac73963c]
+ ! : | 1 -[IOGPUMetalResource dealloc] (in IOGPU) + 308 [0x1ac748164]
+ ! : | 1 objc_msgSendSuper2 (in libobjc.A.dylib) + 56 [0x18da29638]
+ ! : 1 -[IOGPUMetalBuffer dealloc] (in IOGPU) + 324 [0x1ac739770]
+ ! 27 objc_msgSend (in libobjc.A.dylib) + 0,8,... [0x18da29400,0x18da29408,...]
+ ! 9 _objc_rootRelease (in libobjc.A.dylib) + 108 [0x18da2ed3c]
+ 29 mlx::core::metal::(anonymous namespace)::BufferCache::~BufferCache() (in mlx.node) + 132,168,... [0x11bee1c40,0x11bee1c64,...]
+ 9 mlx::core::metal::(anonymous namespace)::BufferCache::~BufferCache() (in mlx.node) + 124 [0x11bee1c38]
+ 8 _nanov2_free (in libsystem_malloc.dylib) + 668,784,... [0x18dc29544,0x18dc295b8,...]
+ 1 free (in libsystem_malloc.dylib) + 20 [0x18dc0bcd4]
What do you think about simply leaking the memory on exit to save time?
I wonder why it's so slow / if there is a way to asynchronously free resources? Indeed I've noticed in the past that the program can take a while to exit when you are holding a lot of RAM. Leaking would be fairly easy ... just avoid releasing buffers in the buffer cache when the allocator is destroyed. Do we lose anything from doing that?
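A rough sketch of what "avoid releasing buffers when the allocator is destroyed" could look like. `ToyBufferCache` and its methods are hypothetical names for illustration, not MLX's actual `BufferCache`. The assumption is that buffers are still released explicitly at runtime, and only the destructor skips the work:

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical toy cache; the names are illustrative and not MLX's API.
class ToyBufferCache {
 public:
  // Keep a no-longer-used buffer around so it can be reused later.
  void recycle(void* buf) { buffers_.push_back(buf); }

  // Explicitly release all cached buffers at runtime and report how many.
  std::size_t clear() {
    std::size_t released = buffers_.size();
    for (void* buf : buffers_) std::free(buf);
    buffers_.clear();
    return released;
  }

  std::size_t size() const { return buffers_.size(); }

  // Deliberately release nothing here: if this only runs at process exit,
  // the OS reclaims the buffers anyway, so freeing them one by one is
  // wasted time.
  ~ToyBufferCache() {}

 private:
  std::vector<void*> buffers_;
};
```

With this shape, freeing during normal operation behaves as before, and only the teardown path gets cheaper.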
Memory allocators are slow. Doing it asynchronously does not help here because the program still has to wait for tasks to finish before exiting.
Leaking memory on exit is safe and standard practice. For example, when a Chrome tab is closed, all DOM elements are simply leaked; you can rely on the OS to safely reclaim all the RAM that was in use.
One danger of leaking memory on close is that it makes tools like Valgrind less useful in the future when chasing a runtime memory leak.
You can tell sanitizers that a specific leak is expected. Chromium explicitly leaks memory for static objects everywhere (base::NoDestructor) and still makes use of various sanitizers.
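For readers unfamiliar with the pattern, here is a minimal sketch of a NoDestructor-style wrapper. It is an illustration under assumptions, not Chromium's implementation. `LeakOnExit`, `ExpensiveCache`, `cache()` and `g_destroyed` are made-up names. The wrapper constructs the object in place and never runs its destructor, so nothing is torn down at exit:

```cpp
#include <new>
#include <utility>

// Counts destructor runs, purely for illustration.
int g_destroyed = 0;

// Hypothetical stand-in for a type whose destructor is expensive.
struct ExpensiveCache {
  int entries = 0;
  ~ExpensiveCache() { ++g_destroyed; }
};

// Sketch of a NoDestructor-like wrapper: builds T inside its own storage
// and intentionally never calls ~T().
template <typename T>
class LeakOnExit {
 public:
  template <typename... Args>
  explicit LeakOnExit(Args&&... args)
      : ptr_(new (storage_) T(std::forward<Args>(args)...)) {}

  // The wrapper's destructor does nothing, so the wrapped T is leaked.
  ~LeakOnExit() {}

  LeakOnExit(const LeakOnExit&) = delete;
  LeakOnExit& operator=(const LeakOnExit&) = delete;

  T& operator*() { return *ptr_; }
  T* operator->() { return ptr_; }

 private:
  alignas(T) unsigned char storage_[sizeof(T)];
  T* ptr_;
};

// Function-local static: constructed on first use, never destroyed.
ExpensiveCache& cache() {
  static LeakOnExit<ExpensiveCache> instance;
  return *instance;
}
```

Because the storage lives inside the static wrapper rather than on the heap, leak checkers generally have nothing to report for the object itself; memory the object allocated internally may still need an explicit suppression.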
I'm not opposed to this. I can mark it as an enhancement. @zcbenz I'm not sure if you are planning to send a PR. Happy to take a look if so / try it out.
I will send a PR sometime later this month if no one else has worked on it by then.