mlx icon indicating copy to clipboard operation
mlx copied to clipboard

[Feature] Leak memory on exit

Open zcbenz opened this issue 1 year ago • 6 comments

Memory allocator is declared static and will free memory on exit when process quit gracefully:

https://github.com/ml-explore/mlx/blob/9814a2ae120385bc903d059a317233fa1be3bcef/mlx/backend/metal/allocator.cpp#L244-L247

Which can take more than 30 seconds after inferencing a LLM:

Call graph:
    16392 Thread_6280670   DispatchQueue_1: com.apple.main-thread  (serial)
    + 16392 start  (in dyld) + 2436  [0x18da7512c]
    +   16392 dyld4::LibSystemHelpers::exit(int) const  (in libdyld.dylib) + 20  [0x18de0eb80]
    +     16392 exit  (in libsystem_c.dylib) + 44  [0x18dcb1d20]
    +       16392 __cxa_finalize_ranges  (in libsystem_c.dylib) + 476  [0x18dcb1f98]
    +         16392 mlx::core::metal::MetalAllocator::~MetalAllocator()  (in mlx.node) + 48  [0x11bee2908]
    +           16354 mlx::core::metal::(anonymous namespace)::BufferCache::~BufferCache()  (in mlx.node) + 112  [0x11bee1c2c]
    +           ! 16318 -[AGXG15XFamilyBuffer dealloc]  (in AGXMetalG15X_B0) + 76  [0x1f9bb6b98]
    +           ! : 16317 -[AGXBuffer dealloc]  (in AGXMetalG15X_B0) + 44  [0x1f9b8d4e4]
    +           ! : | 16272 -[IOGPUMetalBuffer dealloc]  (in IOGPU) + 320  [0x1ac73976c]
    +           ! : | + 16076 -[IOGPUMetalResource dealloc]  (in IOGPU) + 248  [0x1ac748128]
    +           ! : | + ! 16032 _CFRelease  (in CoreFoundation) + 292  [0x18dfa5b90]
    +           ! : | + ! : 16023 ioGPUResourceFinalize  (in IOGPU) + 104  [0x1ac74e6a0]
    +           ! : | + ! : | 16023 iokit_user_client_trap  (in IOKit) + 8  [0x19155626c]
    +           ! : | + ! : 9 ioGPUResourceFinalize  (in IOGPU) + 140  [0x1ac74e6c4]
    +           ! : | + ! :   9 CFRelease  (in CoreFoundation) + 60  [0x18de609f4]
    +           ! : | + ! :     9 CF_IS_OBJC  (in CoreFoundation) + 256  [0x18dfa5520]
    +           ! : | + ! 27 CFRelease  (in CoreFoundation) + 60  [0x18de609f4]
    +           ! : | + ! : 27 CF_IS_OBJC  (in CoreFoundation) + 256,196  [0x18dfa5520,0x18dfa54e4]
    +           ! : | + ! 11 _CFRelease  (in CoreFoundation) + 1216  [0x18dfa5f2c]
    +           ! : | + ! : 6 _nanov2_free  (in libsystem_malloc.dylib) + 0,188,...  [0x18dc292a8,0x18dc29364,...]
    +           ! : | + ! : 2 CFAllocatorDeallocate  (in CoreFoundation) + 0,180  [0x18de5d14c,0x18de5d200]
    +           ! : | + ! : 1 __CFAllocatorSystemDeallocate  (in CoreFoundation) + 40  [0x18de5f140]
    +           ! : | + ! : | 1 malloc_default_zone  (in libsystem_malloc.dylib) + 0  [0x18dc0d2e4]
    +           ! : | + ! : 1 malloc_zone_free  (in libsystem_malloc.dylib) + 12  [0x18dc117fc]
    +           ! : | + ! : 1 nanov2_free  (in libsystem_malloc.dylib) + 0  [0x18dc0e5b0]
    +           ! : | + ! 3 _CFRelease  (in CoreFoundation) + 88,1192,...  [0x18dfa5ac4,0x18dfa5f14,...]
    +           ! : | + ! 2 _CFRelease  (in CoreFoundation) + 1228  [0x18dfa5f38]
    +           ! : | + ! : 1 CFRelease  (in CoreFoundation) + 60  [0x18de609f4]
    +           ! : | + ! : | 1 CF_IS_OBJC  (in CoreFoundation) + 252  [0x18dfa551c]
    +           ! : | + ! : 1 _CFRelease  (in CoreFoundation) + 136  [0x18dfa5af4]
    +           ! : | + ! 1 CFGetAllocator  (in CoreFoundation) + 68  [0x18de5d0ec]
    +           ! : | + 121 -[IOGPUMetalResource dealloc]  (in IOGPU) + 304  [0x1ac748160]
    +           ! : | + ! 119 -[_MTLObjectWithLabel dealloc]  (in Metal) + 64  [0x197f8ca1c]
    +           ! : | + ! : 55 _objc_rootDealloc  (in libobjc.A.dylib) + 80  [0x18da2ae24]
    +           ! : | + ! : | 34 objc_destructInstance  (in libobjc.A.dylib) + 144  [0x18da2aebc]
    +           ! : | + ! : | + 33 objc_object::clearDeallocating_slow()  (in libobjc.A.dylib) + 108  [0x18da360b4]
    +           ! : | + ! : | + ! 30 weak_clear_no_lock  (in libobjc.A.dylib) + 48  [0x18da36184]
    +           ! : | + ! : | + ! : 30 weak_entry_for_referent(weak_table_t*, objc_object*)  (in libobjc.A.dylib) + 92,76,...  [0x18da2e42c,0x18da2e41c,...]
    +           ! : | + ! : | + ! 3 weak_clear_no_lock  (in libobjc.A.dylib) + 28,48  [0x18da36170,0x18da36184]
    +           ! : | + ! : | + 1 objc_object::clearDeallocating_slow()  (in libobjc.A.dylib) + 56  [0x18da36080]
    +           ! : | + ! : | 21 objc_destructInstance  (in libobjc.A.dylib) + 80  [0x18da2ae7c]
    +           ! : | + ! : |   10 object_cxxDestructFromClass(objc_object*, objc_class*)  (in libobjc.A.dylib) + 116  [0x18da335f8]
    +           ! : | + ! : |   ! 8 -[IOGPUMetalResource .cxx_destruct]  (in IOGPU) + 36  [0x1ac748a50]
    +           ! : | + ! : |   ! : 8 objc_destroyWeak  (in libobjc.A.dylib) + 152  [0x18da345f4]
    +           ! : | + ! : |   ! :   7 weak_unregister_no_lock  (in libobjc.A.dylib) + 128,44,...  [0x18da4024c,0x18da401f8,...]
    +           ! : | + ! : |   ! :   1 weak_unregister_no_lock  (in libobjc.A.dylib) + 40  [0x18da401f4]
    +           ! : | + ! : |   ! :     1 weak_entry_for_referent(weak_table_t*, objc_object*)  (in libobjc.A.dylib) + 80  [0x18da2e420]
    +           ! : | + ! : |   ! 2 objc_destroyWeak  (in libobjc.A.dylib) + 32,184  [0x18da3457c,0x18da34614]
    +           ! : | + ! : |   5 object_cxxDestructFromClass(objc_object*, objc_class*)  (in libobjc.A.dylib) + 56,116,...  [0x18da335bc,0x18da335f8,...]
    +           ! : | + ! : |   3 -[IOGPUMetalResource .cxx_destruct]  (in IOGPU) + 52,56,...  [0x1ac748a60,0x1ac748a64,...]
    +           ! : | + ! : |   3 object_cxxDestructFromClass(objc_object*, objc_class*)  (in libobjc.A.dylib) + 76  [0x18da335d0]
    +           ! : | + ! : |     3 lookupMethodInClassAndLoadCache  (in libobjc.A.dylib) + 52  [0x18da33488]
    +           ! : | + ! : |       3 cache_getImp  (in libobjc.A.dylib) + 8,48,...  [0x18da29868,0x18da29890,...]
    +           ! : | + ! : 30 free_tiny  (in libsystem_malloc.dylib) + 100,512,...  [0x18dc0fd28,0x18dc0fec4,...]
    +           ! : | + ! : 23 free_tiny  (in libsystem_malloc.dylib) + 496  [0x18dc0feb4]
    +           ! : | + ! : | 11 tiny_free_no_lock  (in libsystem_malloc.dylib) + 356,1856,...  [0x18dc1019c,0x18dc10778,...]
    +           ! : | + ! : | 8 tiny_free_no_lock  (in libsystem_malloc.dylib) + 644  [0x18dc102bc]
    +           ! : | + ! : | + 8 tiny_free_list_add_ptr  (in libsystem_malloc.dylib) + 568,552,...  [0x18dc10ae8,0x18dc10ad8,...]
    +           ! : | + ! : | 3 tiny_free_no_lock  (in libsystem_malloc.dylib) + 1376  [0x18dc10598]
    +           ! : | + ! : | + 2 tiny_free_reattach_region  (in libsystem_malloc.dylib) + 160,240  [0x18dc171e8,0x18dc17238]
    +           ! : | + ! : | + 1 tiny_free_reattach_region  (in libsystem_malloc.dylib) + 368  [0x18dc172b8]
    +           ! : | + ! : | +   1 tiny_free_list_add_ptr  (in libsystem_malloc.dylib) + 144  [0x18dc10940]
    +           ! : | + ! : | 1 tiny_free_no_lock  (in libsystem_malloc.dylib) + 436  [0x18dc101ec]
    +           ! : | + ! : |   1 tiny_free_list_remove_ptr  (in libsystem_malloc.dylib) + 288  [0x18dc10c44]
    +           ! : | + ! : 3 _szone_free  (in libsystem_malloc.dylib) + 164  [0x18dc2159c]
    +           ! : | + ! : 3 nanov2_try_free_default  (in libsystem_malloc.dylib) + 0  [0x18dc29604]
    +           ! : | + ! : 2 _objc_rootDealloc  (in libobjc.A.dylib) + 8,76  [0x18da2addc,0x18da2ae20]
    +           ! : | + ! : 2 free  (in libsystem_malloc.dylib) + 92,124  [0x18dc0bd1c,0x18dc0bd3c]
    +           ! : | + ! : 1 _nanov2_free  (in libsystem_malloc.dylib) + 368  [0x18dc29418]
    +           ! : | + ! 1 -[_MTLObjectWithLabel dealloc]  (in Metal) + 32  [0x197f8c9fc]
    +           ! : | + ! : 1 objc_retain_x28  (in libobjc.A.dylib) + 68  [0x18da27fd0]
    +           ! : | + ! 1 free_tiny  (in libsystem_malloc.dylib) + 532  [0x18dc0fed8]
    +           ! : | + 42 -[IOGPUMetalResource dealloc]  (in IOGPU) + 216  [0x1ac748108]
    +           ! : | + ! 34 objc_storeWeak  (in libobjc.A.dylib) + 452  [0x18da2e184]
    +           ! : | + ! : 23 weak_unregister_no_lock  (in libobjc.A.dylib) + 316  [0x18da40308]
    +           ! : | + ! : 11 weak_unregister_no_lock  (in libobjc.A.dylib) + 40  [0x18da401f4]
    +           ! : | + ! :   11 weak_entry_for_referent(weak_table_t*, objc_object*)  (in libobjc.A.dylib) + 92,128  [0x18da2e42c,0x18da2e450]
    +           ! : | + ! 6 objc_storeWeak  (in libobjc.A.dylib) + 148  [0x18da2e054]
    +           ! : | + ! : 6 locker_mixin<:lock_mixin>>::lockWith(lockdebug::lock_mixin&)  (in libobjc.A.dylib) + 132,32,...  [0x18da5a294,0x18da5a230,...]
    +           ! : | + ! 2 objc_storeWeak  (in libobjc.A.dylib) + 76,116  [0x18da2e00c,0x18da2e034]
    +           ! : | + 15 -[IOGPUMetalResource dealloc]  (in IOGPU) + 232  [0x1ac748118]
    +           ! : | + ! 6 objc_autorelease  (in libobjc.A.dylib) + 256  [0x18da2d920]
    +           ! : | + ! : 6 AutoreleasePoolPage::add(objc_object*)  (in libobjc.A.dylib) + 156,236,...  [0x18da5b954,0x18da5b9a4,...]
    +           ! : | + ! 4 objc_autorelease  (in libobjc.A.dylib) + 316,308  [0x18da2d95c,0x18da2d954]
    +           ! : | + ! 4 objc_loadWeak  (in libobjc.A.dylib) + 24  [0x18da2e8ec]
    +           ! : | + ! : 4 objc_loadWeakRetained  (in libobjc.A.dylib) + 468,24  [0x18da2eae8,0x18da2e92c]
    +           ! : | + ! 1 AutoreleasePoolPage::add(objc_object*)  (in libobjc.A.dylib) + 164  [0x18da5b95c]
    +           ! : | + 5 -[IOGPUMetalResource dealloc]  (in IOGPU) + 272  [0x1ac748140]
    +           ! : | + ! 3 objc_release  (in libobjc.A.dylib) + 0  [0x18da27fe0]
    +           ! : | + ! 2 objc_retain_x28  (in libobjc.A.dylib) + 68  [0x18da27fd0]
    +           ! : | + 4 -[IOGPUMetalResource dealloc]  (in IOGPU) + 260  [0x1ac748134]
    +           ! : | + ! 4 objc_release  (in libobjc.A.dylib) + 0,28,...  [0x18da27fe0,0x18da27ffc,...]
    +           ! : | + 4 -[IOGPUMetalResource dealloc]  (in IOGPU) + 0,260  [0x1ac748030,0x1ac748134]
    +           ! : | + 3 -[IOGPUMetalResource dealloc]  (in IOGPU) + 196  [0x1ac7480f4]
    +           ! : | + ! 3 -[IOGPUMemoryInfo removeResourceFromList:]  (in IOGPU) + 32  [0x1ac7462d0]
    +           ! : | + !   3 os_unfair_lock_lock  (in libsystem_platform.dylib) + 16,0  [0x18de22f00,0x18de22ef0]
    +           ! : | + 1 -[IOGPUMetalResource dealloc]  (in IOGPU) + 240  [0x1ac748120]
    +           ! : | + ! 1 objc_msgSend$_removeResource:  (in IOGPU) + 16  [0x1ac76f770]
    +           ! : | + 1 objc_loadWeak  (in libobjc.A.dylib) + 28  [0x18da2e8f0]
    +           ! : | 41 -[IOGPUMetalBuffer dealloc]  (in IOGPU) + 172  [0x1ac7396d8]
    +           ! : | + 41 -[IOGPUMetalDevice deallocBufferSubData:heapIndex:bufferIndex:bufferOffset:length:]  (in IOGPU) + 56  [0x1ac7414f8]
    +           ! : | +   17 IOGPUMetalSuballocatorFree  (in IOGPU) + 408,304,...  [0x1ac7508ec,0x1ac750884,...]
    +           ! : | +   9 IOGPUMetalSuballocatorFree  (in IOGPU) + 564  [0x1ac750988]
    +           ! : | +   ! 9 -[AGXG15XFamilyBuffer dealloc]  (in AGXMetalG15X_B0) + 76  [0x1f9bb6b98]
    +           ! : | +   !   9 -[AGXBuffer dealloc]  (in AGXMetalG15X_B0) + 44  [0x1f9b8d4e4]
    +           ! : | +   !     9 -[IOGPUMetalBuffer dealloc]  (in IOGPU) + 320  [0x1ac73976c]
    +           ! : | +   !       6 -[IOGPUMetalResource dealloc]  (in IOGPU) + 248  [0x1ac748128]
    +           ! : | +   !       : 6 _CFRelease  (in CoreFoundation) + 292  [0x18dfa5b90]
    +           ! : | +   !       :   6 ioGPUResourceFinalize  (in IOGPU) + 104  [0x1ac74e6a0]
    +           ! : | +   !       :     6 iokit_user_client_trap  (in IOKit) + 8  [0x19155626c]
    +           ! : | +   !       2 -[IOGPUMetalResource dealloc]  (in IOGPU) + 216  [0x1ac748108]
    +           ! : | +   !       : 2 objc_storeWeak  (in libobjc.A.dylib) + 452  [0x18da2e184]
    +           ! : | +   !       :   1 weak_unregister_no_lock  (in libobjc.A.dylib) + 40  [0x18da401f4]
    +           ! : | +   !       :   | 1 weak_entry_for_referent(weak_table_t*, objc_object*)  (in libobjc.A.dylib) + 128  [0x18da2e450]
    +           ! : | +   !       :   1 weak_unregister_no_lock  (in libobjc.A.dylib) + 16  [0x18da401dc]
    +           ! : | +   !       1 -[IOGPUMetalResource dealloc]  (in IOGPU) + 304  [0x1ac748160]
    +           ! : | +   !         1 -[_MTLObjectWithLabel dealloc]  (in Metal) + 64  [0x197f8ca1c]
    +           ! : | +   !           1 _objc_rootDealloc  (in libobjc.A.dylib) + 80  [0x18da2ae24]
    +           ! : | +   !             1 objc_destructInstance  (in libobjc.A.dylib) + 144  [0x18da2aebc]
    +           ! : | +   !               1 objc_object::clearDeallocating_slow()  (in libobjc.A.dylib) + 108  [0x18da360b4]
    +           ! : | +   !                 1 weak_clear_no_lock  (in libobjc.A.dylib) + 48  [0x18da36184]
    +           ! : | +   !                   1 weak_entry_for_referent(weak_table_t*, objc_object*)  (in libobjc.A.dylib) + 76  [0x18da2e41c]
    +           ! : | +   7 IOGPUMetalSuballocatorFree  (in IOGPU) + 180  [0x1ac750808]
    +           ! : | +   ! 7 MTLRangeAllocatorDeallocate  (in Metal) + 84,112,...  [0x197fc51d8,0x197fc51f4,...]
    +           ! : | +   3 IOGPUMetalSuballocatorFree  (in IOGPU) + 532  [0x1ac750968]
    +           ! : | +   ! 2 std::__tree<:__value_type int short>, std::__map_value_compare, true>, IOGPUMetalSuballocatorHeap::Allocator<:__value_type int short>>>::__emplace_multi<:pair int short>>(std::pair&&)  (in IOGPU) + 76,112  [0x1ac751174,0x1ac751198]
    +           ! : | +   ! 1 std::__tree<:__value_type int short>, std::__map_value_compare, true>, IOGPUMetalSuballocatorHeap::Allocator<:__value_type int short>>>::__emplace_multi<:pair int short>>(std::pair&&)  (in IOGPU) + 44  [0x1ac751154]
    +           ! : | +   !   1 IOGPUMetalSuballocatorHeap::Allocator<:__tree_node int short>, void*>>::allocate(unsigned long)  (in IOGPU) + 40  [0x1ac751248]
    +           ! : | +   !     1 posix_memalign  (in libsystem_malloc.dylib) + 40  [0x18dc13b68]
    +           ! : | +   !       1 _malloc_zone_memalign  (in libsystem_malloc.dylib) + 16  [0x18dc30bbc]
    +           ! : | +   2 IOGPUMetalSuballocatorFree  (in IOGPU) + 160  [0x1ac7507f4]
    +           ! : | +   ! 1 -[IOGPUMetalResource gpuAddress]  (in IOGPU) + 0  [0x1ac7481a0]
    +           ! : | +   ! 1 objc_msgSend$gpuAddress  (in IOGPU) + 0  [0x1ac770520]
    +           ! : | +   2 IOGPUMetalSuballocatorFree  (in IOGPU) + 472  [0x1ac75092c]
    +           ! : | +   ! 2 std::__tree<:__value_type int short>, std::__map_value_compare, true>, IOGPUMetalSuballocatorHeap::Allocator<:__value_type int short>>>::__remove_node_pointer(std::__tree_node<:__value_type int short>, void*>*)  (in IOGPU) + 100  [0x1ac750d94]
    +           ! : | +   !   2 std::__tree_remove[abi:v160006]<:__tree_node_base>*>(std::__tree_node_base*, std::__tree_node_base*)  (in IOGPU) + 68,88  [0x1ac750de8,0x1ac750dfc]
    +           ! : | +   1 IOGPUMetalSuballocatorFree  (in IOGPU) + 556  [0x1ac750980]
    +           ! : | +     1 free_small  (in libsystem_malloc.dylib) + 876  [0x18dc0ee60]
    +           ! : | +       1 small_free_list_add_ptr  (in libsystem_malloc.dylib) + 156  [0x18dc10db8]
    +           ! : | 1 -[IOGPUMetalBuffer dealloc]  (in IOGPU) + 104  [0x1ac739694]
    +           ! : | + 1 AutoreleasePoolPage::push()  (in libobjc.A.dylib) + 124  [0x18da5ca1c]
    +           ! : | +   1 AutoreleasePoolPage::add(objc_object*)  (in libobjc.A.dylib) + 0  [0x18da5b8b8]
    +           ! : | 1 -[IOGPUMetalBuffer dealloc]  (in IOGPU) + 16  [0x1ac73963c]
    +           ! : | 1 -[IOGPUMetalResource dealloc]  (in IOGPU) + 308  [0x1ac748164]
    +           ! : | 1 objc_msgSendSuper2  (in libobjc.A.dylib) + 56  [0x18da29638]
    +           ! : 1 -[IOGPUMetalBuffer dealloc]  (in IOGPU) + 324  [0x1ac739770]
    +           ! 27 objc_msgSend  (in libobjc.A.dylib) + 0,8,...  [0x18da29400,0x18da29408,...]
    +           ! 9 _objc_rootRelease  (in libobjc.A.dylib) + 108  [0x18da2ed3c]
    +           29 mlx::core::metal::(anonymous namespace)::BufferCache::~BufferCache()  (in mlx.node) + 132,168,...  [0x11bee1c40,0x11bee1c64,...]
    +           9 mlx::core::metal::(anonymous namespace)::BufferCache::~BufferCache()  (in mlx.node) + 124  [0x11bee1c38]
    +             8 _nanov2_free  (in libsystem_malloc.dylib) + 668,784,...  [0x18dc29544,0x18dc295b8,...]
    +             1 free  (in libsystem_malloc.dylib) + 20  [0x18dc0bcd4]

How do you think if we just leak the memory on exit to save the time?

zcbenz avatar May 07 '24 00:05 zcbenz

I wonder why it's so slow / if there is a way to asynchronously free resources? Indeed I've noticed in the past that the program can take a while to exit when you are holding a lot of RAM. Leaking would be fairly easy ... just avoid releasing buffers in the buffer cache when the allocator is destroyed. Do we lose anything from doing that?

awni avatar May 08 '24 13:05 awni

Memory allocators are slow. Doing it asynchronously does not help here because the program still has to wait for tasks to finish before exiting.

Leaking memory on exit is safe and standard behavior, for example when closing a Chrome tab all DOM elements are just leaked you can rely on OS safely collecting all the RAM used.

zcbenz avatar May 08 '24 23:05 zcbenz

One danger of leaking memory on close is that in the future it makes things like ValGrind less useful when chasing a runtime memory leak.

madrob avatar May 09 '24 00:05 madrob

You can tell sanitizers a specific leak is expected. Chromium explicitly leaks memory for static objects everywhere (base::NoDestructor) and still make use of various sanitizers.

zcbenz avatar May 09 '24 01:05 zcbenz

I'm not opposed to this. I can mark it as an enhancement. @zcbenz I'm not sure if you are planning to send a PR. Happy to take a look if so / try it out.

awni avatar May 10 '24 03:05 awni

I will send a PR sometime later this month if no one else had worked on it.

zcbenz avatar May 10 '24 12:05 zcbenz