
Valgrind reports memory leak upon exit

Open alekratz opened this issue 4 years ago • 8 comments

I am using cargo-valgrind to get these results. I created a brand-new project using shredder = "0.1.2", and I am using this main function:

fn main() {
    shredder::run_with_gc_cleanup(|| { });
}

When I run cargo valgrind, this is what I see:

       Error Leaked 288 B
        Info at calloc (vg_replace_malloc.c:760)
             at _dl_allocate_tls
             at pthread_create@@GLIBC_2.2.5
             at std::sys::unix::thread::Thread::new (thread.rs:66)
             at std::thread::Builder::spawn_unchecked (mod.rs:475)
             at std::thread::Builder::spawn (mod.rs:373)
             at std::thread::spawn (mod.rs:596)
             at shredder::collector::dropper::BackgroundDropper::new (dropper.rs:24)
             at shredder::collector::Collector::new (mod.rs:173)
             at core::ops::function::FnOnce::call_once (function.rs:227)
             at once_cell::sync::Lazy<T,F>::force::{{closure}} (lib.rs:953)
             at once_cell::sync::OnceCell<T>::get_or_init::{{closure}} (lib.rs:786)
       Error Leaked 288 B
        Info at calloc (vg_replace_malloc.c:760)
             at _dl_allocate_tls
             at pthread_create@@GLIBC_2.2.5
             at std::sys::unix::thread::Thread::new (thread.rs:66)
             at std::thread::Builder::spawn_unchecked (mod.rs:475)
             at std::thread::Builder::spawn (mod.rs:373)
             at std::thread::spawn (mod.rs:596)
             at shredder::collector::Collector::new (mod.rs:190)
             at core::ops::function::FnOnce::call_once (function.rs:227)
             at once_cell::sync::Lazy<T,F>::force::{{closure}} (lib.rs:953)
             at once_cell::sync::OnceCell<T>::get_or_init::{{closure}} (lib.rs:786)
             at once_cell::imp::OnceCell<T>::initialize::{{closure}} (imp_std.rs:97)
       Error Leaked 3.4 kiB
        Info at calloc (vg_replace_malloc.c:760)
             at _dl_allocate_tls
             at pthread_create@@GLIBC_2.2.5
             at std::sys::unix::thread::Thread::new (thread.rs:66)
             at std::thread::Builder::spawn_unchecked (mod.rs:475)
             at std::thread::Builder::spawn (mod.rs:373)
             at <rayon_core::registry::DefaultSpawn as rayon_core::registry::ThreadSpawn>::spawn (registry.rs:101)
             at rayon_core::registry::Registry::new (registry.rs:260)
             at rayon_core::registry::global_registry::{{closure}} (registry.rs:169)
             at rayon_core::registry::set_global_registry::{{closure}} (registry.rs:195)
             at std::sync::once::Once::call_once::{{closure}} (once.rs:265)
             at std::sync::once::Once::call_inner (once.rs:421)
     Summary Leaked 3.9 kiB total

I would not normally care about this, but it affects my ability to ensure that my programs are not leaking memory.

cargo-valgrind link: https://crates.io/crates/cargo-valgrind

alekratz avatar Oct 21 '20 00:10 alekratz

Well, this is because the collector allocates some global data structures that are never cleaned up. I guess this is technically “working as intended”, but I understand how it’s interfering with your use case.

To solve this we’d need to:

  • implement a sensible drop strategy for “Collector” (including stopping the background threads and ensuring they’ve cleaned up their data)
  • stop using “OnceCell” and migrate to something that would let us drop our data (empty the lazy) (likely awkward because of synchronization concerns)

Even when that’s done, we’d also need to clean up the implementation of this data structure, which leaks memory for safety reasons: https://github.com/Others/shredder/blob/master/src/concurrency/chunked_ll.rs
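
For context, the leaked blocks in the report come from this general pattern: a global once_cell Lazy holding the collector, which spawns background threads and is never dropped, so the threads' TLS allocations are still live at exit. Here is a minimal sketch of that pattern; Collector below is a hypothetical stand-in, not shredder's real type.

use std::thread;
use once_cell::sync::Lazy;

// Hypothetical stand-in for shredder's Collector, for illustration only.
struct Collector {
    _dropper: thread::JoinHandle<()>,
}

impl Collector {
    fn new() -> Self {
        // The backtraces show Collector::new spawning background threads
        // (the background dropper, plus rayon's global pool).
        let handle = thread::spawn(|| { /* background drop / tracking work */ });
        Collector { _dropper: handle }
    }
}

// A static Lazy is initialized once and never dropped, so the spawned
// thread's TLS block (the 288 B calloc in the backtrace) is still
// reachable when the process exits, and Valgrind reports it as leaked.
static COLLECTOR: Lazy<Collector> = Lazy::new(Collector::new);

fn main() {
    Lazy::force(&COLLECTOR);
}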

Others avatar Oct 21 '20 01:10 Others

implement a sensible drop strategy for “Collector” (including stopping the background threads and ensuring they’ve cleaned up their data)

Are these just normal threads? Do you think there could be some benefit in moving the background collector threads to use asyncio? This would be a whole separate thing, but I'm curious.

stop using “OnceCell” and migrate to something that would let us drop our data (empty the lazy) (likely awkward because of synchronization concerns)

Would this affect the Gc pointer API for making new allocations? Specifically, would we have to start supplying the collector to Gc::new(...) since I assume we'd have to get rid of the global static one? https://github.com/Others/shredder/blob/master/src/collector/mod.rs#L320

alekratz avatar Oct 21 '20 21:10 alekratz

I doubt you'd see much benefit from using asyncio for shredder. Async shines with IO workloads; shredder is all compute-bound.

I don't want to move away from a global collector because:

  1. It would seriously hurt ergonomics
  2. It isn't at all clear how multiple collectors could work together. (Cycles spanning multiple collectors would presumably leak, which seems like a huge footgun; see the toy sketch below.)
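
For concreteness, here is a toy sketch of what a per-collector API would force on every caller. These are stand-in types invented purely for illustration, not shredder's; how cross-collector cycles would be handled is exactly the unsolved part.

use std::rc::Rc;

// Toy stand-ins to illustrate the ergonomics point only.
struct Collector;
struct Gc<T>(Rc<T>);

impl Collector {
    fn new() -> Self {
        Collector
    }

    // Every allocation site would need an explicit collector handle...
    fn alloc<T>(&self, value: T) -> Gc<T> {
        Gc(Rc::new(value))
    }
}

fn main() {
    let collector = Collector::new();
    // ...instead of today's implicit-global `Gc::new(123)`.
    let x = collector.alloc(123);
    println!("{}", x.0);
}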

Others avatar Oct 22 '20 23:10 Others

So the logs at the top of the issue indicate this is purely a problem with shredder not cleaning up its threads at shutdown. If you allocate some actual Gcs, does the number of leaks grow?

use shredder::Gc;

fn main() {
    shredder::run_with_gc_cleanup(|| {
        let x = Gc::new(123);
        // don't let it get optimized out (not that I'm even sure `rustc` gets there)
        println!("{:?}", x);
    });
}

Others avatar Oct 24 '20 05:10 Others

@Others this appears to be the case. Using your example, I'm getting this output:

       Error Leaked 52 B
        Info at malloc (vg_replace_malloc.c:307)
             at alloc::alloc::alloc (alloc.rs:74)
             at hashbrown::raw::RawTable<T>::new_uninitialized (mod.rs:411)
             at hashbrown::raw::RawTable<T>::fallible_with_capacity (mod.rs:440)
             at hashbrown::raw::RawTable<T>::resize (mod.rs:867)
             at hashbrown::raw::RawTable<T>::reserve_rehash (mod.rs:748)
             at hashbrown::raw::RawTable<T>::reserve (mod.rs:704)
             at hashbrown::raw::RawTable<T>::insert (mod.rs:920)
             at hashbrown::map::HashMap<K,V,S>::insert (map.rs:984)
             at std::collections::hash::map::HashMap<K,V,S>::insert (map.rs:795)
             at <dashmap::DashMap<K,V,S> as dashmap::t::Map<K,V,S>>::_insert (lib.rs:674)
             at dashmap::DashMap<K,V,S>::insert (lib.rs:326)
       Error Leaked 52 B
        Info at malloc (vg_replace_malloc.c:307)
             at alloc::alloc::alloc (alloc.rs:74)
             at hashbrown::raw::RawTable<T>::new_uninitialized (mod.rs:411)
             at hashbrown::raw::RawTable<T>::fallible_with_capacity (mod.rs:440)
             at hashbrown::raw::RawTable<T>::resize (mod.rs:867)
             at hashbrown::raw::RawTable<T>::reserve_rehash (mod.rs:748)
             at hashbrown::raw::RawTable<T>::reserve (mod.rs:704)
             at hashbrown::raw::RawTable<T>::insert (mod.rs:920)
             at hashbrown::map::HashMap<K,V,S>::insert (map.rs:984)
             at std::collections::hash::map::HashMap<K,V,S>::insert (map.rs:795)
             at <dashmap::DashMap<K,V,S> as dashmap::t::Map<K,V,S>>::_insert (lib.rs:674)
             at dashmap::DashMap<K,V,S>::insert (lib.rs:326)
       Error Leaked 288 B
        Info at calloc (vg_replace_malloc.c:760)
             at _dl_allocate_tls
             at pthread_create@@GLIBC_2.2.5
             at std::sys::unix::thread::Thread::new (thread.rs:66)
             at std::thread::Builder::spawn_unchecked (mod.rs:475)
             at std::thread::Builder::spawn (mod.rs:373)
             at std::thread::spawn (mod.rs:596)
             at shredder::collector::dropper::BackgroundDropper::new (dropper.rs:24)
             at shredder::collector::Collector::new (mod.rs:173)
             at core::ops::function::FnOnce::call_once (function.rs:227)
             at once_cell::sync::Lazy<T,F>::force::{{closure}} (lib.rs:953)
             at once_cell::sync::OnceCell<T>::get_or_init::{{closure}} (lib.rs:786)
       Error Leaked 288 B
        Info at calloc (vg_replace_malloc.c:760)
             at _dl_allocate_tls
             at pthread_create@@GLIBC_2.2.5
             at std::sys::unix::thread::Thread::new (thread.rs:66)
             at std::thread::Builder::spawn_unchecked (mod.rs:475)
             at std::thread::Builder::spawn (mod.rs:373)
             at std::thread::spawn (mod.rs:596)
             at shredder::collector::Collector::new (mod.rs:190)
             at core::ops::function::FnOnce::call_once (function.rs:227)
             at once_cell::sync::Lazy<T,F>::force::{{closure}} (lib.rs:953)
             at once_cell::sync::OnceCell<T>::get_or_init::{{closure}} (lib.rs:786)
             at once_cell::imp::OnceCell<T>::initialize::{{closure}} (imp_std.rs:97)
       Error Leaked 3.4 kiB
        Info at calloc (vg_replace_malloc.c:760)
             at _dl_allocate_tls
             at pthread_create@@GLIBC_2.2.5
             at std::sys::unix::thread::Thread::new (thread.rs:66)
             at std::thread::Builder::spawn_unchecked (mod.rs:475)
             at std::thread::Builder::spawn (mod.rs:373)
             at <rayon_core::registry::DefaultSpawn as rayon_core::registry::ThreadSpawn>::spawn (registry.rs:101)
             at rayon_core::registry::Registry::new (registry.rs:260)
             at rayon_core::registry::global_registry::{{closure}} (registry.rs:169)
             at rayon_core::registry::set_global_registry::{{closure}} (registry.rs:195)
             at std::sync::once::Once::call_once::{{closure}} (once.rs:265)
             at std::sync::once::Once::call_inner (once.rs:421)
     Summary Leaked 4.0 kiB total

There are two new 52 B leaks at the top. Out of curiosity, I ran the allocation 10 times, and the leak size grows with it (omitting the 288 B and 3.4 kiB leaks for brevity, since those don't change):

       Error Leaked 520 B
        Info at malloc (vg_replace_malloc.c:307)
             at alloc::alloc::alloc (alloc.rs:74)
             at hashbrown::raw::RawTable<T>::new_uninitialized (mod.rs:411)
             at hashbrown::raw::RawTable<T>::fallible_with_capacity (mod.rs:440)
             at hashbrown::raw::RawTable<T>::resize (mod.rs:867)
             at hashbrown::raw::RawTable<T>::reserve_rehash (mod.rs:748)
             at hashbrown::raw::RawTable<T>::reserve (mod.rs:704)
             at hashbrown::raw::RawTable<T>::insert (mod.rs:920)
             at hashbrown::map::HashMap<K,V,S>::insert (map.rs:984)
             at std::collections::hash::map::HashMap<K,V,S>::insert (map.rs:795)
             at <dashmap::DashMap<K,V,S> as dashmap::t::Map<K,V,S>>::_insert (lib.rs:674)
             at dashmap::DashMap<K,V,S>::insert (lib.rs:326)
       Error Leaked 520 B
        Info at malloc (vg_replace_malloc.c:307)
             at alloc::alloc::alloc (alloc.rs:74)
             at hashbrown::raw::RawTable<T>::new_uninitialized (mod.rs:411)
             at hashbrown::raw::RawTable<T>::fallible_with_capacity (mod.rs:440)
             at hashbrown::raw::RawTable<T>::resize (mod.rs:867)
             at hashbrown::raw::RawTable<T>::reserve_rehash (mod.rs:748)
             at hashbrown::raw::RawTable<T>::reserve (mod.rs:704)
             at hashbrown::raw::RawTable<T>::insert (mod.rs:920)
             at hashbrown::map::HashMap<K,V,S>::insert (map.rs:984)
             at std::collections::hash::map::HashMap<K,V,S>::insert (map.rs:795)
             at <dashmap::DashMap<K,V,S> as dashmap::t::Map<K,V,S>>::_insert (lib.rs:674)
             at dashmap::DashMap<K,V,S>::insert (lib.rs:326)

Notice they're now 520 B leaks, which corresponds to 10 * 52 = 520. This suggests to me that something in the backend is leaking per allocation, even after the program has supposedly cleaned up.

When I run with 100 allocations, the sizes of those two leaks start to diverge:

       Error Leaked 2.4 kiB
        Info at malloc (vg_replace_malloc.c:307)
             at alloc::alloc::alloc (alloc.rs:74)
             at hashbrown::raw::RawTable<T>::new_uninitialized (mod.rs:411)
             at hashbrown::raw::RawTable<T>::fallible_with_capacity (mod.rs:440)
             at hashbrown::raw::RawTable<T>::resize (mod.rs:867)
             at hashbrown::raw::RawTable<T>::reserve_rehash (mod.rs:748)
             at hashbrown::raw::RawTable<T>::reserve (mod.rs:704)
             at hashbrown::raw::RawTable<T>::insert (mod.rs:920)
             at hashbrown::map::HashMap<K,V,S>::insert (map.rs:984)
             at std::collections::hash::map::HashMap<K,V,S>::insert (map.rs:795)
             at <dashmap::DashMap<K,V,S> as dashmap::t::Map<K,V,S>>::_insert (lib.rs:674)
             at dashmap::DashMap<K,V,S>::insert (lib.rs:326)
       Error Leaked 2.6 kiB
        Info at malloc (vg_replace_malloc.c:307)
             at alloc::alloc::alloc (alloc.rs:74)
             at hashbrown::raw::RawTable<T>::new_uninitialized (mod.rs:411)
             at hashbrown::raw::RawTable<T>::fallible_with_capacity (mod.rs:440)
             at hashbrown::raw::RawTable<T>::resize (mod.rs:867)
             at hashbrown::raw::RawTable<T>::reserve_rehash (mod.rs:748)
             at hashbrown::raw::RawTable<T>::reserve (mod.rs:704)
             at hashbrown::raw::RawTable<T>::insert (mod.rs:920)
             at hashbrown::map::HashMap<K,V,S>::insert (map.rs:984)
             at std::collections::hash::map::HashMap<K,V,S>::insert (map.rs:795)
             at <dashmap::DashMap<K,V,S> as dashmap::t::Map<K,V,S>>::_insert (lib.rs:674)
             at dashmap::DashMap<K,V,S>::insert (lib.rs:326)

Essentially, whatever internal bookkeeping the DashMap's hash tables allocate ends up not getting cleaned up.

code for the above:

use shredder::Gc;

fn main() {
    shredder::run_with_gc_cleanup(|| {
        for i in 0..10 {
            let x = Gc::new(i);
            println!("{:?}", x);
        }
    });
}

alekratz avatar Oct 28 '20 22:10 alekratz

This is tugging at my curiosity. I put a thread::sleep for 1 second right before exiting, and that seems to reduce some of the leaks:

       Error Leaked 2.3 kiB
        Info at malloc (vg_replace_malloc.c:307)
             at alloc::alloc::alloc (alloc.rs:74)
             at hashbrown::raw::RawTable<T>::new_uninitialized (mod.rs:411)
             at hashbrown::raw::RawTable<T>::fallible_with_capacity (mod.rs:440)
             at hashbrown::raw::RawTable<T>::resize (mod.rs:867)
             at hashbrown::raw::RawTable<T>::reserve_rehash (mod.rs:748)
             at hashbrown::raw::RawTable<T>::reserve (mod.rs:704)
             at hashbrown::raw::RawTable<T>::insert (mod.rs:920)
             at hashbrown::map::HashMap<K,V,S>::insert (map.rs:984)
             at std::collections::hash::map::HashMap<K,V,S>::insert (map.rs:795)
             at <dashmap::DashMap<K,V,S> as dashmap::t::Map<K,V,S>>::_insert (lib.rs:674)
             at dashmap::DashMap<K,V,S>::insert (lib.rs:326)
       Error Leaked 2.4 kiB
        Info at malloc (vg_replace_malloc.c:307)
             at alloc::alloc::alloc (alloc.rs:74)
             at hashbrown::raw::RawTable<T>::new_uninitialized (mod.rs:411)
             at hashbrown::raw::RawTable<T>::fallible_with_capacity (mod.rs:440)
             at hashbrown::raw::RawTable<T>::resize (mod.rs:867)
             at hashbrown::raw::RawTable<T>::reserve_rehash (mod.rs:748)
             at hashbrown::raw::RawTable<T>::reserve (mod.rs:704)
             at hashbrown::raw::RawTable<T>::insert (mod.rs:920)
             at hashbrown::map::HashMap<K,V,S>::insert (map.rs:984)
             at std::collections::hash::map::HashMap<K,V,S>::insert (map.rs:795)
             at <dashmap::DashMap<K,V,S> as dashmap::t::Map<K,V,S>>::_insert (lib.rs:674)
             at dashmap::DashMap<K,V,S>::insert (lib.rs:326)

I think I'm smelling a race condition. I also tried putting a 10-second wait before exit, but it didn't change the results.
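
For reference, the sleep variant looks roughly like this; a sketch, assuming the sleep sits at the end of the closure (i.e. right before run_with_gc_cleanup returns):

use std::{thread, time::Duration};
use shredder::Gc;

fn main() {
    shredder::run_with_gc_cleanup(|| {
        for i in 0..10 {
            let x = Gc::new(i);
            println!("{:?}", x);
        }
        // Give the background collector threads a moment to catch up before exit.
        thread::sleep(Duration::from_secs(1));
    });
}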

alekratz avatar Oct 28 '20 22:10 alekratz

Can you test against the latest code on GitHub? There shouldn’t be any dashmap usage on master; maybe that would clean it up a bit?
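
For example, pointing Cargo at the repository directly should be enough to test master; this is a plain git dependency sketch, not pinned to any particular revision:

[dependencies]
shredder = { git = "https://github.com/Others/shredder" }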

Others avatar Oct 28 '20 22:10 Others

@Others That appears to have fixed it. I'm only seeing the 288 B and 3.4 kiB leaks now, which are expected.

alekratz avatar Oct 28 '20 23:10 alekratz