arc-swap icon indicating copy to clipboard operation
arc-swap copied to clipboard

Memory leaks?

Open Horusiath opened this issue 1 year ago • 4 comments

Recently some of the people using our library noticed that one of the parts is leaking memory. We decided to investigate it with the help mockalloc crate, which tracked the source of an issue within ArcSwap::rcu. I'm not sure if this is a false negative, however it may be worth taking look into.

#[global_allocator]
static ALLOCATOR: mockalloc::Mockalloc<std::alloc::System> = mockalloc::Mockalloc(std::alloc::System);

#[mockalloc::test]
fn memory_reclamation() {
    let arc = ArcSwapOption::new(Some(Arc::new(vec![1, 2])));
    for i in 1..2 {
        arc.rcu(|values| match values.as_deref() {
            None => None,
            Some(values) => {
                let mut values = values.clone();
                values.pop();
                if values.is_empty() {
                    None
                } else {
                    Some(Arc::new(values))
                }
            }
        });
    }
}

This test failed with following trace:


TracingInfo {
    leaks: [
        LeakInfo {
            ptr: 0x00007fa377804540,
            layout: Layout {
                size: 128,
                align: 64 (1 << 6),
            },
            caller:    0:        0x104ab9035 - <mockalloc::Mockalloc<T> as core::alloc::global::GlobalAlloc>::alloc::h0b6bbecde04abb09
               1:        0x104c6d720 - arc_swap::debt::list::Node::get::h1815a5c3e5a99577
               2:        0x104a1e66c - arc_swap::debt::list::LocalNode::with::ha273e1f8b7a3cb91
               3:        0x104a1acec - arc_swap::ArcSwapAny<T,S>::rcu::he96fd2eec60ac499
               4:        0x104ab97d1 - mockalloc::assert_allocs::had10960fa99e865f
               5:        0x104a49a8e - core::ops::function::FnOnce::call_once::h5d184419d27f0b52
               6:        0x104c5b6b2 - core::ops::function::FnOnce::call_once::hd55cd4785832579b
                                           at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/ops/function.rs:250:5
               7:        0x104c5b6b2 - test::__rust_begin_short_backtrace::h6138742d84faf9e0
                                           at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/test/src/lib.rs:627:18
               8:        0x104c5a5f6 - test::run_test_in_process::{{closure}}::h4bf2a1b8d80f48a4
                                           at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/test/src/lib.rs:650:60
               9:        0x104c5a5f6 - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h2a6a680a145c7d41
                                           at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panic/unwind_safe.rs:272:9
              10:        0x104c5a5f6 - std::panicking::try::do_call::hcdf53f9d5a8b5148
                                           at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:552:40
              11:        0x104c5a5f6 - std::panicking::try::h9a226d31f3c36360
                                           at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:516:19
              12:        0x104c5a5f6 - std::panic::catch_unwind::h5b58dc40b0f29757
                                           at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panic.rs:142:14
              13:        0x104c5a5f6 - test::run_test_in_process::h43aabdb785ed33c8
                                           at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/test/src/lib.rs:650:27
              14:        0x104c5a5f6 - test::run_test::{{closure}}::h145bd2dd944a0cb1
                                           at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/test/src/lib.rs:573:43
              15:        0x104c22601 - test::run_test::{{closure}}::h852234e57a56a781
                                           at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/test/src/lib.rs:601:41
              16:        0x104c22601 - std::sys_common::backtrace::__rust_begin_short_backtrace::h624da82491b2ecec
                                           at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:155:18
              17:        0x104c274a4 - std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}::hca064782a57d6f7c
                                           at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/thread/mod.rs:529:17
              18:        0x104c274a4 - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h2e1573cca99488b0
                                           at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panic/unwind_safe.rs:272:9
              19:        0x104c274a4 - std::panicking::try::do_call::h6737b46edca0dd74
                                           at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:552:40
              20:        0x104c274a4 - std::panicking::try::hfbb82914e6ff3120
                                           at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:516:19
              21:        0x104c274a4 - std::panic::catch_unwind::hcbfd11b41e609449
                                           at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panic.rs:142:14
              22:        0x104c274a4 - std::thread::Builder::spawn_unchecked_::{{closure}}::h0235d2d195c46467
                                           at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/thread/mod.rs:528:30
              23:        0x104c274a4 - core::ops::function::FnOnce::call_once{{vtable.shim}}::hcc0ed71d7fcc1527
                                           at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/ops/function.rs:250:5
              24:        0x104cf69f9 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::hef77fdfabbdc0490
                                           at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/alloc/src/boxed.rs:2015:9
              25:        0x104cf69f9 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::hd4d34ecf6438f9ac
                                           at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/alloc/src/boxed.rs:2015:9
              26:        0x104cf69f9 - std::sys::unix::thread::Thread::new::thread_start::hcdd70219a480b010
                                           at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys/unix/thread.rs:108:17
              27:     0x7ff800bfb202 - __pthread_start
            ,
            free_callers: [],
        },
    ],
    errors: [],
}

called `Result::unwrap()` on an `Err` value: Leak

Horusiath avatar Apr 05 '24 06:04 Horusiath

Hello

The library uses some memory that it never frees (and the size depends on number of threads from which the library is used), but it should still be reachable and the library should be able to reuse it. In that sense, it would not be a true resource leak, but could cause tools to alert on it. It should not grow infinitely.

What exactly does the tool try to detect? Only the fact it was not freed, or also the fact it is reachable/not?

vorner avatar Apr 05 '24 08:04 vorner

The library uses some memory that it never frees (and the size depends on number of threads from which the library is used), but it should still be reachable and the library should be able to reuse it

Would it be possible for the lib to provide some sort of (probably unsafe) uninit function? This could be called when you're done with all the ArcSwap's that existed, and uninit would free every resource that's still reachable.

Or, is there any way to use the lib to ensure everything is freed and closed at the end?

Edit: I'm thinking memory, threads, files, anything that the lib might own.

xTachyon avatar May 19 '25 11:05 xTachyon

That could be a possibility, if someone took the effort to contribute one ‒ I find myself really short on free time, but I'd probably be able to drop few hints or help to get someone unstuck if they tried and hit specific problems. The library owns just memory, it doesn't start any threads by itself and is certainly too low-level to even consider the existence of any files.

However, what would needed to be ensured:

  • Some of the stuff is in thread-local storages of threads. That means, such function could be called only after all threads except the current one are shut down, because one thread can't access thread-local storage of other threads.
  • The caller would probably have to ensure no guards are in existence at that time. This one is kind of hard, because knowing none of the dependencies are hiding something in eg. global variable or thread local storage is hard.
  • I'm not sure if it could be done in a way that later second reinitialization could happen, but possibly.
  • Some of the preconditions could probably be checked opportunistically during the uninit function. That is, such checks would probably be racy and the function would still have to be unsafe, but they could try detecting violation of the them.
  • I feel like users generally should not use such function, so I'd really hate to give them the impression in documentation. So I'd probably suggest to enable such function only under a (non-default) Cargo feature. That would allow including some more tracking variables (eg. global number of active thread local storages) in a way that wouldn't impact ordinary use.

vorner avatar May 19 '25 11:05 vorner

  • Some of the stuff is in thread-local storages of threads. That means, such function could be called only after all threads except the current one are shut down, because one thread can't access thread-local storage of other threads.

That is a problem that I'll have to think about. In my usecase, threads that use this code might still be running, but I would explicitly call uninit + clear TLS variables. I don't know how this would work at the moment.

  • The caller would probably have to ensure no guards are in existence at that time. This one is kind of hard, because knowing none of the dependencies are hiding something in eg. global variable or thread local storage is hard.

Hard, but doable. In this case, I'm fine leaking memory.

  • I'm not sure if it could be done in a way that later second reinitialization could happen, but possibly.

Not a problem.

  • Some of the preconditions could probably be checked opportunistically during the uninit function. That is, such checks would probably be racy and the function would still have to be unsafe, but they could try detecting violation of the them.

That's good.

  • I feel like users generally should not use such function, so I'd really hate to give them the impression in documentation. So I'd probably suggest to enable such function only under a (non-default) Cargo feature. That would allow including some more tracking variables (eg. global number of active thread local storages) in a way that wouldn't impact ordinary use.

I have no problem with this.

if someone took the effort to contribute one

I'll think about it, thanks for the willingness to integrate this!

xTachyon avatar May 19 '25 12:05 xTachyon