async-std Not joining all threads = memory leaked on process exit

Hi. I'm implementing a library at a low level which allocates and frees memory manually. To try to keep it correct I run example binaries and tests under Valgrind a lot. However, when using async-std my programs and tests very often seem to not correctly join threads before exiting, resulting in leaks that make Valgrind sad and my automated tests fail. I experience this on Linux and have not tried it elsewhere.

A simple failing test. Placed under tests/all_tests.rs:

#[async_std::test]
async fn just_sleep() {
    async_std::task::sleep(std::time::Duration::from_millis(1)).await;
}

You can run it under vanilla Valgrind with valgrind --leak-check=full ./target/debug/whatever_the_filename_is. But cargo valgrind has prettier output. Due to the racy nature of this it might not always fail, but for me it does ~90% of the runs:

$ cargo valgrind --test all_tests
....
       Error Leaked 24 B
        Info at malloc (vg_replace_malloc.c:309)
             at alloc::alloc::alloc (alloc.rs:84)
             at alloc::alloc::exchange_malloc (alloc.rs:206)
             at alloc::sync::Arc<T>::new (sync.rs:302)
             at futures_timer::global::current_thread_waker (global.rs:104)
             at futures_timer::global::run (global.rs:59)
             at futures_timer::global::HelperThread::new::{{closure}} (global.rs:28)
             at std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:126)
             at std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}} (mod.rs:470)
             at <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (panic.rs:315)
             at std::panicking::try::do_call (panicking.rs:292)
             at __rust_maybe_catch_panic (lib.rs:80)
       Error Leaked 288 B
        Info at calloc (vg_replace_malloc.c:762)
             at allocate_dtv
             at _dl_allocate_tls
             at pthread_create@@GLIBC_2.2.5
             at std::sys::unix::thread::Thread::new (thread.rs:67)
             at std::thread::Builder::spawn_unchecked (mod.rs:489)
             at std::thread::Builder::spawn (mod.rs:382)
             at futures_timer::global::HelperThread::new (global.rs:26)
             at <futures_timer::timer::TimerHandle as core::default::Default>::default (timer.rs:281)
             at futures_timer::delay::Delay::new (delay.rs:39)
             at async_std::io::timeout::timeout::{{closure}} (timeout.rs:40)
             at <std::future::GenFuture<T> as core::future::future::Future>::poll::{{closure}} (future.rs:43)
     Summary Leaked 312 B total

A very similar leak stack trace can be obtained from the following test:

#[test]
fn just_sync_sleep() {
    std::thread::spawn(move || {
        std::thread::sleep(std::time::Duration::from_millis(1));
    });
}

Which is fixed by joining all threads before exiting:

#[test]
fn just_sync_sleep() {
    let t = std::thread::spawn(move || {
        std::thread::sleep(std::time::Duration::from_millis(1));
    });
    t.join()
}

This leads me to suspect the #[async_std::main] and #[async_std::test] macros don't properly join all threads before exiting.

Apr 29 '20 18:04 faern

Thank you for the report, two asks (1) could you test this with https://github.com/async-rs/async-std/pull/757 (2) is there an easy way for me to run the tests that you are experencing this with?

May 02 '20 18:05 dignifiedquire

I just realize, that this does not happen in #757 either, so no need to test this. Shutting down the threads could be possibly added to the macros, but it would have to be a manual call for other use cases, as we create a global runtime, and so do not get the benefit of using Drop like tokio does, where the user manages a local runtime.

May 02 '20 18:05 dignifiedquire

Yes, #757 does not solve the leak.

I created a repository to easily test this and to see some differences: https://github.com/faern/leakruntime. This repository currently points to the smol branch of async-std.

One interesting thing I noted is that #[tokio::main] does actually cause a single thread leak sometimes. And #[tokio::test] as well, just very rarely. I had to run cargo valgrind on my example repo in a loop to eventually trigger it:

while cargo +1.39.0 valgrind --test tokio --features tokio; do true; done

async-std on the other hand leak many objects both from #[async_std::main] and #[async_std::test] every time. The results I got can be viewed here: https://raw.githubusercontent.com/faern/leakruntime/master/async-std-main-leak.log and https://raw.githubusercontent.com/faern/leakruntime/master/async-std-test-leak.log

May 03 '20 11:05 faern

Yes the difference in what the macros expand to is shown in the main.rs file in that repo. The main difference being that tokio creates a Runtime blocks on it and then drops it. While async-std just blocks on a global runtime. And I assume nothing is responsible for shutting that runtime down before the process exits.

May 03 '20 11:05 faern

The very rare leak found in #[tokio::test] does exist for a normal sync test looking like this as well:

#[test]
fn just_sleep() {
    std::thread::sleep(std::time::Duration::from_millis(1));
}

So that one can't be blamed on tokio. One just have to run it sufficiently many times to hit it. I think I will report that directly to Rust. But it's unrelated to all the leaks this issue is about.

May 03 '20 11:05 faern

async-std spawns global runtime threads for executors, timers and potentially IO in the background which will simply run forever. So the leak seems kind of by design.

May 05 '20 04:05 Matthias247

Since it's possible to interact with the global runtime via for example async_std::task::spawn I guess there could just as well be a async_std::shutdown function that cancels and frees all resources of the global runtime. It could then expand the #[async_std::main] macro to end main with this call. For tests it would be harder since there can be multiple tests. But doable I think

I get that there are a lot of users who don't care about this, and the OS will clean up stuff as the process exits anyway. So it's up to you if you prioritize this at all. I just wanted to report my findings.

May 05 '20 05:05 faern

I seem to have the same issue, but I didn't have it in v1.5.0. It only shows up in v1.6.2. FWIW, also happens if I set RUST_TEST_THREADS=1.

Jun 25 '20 16:06 hu55a1n1

async-std async-std copied to clipboard

Not joining all threads = memory leaked on process exit

async-std
async-std copied to clipboard