async-io icon indicating copy to clipboard operation
async-io copied to clipboard

add `driver` feature

Open JayceFayne opened this issue 3 years ago • 5 comments

fixes #40

This pull adds a new feature called driver. It is enabled by default. If disabled async-io will not spawn a background thread to drive the Reactor.

Note: afaik adding this feature flag is a breaking change.

JayceFayne avatar Apr 17 '21 12:04 JayceFayne

this looks good to me, and is what I'm looking for! (apart from the merge conflict). Is there anything necessary to get this merged apart from a rebase?

bpowers avatar May 25 '21 15:05 bpowers

actually I rebased this on master: https://github.com/bpowers/async-io/tree/driver and in local testing, I end up with 2 async-io threads. I will look more later - maybe I'm not calling block_on correctly?

bpowers avatar May 25 '21 16:05 bpowers

I've been testing a fork of master with crate::driver::init() commented out. I found that my application would run normally for a few hours, then hang calling accept on an Async<TCPListener> while netstat shows the socket's Recv-Q increasing. I don't yet have a consistent repro or even a stack trace, but uncommenting crate::driver::init() fixes it. All threads are running ~~the same executor~~ (edit: one thread wasn't running the executor, which caused the race condition below) with async_io::block_on.

@bpowers Is the crates.io version being included as a transitive dependency?

edit: Noticed a possible race condition:

  1. Thread A and B are running block_on(future_a) and block_on(future_b) respectively.
  2. Thread A is running the reactor when future_b's I/O event occurs, then thread A is preempted.
  3. Thread B wakes up, polls future_b which quickly returns, then thread B parks since the reacor lock is still held by thread A.
  4. Thread A is scheduled again, it wasn't notified and has held the reactor for more than 500us, so drops the reactor lock and parks.

At this point the async-io thread would run the reactor, but with it disabled all threads are parked and the application is deadlocked.

lhallam avatar Jun 13 '21 21:06 lhallam

Here's a minimal repro:

[package]
name = "async-io-deadlock"
version = "0.1.0"
edition = "2018"

[[bin]]
name = "async-io-deadlock"
path = "main.rs"

[dependencies]
async-io = { git = "https://github.com/JayceFayne/async-io", rev="0774548", default-features = false }
futures-lite = "*"
fn main() {
    use std::time::Duration;
    use futures_lite::future;

    std::thread::spawn::<_, ()>(|| async_io::block_on(future::pending()));
    future::block_on(async {
        for i in 0..usize::MAX {
            async_io::Timer::after(Duration::from_millis(1)).await;
            println!("{}", i)
        }
    })
}

With default-features=false the program prints 0 then deadlocks:

(gdb) info threads
  Id   Target Id                                            Frame
* 1    Thread 0x7faafb0f0040 (LWP 135021) "async-io-deadlo" 0x00007faafb2d98ca in __futex_abstimed_wait_common64 () from /usr/lib/libpthread.so.0
  2    Thread 0x7faafb0ee640 (LWP 135023) "async-io-deadlo" 0x00007faafb2d98ca in __futex_abstimed_wait_common64 () from /usr/lib/libpthread.so.0
(gdb) bt
#0  0x00007faafb2d98ca in __futex_abstimed_wait_common64 () from /usr/lib/libpthread.so.0
#1  0x00007faafb2d3270 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
#2  0x000055f12c788aad in std::sys::unix::condvar::Condvar::wait (self=0x55f12dbc7cb0, mutex=0x55f12dbc7910) at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys/unix/condvar.rs:71
#3  0x000055f12c786b9e in std::sys_common::condvar::Condvar::wait (self=0x55f12dbc7d18, mutex=0x55f12dbc7d08) at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/condvar.rs:44
#4  0x000055f12c787fd1 in std::sync::condvar::Condvar::wait<()> (self=0x55f12dbc7d18, guard=...) at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sync/condvar.rs:188
#5  0x000055f12c787972 in parking::Inner::park (self=0x55f12dbc7d00, timeout=...) at /home/xal/.cargo/registry/src/github.com-1ecc6299db9ec823/parking-2.0.0/src/lib.rs:305
#6  0x000055f12c78741a in parking::Parker::park (self=0x7faafb0effa0) at /home/xal/.cargo/registry/src/github.com-1ecc6299db9ec823/parking-2.0.0/src/lib.rs:103
#7  0x000055f12c741430 in futures_lite::future::block_on::{{closure}}<(),core::future::from_generator::GenFuture<generator-1>> (cache=0x7faafb0eff98)
    at /home/xal/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-lite-1.12.0/src/future.rs:91
#8  0x000055f12c73c410 in std::thread::local::LocalKey<core::cell::RefCell<(parking::Parker, core::task::wake::Waker)>>::try_with<core::cell::RefCell<(parking::Parker, core::task::wake::Waker)>,closure-0,()> (self=0x55f12c7e20f8,
    f=...) at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/thread/local.rs:272
#9  0x000055f12c73be03 in std::thread::local::LocalKey<core::cell::RefCell<(parking::Parker, core::task::wake::Waker)>>::with<core::cell::RefCell<(parking::Parker, core::task::wake::Waker)>,closure-0,()> (self=0x55f12c7e20f8, f=...)
    at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/thread/local.rs:248
#10 0x000055f12c741285 in futures_lite::future::block_on<(),core::future::from_generator::GenFuture<generator-1>> (future=...) at /home/xal/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-lite-1.12.0/src/future.rs:79
#11 0x000055f12c740def in async_io_deadlock::main () at main.rs:7
(gdb) thread 2
[Switching to thread 2 (Thread 0x7faafb0ee640 (LWP 135023))]
#0  0x00007faafb2d98ca in __futex_abstimed_wait_common64 () from /usr/lib/libpthread.so.0
(gdb) bt
#0  0x00007faafb2d98ca in __futex_abstimed_wait_common64 () from /usr/lib/libpthread.so.0
#1  0x00007faafb2d3270 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
#2  0x000055f12c788aad in std::sys::unix::condvar::Condvar::wait (self=0x7faaf4000c90, mutex=0x7faaf4000c30) at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys/unix/condvar.rs:71
#3  0x000055f12c786b9e in std::sys_common::condvar::Condvar::wait (self=0x7faaf4000cf8, mutex=0x7faaf4000ce8) at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/condvar.rs:44
#4  0x000055f12c787fd1 in std::sync::condvar::Condvar::wait<()> (self=0x7faaf4000cf8, guard=...) at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sync/condvar.rs:188
#5  0x000055f12c787972 in parking::Inner::park (self=0x7faaf4000ce0, timeout=...) at /home/xal/.cargo/registry/src/github.com-1ecc6299db9ec823/parking-2.0.0/src/lib.rs:305
#6  0x000055f12c78741a in parking::Parker::park (self=0x7faafb0ed668) at /home/xal/.cargo/registry/src/github.com-1ecc6299db9ec823/parking-2.0.0/src/lib.rs:103
#7  0x000055f12c73f4d8 in async_io::block_on::block_on<(),futures_lite::future::Pending<()>> (future=...) at /home/xal/.cargo/git/checkouts/async-io-3b3a058e40c23b8c/0774548/src/block_on.rs:138
#8  0x000055f12c740e0b in async_io_deadlock::main::{{closure}} () at main.rs:6
#9  0x000055f12c741ce9 in std::sys_common::backtrace::__rust_begin_short_backtrace<closure-0,()> (f=...) at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:125
#10 0x000055f12c7405a6 in std::thread::{{impl}}::spawn_unchecked::{{closure}}::{{closure}}<closure-0,()> () at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/thread/mod.rs:474
#11 0x000055f12c73d4a9 in std::panic::{{impl}}::call_once<(),closure-0> (self=..., _args=()) at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panic.rs:344
#12 0x000055f12c740afc in std::panicking::try::do_call<std::panic::AssertUnwindSafe<closure-0>,()> (data=0x7faafb0edbe8) at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:379
#13 0x000055f12c7411cd in __rust_try ()
#14 0x000055f12c740a71 in std::panicking::try<(),std::panic::AssertUnwindSafe<closure-0>> (f=...) at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:343
#15 0x000055f12c73d4c9 in std::panic::catch_unwind<std::panic::AssertUnwindSafe<closure-0>,()> (f=...) at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panic.rs:431
#16 0x000055f12c7403da in std::thread::{{impl}}::spawn_unchecked::{{closure}}<closure-0,()> () at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/thread/mod.rs:473
#17 0x000055f12c73d54e in core::ops::function::FnOnce::call_once<closure-0,()> () at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/ops/function.rs:227
#18 0x000055f12c7a4afa in alloc::boxed::{{impl}}::call_once<(),FnOnce<()>,alloc::alloc::Global> () at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/alloc/src/boxed.rs:1546
#19 alloc::boxed::{{impl}}::call_once<(),alloc::boxed::Box<FnOnce<()>, alloc::alloc::Global>,alloc::alloc::Global> () at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/alloc/src/boxed.rs:1546
#20 std::sys::unix::thread::{{impl}}::new::thread_start () at library/std/src/sys/unix/thread.rs:71
#21 0x00007faafb2cd259 in start_thread () from /usr/lib/libpthread.so.0
#22 0x00007faafb1ef5e3 in clone () from /usr/lib/libc.so.6

lhallam avatar Jun 13 '21 22:06 lhallam

I think this feature should be inverted and not a default feature - that is, make it a "no-auto-driver-thread" feature. Alternatively, it could just be a normal non-default feature with a larger API break.

Reasoning: if a crate depends on async-io, it shouldn't need to do anything special to support being used by programs that use block_on correctly. With a default feature, a library crate that depends on async-io would need to choose one of:

  • Depend on async-io without changes, forcing the thread to be created
  • Require a default feature of their own to propagate the dependency
  • Require their callers to depend on async-io and enable the feature if they use an async api in the library crate. This may be surprising to the library's user, since they might not directly depend on async-io.

If a program does correctly use block_on, it is very likely that it will depend on the async-io crate to get that function call and so enabling the feature makes sense.

Or, instead of a feature, add a function at runtime to disable thread creation, and require a program that uses block_on properly to call it prior to making any Driver-using objects.

danieldg avatar Jul 24 '21 20:07 danieldg