blocking icon indicating copy to clipboard operation
blocking copied to clipboard

Recover from thread spawn failure

Open mleonhard opened this issue 3 years ago • 1 comments

Thanks for making this library. I'm using it as a dependency of Tide: tide -> async-std -> async-global-executor -> blocking I'm running an API server in a VM which has limitations on number of threads (Heroku).

When the blocking library tries to spawn a new thread and fails, it poisons an internal mutex. Then all future calls into the threadpool immediately panic.

This behavior makes blocking fragile. Every server that depends on it is fragile. Thread pool limits are insiffucient protection. VMs often have other processes running that consume threads: cron, backups, ssh, deployments, etc. I need my API server to be resilient. I need it to stay up after failing to allocate a thread.

log.txt
{
  "time_ns": 1646202723428289154,
  "time": "2022-03-02T06:32:03.428Z",
  "module": "log_panics",
  "level": "ERROR",
  "message": "thread 'async-global-executor-1' panicked at 'called `Result::unwrap()` on an
  `Err` value: Os { code: 11, kind: WouldBlock, message: \"Resource temporarily unavailable\" }':
  /tmp/codon/tmp/cache/cargo/registry/src/github.com-1ecc6299db9ec823/
  blocking-1.1.0/src/lib.rs:239
   0: log_panics::init::{{closure}}
   1: std::panicking::rust_panic_with_hook
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:610:17
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:502:13
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/sys_common/backtrace.rs:139:18
   4: rust_begin_unwind
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:498:5
   5: core::panicking::panic_fmt
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/panicking.rs:116:14
   6: core::result::unwrap_failed
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/result.rs:1690:5
   7: blocking::Executor::grow_pool
   8: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
   9: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  10: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  11: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  12: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  13: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  14: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  15: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  16: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  17: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  18: std::thread::local::LocalKey<T>::with
  19: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  20: async_task::raw::RawTask<F,T,S>::run
  21: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  22: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  23: async_io::driver::block_on
  24: std::thread::local::LocalKey<T>::with
  25: async_global_executor::threading::thread_main_loop
  26: std::sys_common::backtrace::__rust_begin_short_backtrace
  27: core::ops::function::FnOnce::call_once{{vtable.shim}}
  28: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/alloc/src/boxed.rs:1854:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/alloc/src/boxed.rs:1854:9
      std::sys::unix::thread::Thread::new::thread_start
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/sys/unix/thread.rs:108:17
  29: start_thread
  30: clone
"}
{
  "time_ns": 1646202723428453843,
  "time": "2022-03-02T06:32:03.428Z",
  "module": "log_panics",
  "level": "ERROR",
  "message": "thread 'blocking-1130' panicked at 'called `Result::unwrap()` on an
  `Err` value: PoisonError { .. }':
   /tmp/codon/tmp/cache/cargo/registry/src/github.com-1ecc6299db9ec823/
   blocking-1.1.0/src/lib.rs:189
   0: log_panics::init::{{closure}}
   1: std::panicking::rust_panic_with_hook
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:610:17
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:502:13
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/sys_common/backtrace.rs:139:18
   4: rust_begin_unwind
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:498:5
   5: core::panicking::panic_fmt
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/panicking.rs:116:14
   6: core::result::unwrap_failed
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/result.rs:1690:5
   7: blocking::Executor::main_loop
   8: std::sys_common::backtrace::__rust_begin_short_backtrace
   9: core::ops::function::FnOnce::call_once{{vtable.shim}}
  10: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/alloc/src/boxed.rs:1854:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/alloc/src/boxed.rs:1854:9
      std::sys::unix::thread::Thread::new::thread_start
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/sys/unix/thread.rs:108:17
  11: start_thread
  12: clone
"}
{
  "time_ns": 1646202723430010940,
  "time": "2022-03-02T06:32:03.430Z",
  "module": "log_panics",
  "level": "ERROR",
  "message": "thread 'async-global-executor-1' panicked at 'called `Result::unwrap()` on an
  `Err` value: PoisonError { .. }':
   /tmp/codon/tmp/cache/cargo/registry/src/github.com-1ecc6299db9ec823/
   blocking-1.1.0/src/lib.rs:211
   0: log_panics::init::{{closure}}
   1: std::panicking::rust_panic_with_hook
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:610:17
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:502:13
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/sys_common/backtrace.rs:139:18
   4: rust_begin_unwind
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:498:5
   5: core::panicking::panic_fmt
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/panicking.rs:116:14
   6: core::result::unwrap_failed
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/result.rs:1690:5
   7: blocking::Executor::schedule
   8: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
   9: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  10: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  11: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  12: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  13: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  14: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  15: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  16: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  17: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  18: std::thread::local::LocalKey<T>::with
  19: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  20: async_task::raw::RawTask<F,T,S>::run
  21: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  22: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  23: async_io::driver::block_on
  24: std::thread::local::LocalKey<T>::with
  25: async_global_executor::threading::thread_main_loop
  26: std::sys_common::backtrace::__rust_begin_short_backtrace
  27: core::ops::function::FnOnce::call_once{{vtable.shim}}
  28: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/alloc/src/boxed.rs:1854:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/alloc/src/boxed.rs:1854:9
      std::sys::unix::thread::Thread::new::thread_start
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/sys/unix/thread.rs:108:17
  29: start_thread
  30: clone
"}

mleonhard avatar Mar 02 '22 07:03 mleonhard

I would accept PR to fix this.

The minimal way is probably to make grow_pool failable and delay unwrap until after the lock is released. (There may be a better way, though.)

taiki-e avatar Mar 05 '22 12:03 taiki-e