
zenoh-c DLL panics in `libc::atexit` handler on Windows

Open • fuzzypixelz opened this issue 1 year ago • 5 comments

Describe the bug

See this workflow run failure for context.

The z_api_double_drop_test fails when syncing with zenoh, starting from commit 0283aaae480d0c0608802a6fbfb79f7fc681469f.

I've observed this crash only when zenoh-c is linked dynamically to an application, not when it is linked statically. Oddly enough, the crash still happens when one or both of the z_drop calls are removed.

Backtrace of the crash:
thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\thread\mod.rs:1439:40
stack backtrace:
   0:     0x7ffd8557aee3 - std::backtrace_rs::backtrace::dbghelp::trace
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\..\..\backtrace\src\backtrace\dbghelp.rs:98
   1:     0x7ffd8557aee3 - std::backtrace_rs::backtrace::trace_unsynchronized
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\..\..\backtrace\src\backtrace\mod.rs:66
   2:     0x7ffd8557aee3 - std::sys_common::backtrace::_print_fmt
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:65
   3:     0x7ffd8557aee3 - std::sys_common::backtrace::_print::impl$0::fmt
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:44
   4:     0x7ffd852eca2b - core::fmt::rt::Argument::fmt
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\fmt\rt.rs:138
   5:     0x7ffd852eca2b - core::fmt::write
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\fmt\mod.rs:1094
   6:     0x7ffd85568c80 - std::io::Write::write_fmt<std::sys::windows::stdio::Stderr>
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\io\mod.rs:1714
   7:     0x7ffd8557d1db - std::sys_common::backtrace::_print
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:47
   8:     0x7ffd8557d1db - std::sys_common::backtrace::print
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:34
   9:     0x7ffd8557cdce - std::panicking::default_hook::closure$1
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:269
  10:     0x7ffd8557dda4 - std::panicking::default_hook
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:288
  11:     0x7ffd8557dda4 - std::panicking::rust_panic_with_hook
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:705
  12:     0x7ffd8557d803 - std::panicking::begin_panic_handler::closure$0
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:595
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\panicking.rs:67
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\panicking.rs:117
  17:     0x7ffd85581ce9 - std::alloc::_::__rg_oom
  18:     0x7ffd8558e68e - std::alloc::_::__rg_oom
  19:     0x7ffd858a17ca - std::alloc::_::__rg_oom
  20:     0x7ffd858a2963 - std::alloc::_::__rg_oom
  21:     0x7ffd8557a0b2 - core::ptr::drop_in_place
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\ptr\mod.rs:497
  22:     0x7ffd8557a0b2 - core::ptr::drop_in_place
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\ptr\mod.rs:497
  23:     0x7ffd8557a0b2 - core::mem::drop
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\mem\mod.rs:987
  24:     0x7ffd8557a0b2 - std::sys::windows::thread::Thread::new
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys\windows\thread.rs:47
  25:     0x7ffd858a265a - std::alloc::_::__rg_oom
  26:     0x7ffd858a195a - std::alloc::_::__rg_oom
  27:     0x7ffd858a163a - std::alloc::_::__rg_oom
  28:     0x7ffdd52742d6 - execute_onexit_table
  29:     0x7ffdd52741fb - execute_onexit_table
  30:     0x7ffdd52741b4 - execute_onexit_table
  31:     0x7ffd859d88fd - dllmain_crt_process_detach
                               at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:180
  32:     0x7ffd859d8a22 - dllmain_dispatch
                               at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:293
  33:     0x7ffdd77a9a1d - RtlActivateActivationContextUnsafeFast
  34:     0x7ffdd77edcda - LdrShutdownProcess
  35:     0x7ffdd77eda8d - RtlExitUserProcess
  36:     0x7ffdd611e3bb - FatalExit
  37:     0x7ffdd52805bc - exit
  38:     0x7ffdd528045f - exit
  39:     0x7ff7db3112c7 - <unknown>
  40:     0x7ffdd6117344 - BaseThreadInitThunk

The Drop implementation of ZRuntimePool calls .shutdown_timeout() on each runtime in parallel by spawning one thread per shutdown operation. This Drop implementation is in turn called from a libc::atexit handler.
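For context, here is a minimal sketch of how a Rust library can register a process-exit hook with the C runtime's `atexit`. It declares `atexit` directly so that it has no dependencies (the `libc` crate exposes the same function); `register_cleanup` and `cleanup` are hypothetical names, not zenoh's actual code. In a DLL built against the MSVC CRT, such a handler goes onto the DLL's own onexit table and runs from `DLL_PROCESS_DETACH`, which is the path visible in the backtrace (`dllmain_crt_process_detach` → `execute_onexit_table`).

```rust
use std::os::raw::c_int;
use std::sync::atomic::{AtomicBool, Ordering};

// Declared directly against the C runtime (assumption: Rust 1.82+ for `unsafe extern`).
unsafe extern "C" {
    fn atexit(cb: extern "C" fn()) -> c_int;
}

// Hypothetical guard so the hook is registered at most once.
static REGISTERED: AtomicBool = AtomicBool::new(false);

// Runs when the C runtime processes its exit handlers. A real library would drop
// its global state (e.g. a runtime pool) here. It deliberately avoids panicking:
// a panic escaping an `extern "C" fn` aborts the process.
extern "C" fn cleanup() {
    use std::io::Write;
    let _ = writeln!(std::io::stderr(), "cleanup hook ran");
}

/// Registers `cleanup` once; returns true if it is registered (now or earlier).
fn register_cleanup() -> bool {
    if REGISTERED.swap(true, Ordering::SeqCst) {
        return true; // already registered by an earlier call
    }
    // SAFETY: `cleanup` is a plain `extern "C" fn()` with no captured state.
    let ok = unsafe { atexit(cleanup) } == 0;
    if !ok {
        REGISTERED.store(false, Ordering::SeqCst); // allow a later retry
    }
    ok
}
```

Whatever such a hook does runs during process teardown, so any thread it spawns or joins is subject to the Windows behavior described in this issue.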

I'm still not sure why this causes the crash. The backtrace shows that a new thread is created (I think this is the thread spawned in the Drop implementation) after zenoh receives a DLL_PROCESS_DETACH notification. This happens because the DLL's atexit handlers run during process exit, after the application has called exit() (the DLL actually has its own atexit handler stack, separate from the application's). All application threads are signaled at process exit, and thus calls to WaitForSingleObject on them return immediately (this is what Rust uses to implement .join()). The following is the origin of the panic:

// https://github.com/rust-lang/rust/blob/master/library/std/src/thread/mod.rs#L1577
impl<'scope, T> JoinInner<'scope, T> {
    fn join(mut self) -> Result<T> {
        // Calls `WaitForSingleObject` on Windows
        self.native.join();
        // The first `.unwrap()` is what panics. 
        // The standard library assumes that:
        // "the caller will never read this packet until the thread has exited",
        // but that invariant is somehow broken here,
        // probably because the thread doesn't exit normally.
        Arc::get_mut(&mut self.packet).unwrap().result.get_mut().take().unwrap()
    }
}

So my theory is that threads spawned during process exit, from a DLL's atexit handler, are somehow already signaled at creation. WaitForSingleObject would then return immediately, even though the thread never terminated correctly and never dropped its "packet" handle. I don't have any proof, though. I think the underlying issue is much more subtle and needs more digging (but we have a release to push out!). Back to the Drop implementation:

impl Drop for ZRuntimePool {
    fn drop(&mut self) {
        let t = std::time::Instant::now();
        let handles: Vec<_> = self
            .0
            .drain()
            .filter_map(|(_name, mut rt)| {
                rt.take()
                    .map(|r| std::thread::spawn(move || r.shutdown_timeout(Duration::from_secs(1))))
            })
            .collect();

        for hd in handles {
            let _ = hd.join();
        }
    }
}

To reproduce

  1. Run zenoh-c tests using zenoh commit 0283aaae480d0c0608802a6fbfb79f7fc681469f on Windows.

System info

  • Platform: Windows
  • Zenoh commit: 0283aaae480d0c0608802a6fbfb79f7fc681469f

fuzzypixelz • Apr 25 '24 09:04

After more digging, I realized that the error is non-deterministic. Sometimes the std::thread::spawn calls in the atexit handler fail with "Access is denied." (which is what generally happens when one tries to spawn a thread on Windows from a DLL's atexit handler). But sometimes the error surfaces much further down, in Tokio.

I also realized I wasn't enabling debug symbols in my build, so my earlier backtraces were not helpful at all. The following are backtraces for each scenario. Please note that the code I got these backtraces from is slightly modified, but functionally the same.

Backtrace when `Runtime::drop` panics
thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\thread\mod.rs:1439:40
stack backtrace:
   0:     0x7ffda9e7c753 - std::backtrace_rs::backtrace::dbghelp::trace
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\..\..\backtrace\src\backtrace\dbghelp.rs:98
   1:     0x7ffda9e7c753 - std::backtrace_rs::backtrace::trace_unsynchronized
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\..\..\backtrace\src\backtrace\mod.rs:66
   2:     0x7ffda9e7c753 - std::sys_common::backtrace::_print_fmt
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:65
   3:     0x7ffda9e7c753 - std::sys_common::backtrace::_print::impl$0::fmt
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:44
   4:     0x7ffda9beed0b - core::fmt::rt::Argument::fmt
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\fmt\rt.rs:138
   5:     0x7ffda9beed0b - core::fmt::write
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\fmt\mod.rs:1094
   6:     0x7ffda9e6ad50 - std::io::Write::write_fmt<std::sys::windows::stdio::Stderr>
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\io\mod.rs:1714
   7:     0x7ffda9e7ea3b - std::sys_common::backtrace::_print
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:47
   8:     0x7ffda9e7ea3b - std::sys_common::backtrace::print
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:34
   9:     0x7ffda9e7e63e - std::panicking::default_hook::closure$1
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:269
  10:     0x7ffda9e7f594 - std::panicking::default_hook
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:288
  11:     0x7ffda9e7f594 - std::panicking::rust_panic_with_hook
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:705
  12:     0x7ffda9e7eff3 - std::panicking::begin_panic_handler::closure$0
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:595
  13:     0x7ffda9e7ef79 - std::sys_common::backtrace::__rust_end_short_backtrace<std::panicking::begin_panic_handler::closure_env$0,never$>
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:151
  14:     0x7ffda9e7ef64 - std::panicking::begin_panic_handler
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:593
  15:     0x7ffdaa2dea85 - core::panicking::panic_fmt
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\panicking.rs:67
  16:     0x7ffdaa2dec52 - core::panicking::panic
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\panicking.rs:117
  17:     0x7ffda9e83239 - std::thread::JoinInner<tuple$<> >::join<tuple$<> >
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\thread\mod.rs:1439
  18:     0x7ffda9e8f8fe - tokio::runtime::blocking::pool::BlockingPool::shutdown
                               at C:\Users\zenoh\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.36.0\src\runtime\blocking\pool.rs:270
  19:     0x7ffdaa1a181a - tokio::runtime::blocking::pool::impl$4::drop
                               at C:\Users\zenoh\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.36.0\src\runtime\blocking\pool.rs:278
  20:     0x7ffdaa1a181a - core::ptr::drop_in_place
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ptr\mod.rs:497
  21:     0x7ffdaa1a181a - core::ptr::drop_in_place<tokio::runtime::runtime::Runtime>
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ptr\mod.rs:497
  22:     0x7ffdaa1a2b89 - core::ptr::drop_in_place
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ptr\mod.rs:497
  23:     0x7ffdaa1a2b89 - core::ptr::drop_in_place
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ptr\mod.rs:497
  24:     0x7ffdaa1a2b89 - core::mem::maybe_uninit::MaybeUninit<zenoh_runtime::impl$5::drop::closure$1::closure$0::closure_env$0>::assume_init_drop
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\mem\maybe_uninit.rs:728
  25:     0x7ffdaa1a2b89 - std::thread::impl$0::spawn_unchecked_::impl$1::drop
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\thread\mod.rs:510
  26:     0x7ffdaa1a2b89 - core::ptr::drop_in_place
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ptr\mod.rs:497
  27:     0x7ffdaa1a2b89 - core::ptr::drop_in_place<std::thread::impl$0::spawn_unchecked_::closure_env$1<zenoh_runtime::impl$5::drop::closure$1::closure$0::closure_env$0,tuple$<> > >
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ptr\mod.rs:497
  28:     0x7ffda9e7b922 - core::ptr::drop_in_place
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\ptr\mod.rs:497
  29:     0x7ffda9e7b922 - core::ptr::drop_in_place
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\ptr\mod.rs:497
  30:     0x7ffda9e7b922 - core::mem::drop
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\mem\mod.rs:987
  31:     0x7ffda9e7b922 - std::sys::windows::thread::Thread::new
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys\windows\thread.rs:47
  32:     0x7ffdaa1a28b0 - std::panicking::try::do_call
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\panicking.rs:500
  33:     0x7ffdaa1a28b0 - std::panicking::try
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\panicking.rs:464
  34:     0x7ffdaa1a28b0 - std::panic::catch_unwind
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\panic.rs:142
  35:     0x7ffdaa1a28b0 - zenoh_runtime::impl$5::drop::closure$1
                               at C:\Users\zenoh\tmp\zenoh\commons\zenoh-runtime\src\lib.rs:202
  36:     0x7ffdaa1a28b0 - core::ops::function::impls::impl$4::call_once
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ops\function.rs:305
  37:     0x7ffdaa1a28b0 - enum2$<core::option::Option<tuple$<zenoh_runtime::ZRuntime,enum2$<core::option::Option<tokio::runtime::runtime::Runtime> > > > >::map
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\option.rs:1075
  38:     0x7ffdaa1a28b0 - core::iter::adapters::map::impl$2::next<enum2$<core::result::Result<std::thread::JoinHandle<tuple$<> >,alloc::boxed::Box<dyn$<core::any::Any,core::marker::Send>,alloc::alloc::Global> > >,core::iter::adapters::take::Take<core::iter::adapters::filter_map::F
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\iter\adapters\map.rs:103
  39:     0x7ffdaa1a19f6 - core::ptr::drop_in_place<zenoh_runtime::ZRuntimePool>
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ptr\mod.rs:497
  40:     0x7ffdaa1a168a - zenoh_runtime::cleanup
                               at C:\Users\zenoh\tmp\zenoh\commons\zenoh-runtime\src\lib.rs:152
  41:     0x7ffdd52742d6 - execute_onexit_table
  42:     0x7ffdd52741fb - execute_onexit_table
  43:     0x7ffdd52741b4 - execute_onexit_table
  44:     0x7ffdaa2d8aad - dllmain_crt_process_detach
                               at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:180
  45:     0x7ffdaa2d8bd2 - dllmain_dispatch
                               at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:293
  46:     0x7ffdd77a9a1d - RtlActivateActivationContextUnsafeFast
  47:     0x7ffdd77edcda - LdrShutdownProcess
  48:     0x7ffdd77eda8d - RtlExitUserProcess
  49:     0x7ffdd611e3bb - FatalExit
  50:     0x7ffdd52805bc - exit
  51:     0x7ffdd528045f - exit
  52:     0x7ff656f212c7 - <unknown>
  53:     0x7ffdd6117344 - BaseThreadInitThunk
  54:     0x7ffdd77e26b1 - RtlUserThreadStart
Backtrace when `Runtime::drop` doesn't panic
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 5, kind: PermissionDenied, message: "Access is denied." }', C:\Users\zenoh\tmp\zenoh\commons\zenoh-runtime\src\lib.rs:208:24
stack backtrace:
   0:     0x7ffda9e7c753 - std::backtrace_rs::backtrace::dbghelp::trace
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\..\..\backtrace\src\backtrace\dbghelp.rs:98
   1:     0x7ffda9e7c753 - std::backtrace_rs::backtrace::trace_unsynchronized
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\..\..\backtrace\src\backtrace\mod.rs:66
   2:     0x7ffda9e7c753 - std::sys_common::backtrace::_print_fmt
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:65
   3:     0x7ffda9e7c753 - std::sys_common::backtrace::_print::impl$0::fmt
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:44
   4:     0x7ffda9beed0b - core::fmt::rt::Argument::fmt
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\fmt\rt.rs:138
   5:     0x7ffda9beed0b - core::fmt::write
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\fmt\mod.rs:1094
   6:     0x7ffda9e6ad50 - std::io::Write::write_fmt<std::sys::windows::stdio::Stderr>
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\io\mod.rs:1714
   7:     0x7ffda9e7ea3b - std::sys_common::backtrace::_print
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:47
   8:     0x7ffda9e7ea3b - std::sys_common::backtrace::print
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:34
   9:     0x7ffda9e7e63e - std::panicking::default_hook::closure$1
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:269
  10:     0x7ffda9e7f594 - std::panicking::default_hook
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:288
  11:     0x7ffda9e7f594 - std::panicking::rust_panic_with_hook
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:705
  12:     0x7ffda9e7f025 - std::panicking::begin_panic_handler::closure$0
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:597
  13:     0x7ffda9e7ef79 - std::sys_common::backtrace::__rust_end_short_backtrace<std::panicking::begin_panic_handler::closure_env$0,never$>
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:151
  14:     0x7ffda9e7ef64 - std::panicking::begin_panic_handler
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:593
  15:     0x7ffdaa2dea85 - core::panicking::panic_fmt
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\panicking.rs:67
  16:     0x7ffdaa2defa3 - core::result::unwrap_failed
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\result.rs:1651
  17:     0x7ffdaa1a29a5 - std::panicking::try::do_call
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\panicking.rs:500
  18:     0x7ffdaa1a29a5 - std::panicking::try
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\panicking.rs:464
  19:     0x7ffdaa1a29a5 - std::panic::catch_unwind
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\panic.rs:142
  20:     0x7ffdaa1a29a5 - zenoh_runtime::impl$5::drop::closure$1
                               at C:\Users\zenoh\tmp\zenoh\commons\zenoh-runtime\src\lib.rs:202
  21:     0x7ffdaa1a29a5 - core::ops::function::impls::impl$4::call_once
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ops\function.rs:305
  22:     0x7ffdaa1a29a5 - enum2$<core::option::Option<tuple$<zenoh_runtime::ZRuntime,enum2$<core::option::Option<tokio::runtime::runtime::Runtime> > > > >::map
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\option.rs:1075
  23:     0x7ffdaa1a29a5 - core::iter::adapters::map::impl$2::next<enum2$<core::result::Result<std::thread::JoinHandle<tuple$<> >,alloc::boxed::Box<dyn$<core::any::Any,core::marker::Send>,alloc::alloc::Global> > >,core::iter::adapters::take::Take<core::iter::adapters::filter_map::F
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\iter\adapters\map.rs:104
  24:     0x7ffdaa1a19f6 - core::ptr::drop_in_place<zenoh_runtime::ZRuntimePool>
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ptr\mod.rs:497
  25:     0x7ffdaa1a168a - zenoh_runtime::cleanup
                               at C:\Users\zenoh\tmp\zenoh\commons\zenoh-runtime\src\lib.rs:152
  26:     0x7ffdd52742d6 - execute_onexit_table
  27:     0x7ffdd52741fb - execute_onexit_table
  28:     0x7ffdd52741b4 - execute_onexit_table
  29:     0x7ffdaa2d8aad - dllmain_crt_process_detach
                               at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:180
  30:     0x7ffdaa2d8bd2 - dllmain_dispatch
                               at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:293
  31:     0x7ffdd77a9a1d - RtlActivateActivationContextUnsafeFast
  32:     0x7ffdd77edcda - LdrShutdownProcess
  33:     0x7ffdd77eda8d - RtlExitUserProcess
  34:     0x7ffdd611e3bb - FatalExit
  35:     0x7ffdd52805bc - exit
  36:     0x7ffdd528045f - exit
  37:     0x7ff656f212c7 - <unknown>
  38:     0x7ffdd6117344 - BaseThreadInitThunk
  39:     0x7ffdd77e26b1 - RtlUserThreadStart
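The backtrace above ends in a `Result::unwrap()` on the `Os { code: 5, ... }` error at `lib.rs:208`. As a sketch of a more defensive pattern (assuming the caller can tolerate skipping a shutdown thread), `std::thread::Builder::spawn` returns an `io::Result` instead of panicking the way `std::thread::spawn` does. `spawn_or_report` is a hypothetical helper. One caveat, consistent with the stdlib excerpt that follows: when spawning fails, the standard library has already dropped the closure, so anything it captured (such as a Tokio `Runtime`) is still dropped on the calling thread.

```rust
use std::io;
use std::thread::{self, JoinHandle};

/// Hypothetical helper: spawn a named thread, or report why it could not be spawned.
/// Unlike `thread::spawn`, `Builder::spawn` returns the OS error instead of panicking.
fn spawn_or_report<F>(name: &str, f: F) -> io::Result<JoinHandle<()>>
where
    F: FnOnce() + Send + 'static,
{
    thread::Builder::new()
        .name(name.to_string())
        .spawn(f)
        .map_err(|e| {
            // By the time we get here, std has already dropped `f`, so whatever it
            // captured was released on *this* thread, not on a new one.
            eprintln!("could not spawn {name}: {e}");
            e
        })
}
```

On the exit path, a caller would log or skip the failed shutdown instead of unwrapping, although this alone does not stop a panicking `Drop` inside the dropped closure.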

To understand what's going on here, let's start with the Windows thread-spawning function of the Rust stdlib:

// https://github.com/rust-lang/rust/blob/1.72.0/library/std/src/sys/windows/thread.rs#L33
let ret = c::CreateThread(
    ptr::null_mut(),
    stack,
    Some(thread_start),
    p as *mut _,
    c::STACK_SIZE_PARAM_IS_A_RESERVATION,
    ptr::null_mut(),
);
let ret = HandleOrNull::from_raw_handle(ret);
return if let Ok(handle) = ret.try_into() {
    Ok(Thread { handle: Handle::from_inner(handle) })
} else {
    // The thread failed to start and as a result p was not consumed. Therefore, it is
    // safe to reconstruct the box so that it gets deallocated.
    drop(Box::from_raw(p));
    Err(io::Error::last_os_error())
};

Thus, if the thread creation syscall fails, Rust drops the thread closure before returning the error. In our case, the closure owns a Tokio Runtime, so dropping it runs the Drop implementation of Tokio's tokio::runtime::blocking::BlockingPool, which calls .join() on all threads of the runtime:

// https://github.com/tokio-rs/tokio/blob/tokio-1.35.x/tokio/src/runtime/blocking/pool.rs#L269
for (_id, handle) in workers {
    let _ = handle.join();
}
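The chain described above (failed `CreateThread` → closure dropped → `Runtime` dropped → `BlockingPool` joins its workers) can be modeled with the standard library alone. In this sketch, `FakeBlockingPool` and `spawn_that_fails` are hypothetical stand-ins for Tokio's blocking pool and for a thread creation the OS refuses; the latter mimics the `drop(Box::from_raw(p))` branch by dropping the boxed closure without ever running it.

```rust
use std::io;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for Tokio's blocking pool: joins every worker on drop.
struct FakeBlockingPool {
    workers: Vec<JoinHandle<()>>,
    joined: Arc<AtomicUsize>,
}

impl Drop for FakeBlockingPool {
    fn drop(&mut self) {
        for handle in self.workers.drain(..) {
            // Like Tokio, ignore a worker's own panic. A panic raised *inside*
            // `join` itself (as in the backtrace) would still propagate.
            let _ = handle.join();
            self.joined.fetch_add(1, Ordering::SeqCst);
        }
    }
}

/// Hypothetical spawn whose OS thread creation always fails: like std's error
/// branch, it drops the boxed closure without running it, then returns the error.
fn spawn_that_fails(f: Box<dyn FnOnce() + Send>) -> io::Result<()> {
    drop(f);
    Err(io::Error::new(io::ErrorKind::PermissionDenied, "Access is denied."))
}

/// Returns the spawn result and how many pool workers were joined as a side effect.
fn demo() -> (io::Result<()>, usize) {
    let joined = Arc::new(AtomicUsize::new(0));
    let pool = FakeBlockingPool {
        workers: (0..3).map(|_| thread::spawn(|| {})).collect(),
        joined: Arc::clone(&joined),
    };
    // The "shutdown" closure owns the pool, just as zenoh's closure owns a Runtime.
    let result = spawn_that_fails(Box::new(move || drop(pool)));
    (result, joined.load(Ordering::SeqCst))
}
```

Even though the closure never runs, the pool's `Drop` (and therefore every `join`) executes inside the failed spawn call.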

So why do we sometimes reach this point in Runtime::drop and why does this make the .join() call panic? The answer lies again in the Rust stdlib thread implementation:

// https://github.com/rust-lang/rust/blob/1.72.0/library/std/src/thread/mod.rs#L528
let try_result = panic::catch_unwind(panic::AssertUnwindSafe(|| {
    crate::sys_common::backtrace::__rust_begin_short_backtrace(f)
}));
// SAFETY: `their_packet` as been built just above and moved by the
// closure (it is an Arc<...>) and `my_packet` will be stored in the
// same `JoinInner` as this closure meaning the mutation will be
// safe (not modify it and affect a value far away).
unsafe { *their_packet.result.get() = Some(try_result) };
// Here `their_packet` gets dropped, and if this is the last `Arc` for that packet that
// will call `decrement_num_running_threads` and therefore signal that this thread is
// done.
drop(their_packet);

In the above snippet, f is the closure of the spawned thread. The Packet object is the means by which the spawned thread's result is transferred back to the joining thread. Thus the .join() implementation first calls WaitForSingleObject on Windows (i.e., self.native.join()) and then assumes that the thread has finished execution and dropped its packet:

// https://github.com/rust-lang/rust/blob/master/library/std/src/thread/mod.rs#L1577
impl<'scope, T> JoinInner<'scope, T> {
    fn join(mut self) -> Result<T> {
        // Calls `WaitForSingleObject` on Windows
        self.native.join();
        Arc::get_mut(&mut self.packet).unwrap().result.get_mut().take().unwrap()
    }
}
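To make the invariant concrete, here is a simplified model of the packet handshake. It is not the standard library's code: `Packet`, `MiniJoinHandle` and `mini_spawn` are hypothetical, a `Mutex` replaces std's `UnsafeCell`, and `join` returns `None` where std would `.unwrap()`. Setting `leak_packet` uses `std::mem::forget` to mimic a thread that is stopped after storing its result but before dropping its packet; `Arc::get_mut` then fails, which corresponds to the first `.unwrap()` that panics in the backtraces above.

```rust
use std::mem;
use std::panic::{self, AssertUnwindSafe};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical, simplified version of std's internal `Packet`.
struct Packet<T> {
    result: Mutex<Option<thread::Result<T>>>,
}

/// Hypothetical join handle: the OS-level handle plus the shared packet.
struct MiniJoinHandle<T> {
    native: JoinHandle<()>,
    packet: Arc<Packet<T>>,
}

/// Spawns `f`. If `leak_packet` is true, the thread never releases its packet,
/// simulating a thread that is killed before `drop(their_packet)` runs.
fn mini_spawn<T, F>(f: F, leak_packet: bool) -> MiniJoinHandle<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let my_packet = Arc::new(Packet { result: Mutex::new(None) });
    let their_packet = Arc::clone(&my_packet);
    let native = thread::spawn(move || {
        let try_result = panic::catch_unwind(AssertUnwindSafe(f));
        *their_packet.result.lock().unwrap() = Some(try_result);
        if leak_packet {
            mem::forget(their_packet); // the Arc reference is never released
        } else {
            drop(their_packet); // signals "this thread is done with the packet"
        }
    });
    MiniJoinHandle { native, packet: my_packet }
}

impl<T> MiniJoinHandle<T> {
    /// Like std's `join`, but returns `None` instead of panicking when the
    /// "thread has released its packet" invariant does not hold.
    fn join(mut self) -> Option<thread::Result<T>> {
        self.native.join().ok()?; // wait for the OS thread to finish
        let packet = Arc::get_mut(&mut self.packet)?; // std unwraps here
        packet.result.get_mut().ok()?.take()
    }
}
```

With `leak_packet` set, the result is actually stored, yet `join` cannot reach it because the packet still looks shared, mirroring the invariant break described next.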

However, when the zenoh-c application (not the DLL) exits, Windows has already terminated and signaled all the Tokio runtime threads by the time the DLL's atexit handler runs. So there is a race condition: sometimes a thread stops execution before dropping its packet (or even before setting its result value).

If a runtime thread did drop its packet, then the .join() call on its handle succeeds, and the std::thread::spawn call correctly returns the Windows "Access is denied." error. Otherwise, the .join() call panics, misleading us about the origin of the error.
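This masking effect can be reproduced in miniature: if dropping the closure panics, the panic escapes before the `io::Error` can be returned. In this sketch, `PanickyRuntime` is a hypothetical value whose `Drop` panics the way `JoinInner::join` does, and `fail_spawn` is a hypothetical stand-in for a thread creation the OS refuses.

```rust
use std::io;
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical captured value whose Drop panics, like a Runtime whose
/// worker join trips the `Option::unwrap()` inside std.
struct PanickyRuntime;

impl Drop for PanickyRuntime {
    fn drop(&mut self) {
        panic!("called `Option::unwrap()` on a `None` value");
    }
}

/// Hypothetical spawn that always fails: drops the closure first, then returns
/// the OS error (mirroring std's error branch).
fn fail_spawn<F: FnOnce() + Send + 'static>(f: F) -> io::Result<()> {
    drop(f);
    Err(io::Error::new(io::ErrorKind::PermissionDenied, "Access is denied."))
}

/// Ok(kind) if the spawn error reached the caller; Err(message) if a panic
/// raised while dropping the closure hid it.
fn observe<F: FnOnce() + Send + 'static>(f: F) -> Result<io::ErrorKind, String> {
    match panic::catch_unwind(AssertUnwindSafe(|| fail_spawn(f))) {
        Ok(result) => Ok(result.unwrap_err().kind()),
        Err(payload) => Err(payload
            .downcast_ref::<&str>()
            .map(|s| s.to_string())
            .or_else(|| payload.downcast_ref::<String>().cloned())
            .unwrap_or_default()),
    }
}
```

With a harmless closure the caller sees `PermissionDenied`; with a closure that owns a `PanickyRuntime`, it only sees the `Option::unwrap()` panic.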

fuzzypixelz • Apr 26 '24 11:04

I opened https://github.com/rust-lang/rust/issues/124466 and https://github.com/rust-lang/rust/issues/124468 to discuss/improve the stdlib's handling of this.

fuzzypixelz • Apr 29 '24 06:04