zenoh-c DLL panics in `libc::atexit` handler on Windows
Describe the bug
See this workflow run failure for context.
The z_api_double_drop_test fails when syncing with zenoh, starting from commit 0283aaae480d0c0608802a6fbfb79f7fc681469f.
I've observed this crash only when zenoh-c is linked dynamically to an application and not when linked statically. Weirdly enough, this crash still happens when one of (or both of) the z_drop calls are removed.
Backtrace of the crash
thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\thread\mod.rs:1439:40
stack backtrace:
0: 0x7ffd8557aee3 - std::backtrace_rs::backtrace::dbghelp::trace
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\..\..\backtrace\src\backtrace\dbghelp.rs:98
1: 0x7ffd8557aee3 - std::backtrace_rs::backtrace::trace_unsynchronized
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\..\..\backtrace\src\backtrace\mod.rs:66
2: 0x7ffd8557aee3 - std::sys_common::backtrace::_print_fmt
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:65
3: 0x7ffd8557aee3 - std::sys_common::backtrace::_print::impl$0::fmt
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:44
4: 0x7ffd852eca2b - core::fmt::rt::Argument::fmt
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\fmt\rt.rs:138
5: 0x7ffd852eca2b - core::fmt::write
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\fmt\mod.rs:1094
6: 0x7ffd85568c80 - std::io::Write::write_fmt<std::sys::windows::stdio::Stderr>
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\io\mod.rs:1714
7: 0x7ffd8557d1db - std::sys_common::backtrace::_print
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:47
8: 0x7ffd8557d1db - std::sys_common::backtrace::print
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:34
9: 0x7ffd8557cdce - std::panicking::default_hook::closure$1
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:269
10: 0x7ffd8557dda4 - std::panicking::default_hook
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:288
11: 0x7ffd8557dda4 - std::panicking::rust_panic_with_hook
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:705
12: 0x7ffd8557d803 - std::panicking::begin_panic_handler::closure$0
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:595
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\panicking.rs:67
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\panicking.rs:117
17: 0x7ffd85581ce9 - std::alloc::_::__rg_oom
18: 0x7ffd8558e68e - std::alloc::_::__rg_oom
19: 0x7ffd858a17ca - std::alloc::_::__rg_oom
20: 0x7ffd858a2963 - std::alloc::_::__rg_oom
21: 0x7ffd8557a0b2 - core::ptr::drop_in_place
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\ptr\mod.rs:497
22: 0x7ffd8557a0b2 - core::ptr::drop_in_place
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\ptr\mod.rs:497
23: 0x7ffd8557a0b2 - core::mem::drop
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\mem\mod.rs:987
24: 0x7ffd8557a0b2 - std::sys::windows::thread::Thread::new
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys\windows\thread.rs:47
25: 0x7ffd858a265a - std::alloc::_::__rg_oom
26: 0x7ffd858a195a - std::alloc::_::__rg_oom
27: 0x7ffd858a163a - std::alloc::_::__rg_oom
28: 0x7ffdd52742d6 - execute_onexit_table
29: 0x7ffdd52741fb - execute_onexit_table
30: 0x7ffdd52741b4 - execute_onexit_table
31: 0x7ffd859d88fd - dllmain_crt_process_detach
at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:180
32: 0x7ffd859d8a22 - dllmain_dispatch
at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:293
33: 0x7ffdd77a9a1d - RtlActivateActivationContextUnsafeFast
34: 0x7ffdd77edcda - LdrShutdownProcess
35: 0x7ffdd77eda8d - RtlExitUserProcess
36: 0x7ffdd611e3bb - FatalExit
37: 0x7ffdd52805bc - exit
38: 0x7ffdd528045f - exit
39: 0x7ff7db3112c7 - <unknown>
40: 0x7ffdd6117344 - BaseThreadInitThunk
The Drop implementation of ZRuntimePool calls .shutdown_timeout() on each runtime in parallel by spawning a thread for each shutdown operation. This Drop implementation is in turn called in a libc::atexit handler.
I'm still not sure why this causes the crash. You can see from the backtrace that a new thread is created (I think this is the thread spawned in the Drop implementation) after zenoh receives a DLL_PROCESS_DETACH notification, because the atexit handler of the DLL is called after the application process exits (the DLL actually has its own atexit handler stack, separate from the application's). All application threads are signaled at process exit, and thus calls to WaitForSingleObject return immediately (this is what Rust uses to implement .join()). The following is the origin of the panic:
// https://github.com/rust-lang/rust/blob/master/library/std/src/thread/mod.rs#L1577
impl<'scope, T> JoinInner<'scope, T> {
    fn join(mut self) -> Result<T> {
        // Calls `WaitForSingleObject` on Windows
        self.native.join();
        // The first `.unwrap()` is what panics.
        // The standard library assumes that:
        // "the caller will never read this packet until the thread has exited",
        // but that invariant is somehow broken here,
        // probably because the thread doesn't exit normally.
        Arc::get_mut(&mut self.packet).unwrap().result.get_mut().take().unwrap()
    }
}
So my theory is that threads spawned after process exit as part of a DLL's atexit handler are somehow signaled at creation, so they appear finished as soon as WaitForSingleObject is called on them, without having exited cleanly and dropped their "packet" handles. I don't have any proof, though. I think the underlying issue here is much more subtle and needs more digging (but we have a release to push out!). But back to the Drop implementation:
impl Drop for ZRuntimePool {
    fn drop(&mut self) {
        let t = std::time::Instant::now();
        let handles: Vec<_> = self
            .0
            .drain()
            .filter_map(|(_name, mut rt)| {
                rt.take()
                    .map(|r| std::thread::spawn(move || r.shutdown_timeout(Duration::from_secs(1))))
            })
            .collect();
        for hd in handles {
            let _ = hd.join();
        }
    }
}
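For reference, `std::thread::spawn` panics when the OS refuses to create a thread, whereas `std::thread::Builder::spawn` returns an `io::Result`. The following is a minimal sketch of the same "one thread per shutdown job" pattern with the spawn error surfaced instead of unwrapped. `run_jobs_in_parallel` is a hypothetical helper, not zenoh code, and the boxed closures stand in for the `shutdown_timeout` calls. This is not a fix: when a spawn fails, the closure and everything it captured are still dropped.

```rust
use std::io;
use std::thread;

/// Hypothetical helper (not part of zenoh): runs each job on its own thread.
/// Spawn failures are collected instead of causing a panic.
/// Returns the number of jobs that ran to completion and the spawn errors.
fn run_jobs_in_parallel(
    jobs: Vec<Box<dyn FnOnce() + Send + 'static>>,
) -> (usize, Vec<io::Error>) {
    let mut handles = Vec::new();
    let mut errors = Vec::new();
    for job in jobs {
        // `Builder::spawn` returns `io::Result<JoinHandle<_>>`;
        // `std::thread::spawn` would panic on the `Err` case.
        match thread::Builder::new().spawn(job) {
            Ok(handle) => handles.push(handle),
            Err(e) => errors.push(e),
        }
    }
    // A job counts as completed only if its thread did not panic.
    let completed = handles
        .into_iter()
        .filter_map(|h| h.join().ok())
        .count();
    (completed, errors)
}

fn main() {
    let mut jobs: Vec<Box<dyn FnOnce() + Send + 'static>> = Vec::new();
    jobs.push(Box::new(|| println!("shutting down runtime A")));
    jobs.push(Box::new(|| println!("shutting down runtime B")));
    let (completed, errors) = run_jobs_in_parallel(jobs);
    println!("{} job(s) completed, {} spawn error(s)", completed, errors.len());
}
```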
To reproduce
- Run zenoh-c tests using zenoh commit 0283aaae480d0c0608802a6fbfb79f7fc681469f on Windows.
System info
- Platform: Windows
- Zenoh commit: 0283aaae480d0c0608802a6fbfb79f7fc681469f
After more digging, I realized that the error is non-deterministic. Sometimes the std::thread::spawn calls in the atexit handler fail with "Access is denied." (this is what happens, in general, when one tries to spawn a thread on Windows in the atexit handler of a DLL). But sometimes the error is much further down, in Tokio.
I also realized I wasn't enabling debug symbols in my build and so my backtraces were not helpful at all. The following are backtraces for each scenario. Please note that the code from which I got the backtraces is slightly modified, but is functionally the same.
Backtrace when `Runtime::drop` panics
thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\thread\mod.rs:1439:40
stack backtrace:
0: 0x7ffda9e7c753 - std::backtrace_rs::backtrace::dbghelp::trace
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\..\..\backtrace\src\backtrace\dbghelp.rs:98
1: 0x7ffda9e7c753 - std::backtrace_rs::backtrace::trace_unsynchronized
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\..\..\backtrace\src\backtrace\mod.rs:66
2: 0x7ffda9e7c753 - std::sys_common::backtrace::_print_fmt
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:65
3: 0x7ffda9e7c753 - std::sys_common::backtrace::_print::impl$0::fmt
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:44
4: 0x7ffda9beed0b - core::fmt::rt::Argument::fmt
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\fmt\rt.rs:138
5: 0x7ffda9beed0b - core::fmt::write
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\fmt\mod.rs:1094
6: 0x7ffda9e6ad50 - std::io::Write::write_fmt<std::sys::windows::stdio::Stderr>
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\io\mod.rs:1714
7: 0x7ffda9e7ea3b - std::sys_common::backtrace::_print
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:47
8: 0x7ffda9e7ea3b - std::sys_common::backtrace::print
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:34
9: 0x7ffda9e7e63e - std::panicking::default_hook::closure$1
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:269
10: 0x7ffda9e7f594 - std::panicking::default_hook
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:288
11: 0x7ffda9e7f594 - std::panicking::rust_panic_with_hook
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:705
12: 0x7ffda9e7eff3 - std::panicking::begin_panic_handler::closure$0
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:595
13: 0x7ffda9e7ef79 - std::sys_common::backtrace::__rust_end_short_backtrace<std::panicking::begin_panic_handler::closure_env$0,never$>
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:151
14: 0x7ffda9e7ef64 - std::panicking::begin_panic_handler
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:593
15: 0x7ffdaa2dea85 - core::panicking::panic_fmt
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\panicking.rs:67
16: 0x7ffdaa2dec52 - core::panicking::panic
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\panicking.rs:117
17: 0x7ffda9e83239 - std::thread::JoinInner<tuple$<> >::join<tuple$<> >
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\thread\mod.rs:1439
18: 0x7ffda9e8f8fe - tokio::runtime::blocking::pool::BlockingPool::shutdown
at C:\Users\zenoh\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.36.0\src\runtime\blocking\pool.rs:270
19: 0x7ffdaa1a181a - tokio::runtime::blocking::pool::impl$4::drop
at C:\Users\zenoh\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.36.0\src\runtime\blocking\pool.rs:278
20: 0x7ffdaa1a181a - core::ptr::drop_in_place
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ptr\mod.rs:497
21: 0x7ffdaa1a181a - core::ptr::drop_in_place<tokio::runtime::runtime::Runtime>
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ptr\mod.rs:497
22: 0x7ffdaa1a2b89 - core::ptr::drop_in_place
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ptr\mod.rs:497
23: 0x7ffdaa1a2b89 - core::ptr::drop_in_place
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ptr\mod.rs:497
24: 0x7ffdaa1a2b89 - core::mem::maybe_uninit::MaybeUninit<zenoh_runtime::impl$5::drop::closure$1::closure$0::closure_env$0>::assume_init_drop
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\mem\maybe_uninit.rs:728
25: 0x7ffdaa1a2b89 - std::thread::impl$0::spawn_unchecked_::impl$1::drop
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\thread\mod.rs:510
26: 0x7ffdaa1a2b89 - core::ptr::drop_in_place
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ptr\mod.rs:497
27: 0x7ffdaa1a2b89 - core::ptr::drop_in_place<std::thread::impl$0::spawn_unchecked_::closure_env$1<zenoh_runtime::impl$5::drop::closure$1::closure$0::closure_env$0,tuple$<> > >
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ptr\mod.rs:497
28: 0x7ffda9e7b922 - core::ptr::drop_in_place
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\ptr\mod.rs:497
29: 0x7ffda9e7b922 - core::ptr::drop_in_place
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\ptr\mod.rs:497
30: 0x7ffda9e7b922 - core::mem::drop
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\mem\mod.rs:987
31: 0x7ffda9e7b922 - std::sys::windows::thread::Thread::new
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys\windows\thread.rs:47
32: 0x7ffdaa1a28b0 - std::panicking::try::do_call
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\panicking.rs:500
33: 0x7ffdaa1a28b0 - std::panicking::try
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\panicking.rs:464
34: 0x7ffdaa1a28b0 - std::panic::catch_unwind
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\panic.rs:142
35: 0x7ffdaa1a28b0 - zenoh_runtime::impl$5::drop::closure$1
at C:\Users\zenoh\tmp\zenoh\commons\zenoh-runtime\src\lib.rs:202
36: 0x7ffdaa1a28b0 - core::ops::function::impls::impl$4::call_once
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ops\function.rs:305
37: 0x7ffdaa1a28b0 - enum2$<core::option::Option<tuple$<zenoh_runtime::ZRuntime,enum2$<core::option::Option<tokio::runtime::runtime::Runtime> > > > >::map
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\option.rs:1075
38: 0x7ffdaa1a28b0 - core::iter::adapters::map::impl$2::next<enum2$<core::result::Result<std::thread::JoinHandle<tuple$<> >,alloc::boxed::Box<dyn$<core::any::Any,core::marker::Send>,alloc::alloc::Global> > >,core::iter::adapters::take::Take<core::iter::adapters::filter_map::F
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\iter\adapters\map.rs:103
39: 0x7ffdaa1a19f6 - core::ptr::drop_in_place<zenoh_runtime::ZRuntimePool>
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ptr\mod.rs:497
40: 0x7ffdaa1a168a - zenoh_runtime::cleanup
at C:\Users\zenoh\tmp\zenoh\commons\zenoh-runtime\src\lib.rs:152
41: 0x7ffdd52742d6 - execute_onexit_table
42: 0x7ffdd52741fb - execute_onexit_table
43: 0x7ffdd52741b4 - execute_onexit_table
44: 0x7ffdaa2d8aad - dllmain_crt_process_detach
at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:180
45: 0x7ffdaa2d8bd2 - dllmain_dispatch
at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:293
46: 0x7ffdd77a9a1d - RtlActivateActivationContextUnsafeFast
47: 0x7ffdd77edcda - LdrShutdownProcess
48: 0x7ffdd77eda8d - RtlExitUserProcess
49: 0x7ffdd611e3bb - FatalExit
50: 0x7ffdd52805bc - exit
51: 0x7ffdd528045f - exit
52: 0x7ff656f212c7 - <unknown>
53: 0x7ffdd6117344 - BaseThreadInitThunk
54: 0x7ffdd77e26b1 - RtlUserThreadStart
Backtrace when `Runtime::drop` doesn't panic
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 5, kind: PermissionDenied, message: "Access is denied." }', C:\Users\zenoh\tmp\zenoh\commons\zenoh-runtime\src\lib.rs:208:24
stack backtrace:
0: 0x7ffda9e7c753 - std::backtrace_rs::backtrace::dbghelp::trace
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\..\..\backtrace\src\backtrace\dbghelp.rs:98
1: 0x7ffda9e7c753 - std::backtrace_rs::backtrace::trace_unsynchronized
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\..\..\backtrace\src\backtrace\mod.rs:66
2: 0x7ffda9e7c753 - std::sys_common::backtrace::_print_fmt
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:65
3: 0x7ffda9e7c753 - std::sys_common::backtrace::_print::impl$0::fmt
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:44
4: 0x7ffda9beed0b - core::fmt::rt::Argument::fmt
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\fmt\rt.rs:138
5: 0x7ffda9beed0b - core::fmt::write
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\fmt\mod.rs:1094
6: 0x7ffda9e6ad50 - std::io::Write::write_fmt<std::sys::windows::stdio::Stderr>
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\io\mod.rs:1714
7: 0x7ffda9e7ea3b - std::sys_common::backtrace::_print
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:47
8: 0x7ffda9e7ea3b - std::sys_common::backtrace::print
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:34
9: 0x7ffda9e7e63e - std::panicking::default_hook::closure$1
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:269
10: 0x7ffda9e7f594 - std::panicking::default_hook
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:288
11: 0x7ffda9e7f594 - std::panicking::rust_panic_with_hook
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:705
12: 0x7ffda9e7f025 - std::panicking::begin_panic_handler::closure$0
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:597
13: 0x7ffda9e7ef79 - std::sys_common::backtrace::__rust_end_short_backtrace<std::panicking::begin_panic_handler::closure_env$0,never$>
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\sys_common\backtrace.rs:151
14: 0x7ffda9e7ef64 - std::panicking::begin_panic_handler
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\std\src\panicking.rs:593
15: 0x7ffdaa2dea85 - core::panicking::panic_fmt
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\panicking.rs:67
16: 0x7ffdaa2defa3 - core::result::unwrap_failed
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library\core\src\result.rs:1651
17: 0x7ffdaa1a29a5 - std::panicking::try::do_call
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\panicking.rs:500
18: 0x7ffdaa1a29a5 - std::panicking::try
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\panicking.rs:464
19: 0x7ffdaa1a29a5 - std::panic::catch_unwind
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\std\src\panic.rs:142
20: 0x7ffdaa1a29a5 - zenoh_runtime::impl$5::drop::closure$1
at C:\Users\zenoh\tmp\zenoh\commons\zenoh-runtime\src\lib.rs:202
21: 0x7ffdaa1a29a5 - core::ops::function::impls::impl$4::call_once
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ops\function.rs:305
22: 0x7ffdaa1a29a5 - enum2$<core::option::Option<tuple$<zenoh_runtime::ZRuntime,enum2$<core::option::Option<tokio::runtime::runtime::Runtime> > > > >::map
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\option.rs:1075
23: 0x7ffdaa1a29a5 - core::iter::adapters::map::impl$2::next<enum2$<core::result::Result<std::thread::JoinHandle<tuple$<> >,alloc::boxed::Box<dyn$<core::any::Any,core::marker::Send>,alloc::alloc::Global> > >,core::iter::adapters::take::Take<core::iter::adapters::filter_map::F
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\iter\adapters\map.rs:104
24: 0x7ffdaa1a19f6 - core::ptr::drop_in_place<zenoh_runtime::ZRuntimePool>
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be\library\core\src\ptr\mod.rs:497
25: 0x7ffdaa1a168a - zenoh_runtime::cleanup
at C:\Users\zenoh\tmp\zenoh\commons\zenoh-runtime\src\lib.rs:152
26: 0x7ffdd52742d6 - execute_onexit_table
27: 0x7ffdd52741fb - execute_onexit_table
28: 0x7ffdd52741b4 - execute_onexit_table
29: 0x7ffdaa2d8aad - dllmain_crt_process_detach
at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:180
30: 0x7ffdaa2d8bd2 - dllmain_dispatch
at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp:293
31: 0x7ffdd77a9a1d - RtlActivateActivationContextUnsafeFast
32: 0x7ffdd77edcda - LdrShutdownProcess
33: 0x7ffdd77eda8d - RtlExitUserProcess
34: 0x7ffdd611e3bb - FatalExit
35: 0x7ffdd52805bc - exit
36: 0x7ffdd528045f - exit
37: 0x7ff656f212c7 - <unknown>
38: 0x7ffdd6117344 - BaseThreadInitThunk
39: 0x7ffdd77e26b1 - RtlUserThreadStart
To understand what's going on here, let's start with the Windows thread spawning function of the Rust stdlib:
// https://github.com/rust-lang/rust/blob/1.72.0/library/std/src/sys/windows/thread.rs#L33
let ret = c::CreateThread(
    ptr::null_mut(),
    stack,
    Some(thread_start),
    p as *mut _,
    c::STACK_SIZE_PARAM_IS_A_RESERVATION,
    ptr::null_mut(),
);
let ret = HandleOrNull::from_raw_handle(ret);
return if let Ok(handle) = ret.try_into() {
    Ok(Thread { handle: Handle::from_inner(handle) })
} else {
    // The thread failed to start and as a result p was not consumed. Therefore, it is
    // safe to reconstruct the box so that it gets deallocated.
    drop(Box::from_raw(p));
    Err(io::Error::last_os_error())
};
Thus, if the thread creation syscall fails, Rust drops the thread closure before returning the error. In our case the closure owns the Tokio runtime, so dropping it runs the Drop implementation of Tokio's tokio::runtime::blocking::BlockingPool, which will .join() all threads of the runtime:
// https://github.com/tokio-rs/tokio/blob/tokio-1.35.x/tokio/src/runtime/blocking/pool.rs#L269
for (_id, handle) in workers {
    let _ = handle.join();
}
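The closure drop on the error path can be reproduced in isolation. The following is a minimal sketch, assuming a hypothetical `FakeRuntime` type that stands in for the Tokio `Runtime` and only records that its `Drop` ran. Dropping a boxed closure that was never called still drops everything the closure captured, which is how `Runtime::drop` ends up running inside `Thread::new`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a Tokio `Runtime`: it only records that `Drop` ran.
struct FakeRuntime {
    dropped: Arc<AtomicBool>,
}

impl Drop for FakeRuntime {
    fn drop(&mut self) {
        self.dropped.store(true, Ordering::SeqCst);
    }
}

/// Boxes a closure that owns a `FakeRuntime`, then drops the box without
/// ever calling it, mimicking the `drop(Box::from_raw(p))` error path.
/// Returns whether the captured value's `Drop` implementation ran.
fn drop_unrun_closure() -> bool {
    let dropped = Arc::new(AtomicBool::new(false));
    let rt = FakeRuntime { dropped: Arc::clone(&dropped) };
    let closure: Box<dyn FnOnce() + Send> = Box::new(move || {
        // Would perform the shutdown if the thread had actually started.
        let _rt = rt;
    });
    // The thread never started: the closure is dropped unexecuted.
    drop(closure);
    dropped.load(Ordering::SeqCst)
}

fn main() {
    assert!(drop_unrun_closure());
    println!("the captured value was dropped although the closure never ran");
}
```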
So why do we sometimes reach this point in Runtime::drop and why does this make the .join() call panic? The answer lies again in the Rust stdlib thread implementation:
// https://github.com/rust-lang/rust/blob/1.72.0/library/std/src/thread/mod.rs#L528
let try_result = panic::catch_unwind(panic::AssertUnwindSafe(|| {
    crate::sys_common::backtrace::__rust_begin_short_backtrace(f)
}));
// SAFETY: `their_packet` as been built just above and moved by the
// closure (it is an Arc<...>) and `my_packet` will be stored in the
// same `JoinInner` as this closure meaning the mutation will be
// safe (not modify it and affect a value far away).
unsafe { *their_packet.result.get() = Some(try_result) };
// Here `their_packet` gets dropped, and if this is the last `Arc` for that packet that
// will call `decrement_num_running_threads` and therefore signal that this thread is
// done.
drop(their_packet);
In the above snippet, f is the closure of the spawned thread. The Packet object is a means of transferring the result of the spawned thread back to the thread that joins it. Thus the .join() implementation first calls WaitForSingleObject on Windows (i.e., self.native.join()) and then assumes that the thread has finished executing and dropped its packet:
// https://github.com/rust-lang/rust/blob/master/library/std/src/thread/mod.rs#L1577
impl<'scope, T> JoinInner<'scope, T> {
    fn join(mut self) -> Result<T> {
        // Calls `WaitForSingleObject` on Windows
        self.native.join();
        Arc::get_mut(&mut self.packet).unwrap().result.get_mut().take().unwrap()
    }
}
Except that when the zenoh-c application (not the DLL) exits, Windows will have already signaled all the Tokio runtime threads by the time we reach the atexit handler. So there is a race condition where a thread sometimes stops execution before dropping its packet (or even before setting the result value).
If a runtime thread does end up dropping its packet, then the .join() call on its handle succeeds, and the std::thread::spawn call correctly reports the Windows "Access is denied." error. Otherwise, the .join() call panics, misleading us about the origin of the error.
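The two outcomes can be modeled without any threads. The following is a simplified sketch, assuming a hypothetical `Packet` type that uses a `Mutex` where the stdlib's private `Packet` uses an `UnsafeCell`. `Arc::get_mut` only returns `Some` when no other `Arc` to the same value exists, so a thread that is stopped before `drop(their_packet)` leaves the joining side with `None`, which the stdlib then unwraps.

```rust
use std::sync::{Arc, Mutex};

/// Simplified, hypothetical stand-in for the stdlib's private `Packet`.
struct Packet {
    result: Mutex<Option<i32>>,
}

/// Mimics the last line of `JoinInner::join`, but returns `None`
/// in the cases where the stdlib would panic on `.unwrap()`.
fn take_result(mut packet: Arc<Packet>) -> Option<i32> {
    Arc::get_mut(&mut packet)?.result.get_mut().ok()?.take()
}

/// The thread stored its result and dropped its `Arc` before the join.
fn join_after_clean_exit() -> Option<i32> {
    let mine = Arc::new(Packet { result: Mutex::new(None) });
    let theirs = Arc::clone(&mine);
    *theirs.result.lock().unwrap() = Some(42);
    drop(theirs);
    take_result(mine)
}

/// The thread was stopped before `drop(their_packet)`: its `Arc` is
/// leaked on purpose here to model a reference that is never released.
fn join_after_killed_thread() -> Option<i32> {
    let mine = Arc::new(Packet { result: Mutex::new(None) });
    let theirs = Arc::clone(&mine);
    std::mem::forget(theirs);
    take_result(mine)
}

fn main() {
    assert_eq!(join_after_clean_exit(), Some(42));
    assert_eq!(join_after_killed_thread(), None);
}
```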
I opened https://github.com/rust-lang/rust/issues/124466 and https://github.com/rust-lang/rust/issues/124468 to discuss/improve the stdlib's handling of this.