panic during shutdown with RUST_LOG=trace
Describe the bug
With "trace" log-level, zenoh panics with the following callstack:
stderr: thread 'main' panicked at xplat/rust/toolchain/sysroot/1.86.0/library/std/src/thread/local.rs:276:25:
stderr: cannot access a Thread Local Storage value during or after destruction: AccessError
stderr: stack backtrace:
stderr: 0: 0x7f94abd4e48e - <<std[94ff872207e7c6e]::sys::backtrace::BacktraceLock>::print::DisplayBacktrace as core[3a29e1362dffb4e6]::fmt::Display>::fmt
stderr: at ./xplat/rust/toolchain/sysroot/1.86.0/library/std/src/../../backtrace/src/backtrace/libunwind.rs:117:9
stderr: 1: 0x7f94abd9bdc3 - core[3a29e1362dffb4e6]::fmt::write
stderr: at ./xplat/rust/toolchain/sysroot/1.86.0/library/core/src/fmt/rt.rs:177:76
stderr: 2: 0x7f94abd46fe9 - <std[94ff872207e7c6e]::sys::pal::unix::stdio::Stderr as std[94ff872207e7c6e]::io::Write>::write_fmt
stderr: at ./xplat/rust/toolchain/sysroot/1.86.0/library/std/src/io/mod.rs:1890:15
stderr: 3: 0x7f94abd4e333 - <std[94ff872207e7c6e]::sys::backtrace::BacktraceLock>::print
stderr: at ./xplat/rust/toolchain/sysroot/1.86.0/library/std/src/sys/backtrace.rs:42:9
stderr: 4: 0x7f94abd93aed - std[94ff872207e7c6e]::panicking::default_hook::{closure#0}
stderr: 5: 0x7f94abd93951 - std[94ff872207e7c6e]::panicking::default_hook
stderr: at ./xplat/rust/toolchain/sysroot/1.86.0/library/std/src/panicking.rs:325:9
stderr: 6: 0x7f94abd94177 - std[94ff872207e7c6e]::panicking::rust_panic_with_hook
stderr: at ./xplat/rust/toolchain/sysroot/1.86.0/library/std/src/panicking.rs:831:13
stderr: 7: 0x7f94abd4eb4e - std[94ff872207e7c6e]::panicking::begin_panic_handler::{closure#0}
stderr: at ./xplat/rust/toolchain/sysroot/1.86.0/library/std/src/panicking.rs:704:13
stderr: 8: 0x7f94abd4e7e9 - std[94ff872207e7c6e]::sys::backtrace::__rust_end_short_backtrace::<std[94ff872207e7c6e]::panicking::begin_panic_handler::{closure#0}, !>
stderr: at ./xplat/rust/toolchain/sysroot/1.86.0/library/std/src/sys/backtrace.rs:168:18
stderr: 9: 0x7f94abd93baf - rust_begin_unwind
stderr: at ./xplat/rust/toolchain/sysroot/1.86.0/library/std/src/panicking.rs:695:5
stderr: 10: 0x7f94abdb9e10 - core[3a29e1362dffb4e6]::panicking::panic_fmt
stderr: at ./xplat/rust/toolchain/sysroot/1.86.0/library/core/src/panicking.rs:75:14
stderr: 11: 0x7f94abd2a523 - std[94ff872207e7c6e]::thread::local::panic_access_error
stderr: at ./xplat/rust/toolchain/sysroot/1.86.0/library/std/src/thread/local.rs:232:5
stderr: 12: 0x555b3e1fd956 - <std[94ff872207e7c6e]::thread::local::LocalKey<core[3a29e1362dffb4e6]::cell::RefCell<alloc[2f450b0663b6a2ca]::string::String>>>::with::<<tracing_subscriber[bef8f40d1232f893]::fmt::fmt_layer::Layer<tracing_subscriber[bef8f40d1232f893]::registry::sharded::Registry, tracing_subscriber[bef8f40d1232f893]::fmt::format::DefaultFields, tracing_subscriber[bef8f40d1232f893]::fmt::format::Format, std[94ff872207e7c6e]::io::stdio::stderr> as tracing_subscriber[bef8f40d1232f893]::layer::Layer<tracing_subscriber[bef8f40d1232f893]::registry::sharded::Registry>>::on_event::{closure#0}, ()>
stderr: at ./xplat/rust/toolchain/sysroot/1.86.0/library/std/src/thread/local.rs:276:25
stderr: 13: 0x555b3e1f61f8 - <tracing_subscriber[bef8f40d1232f893]::fmt::fmt_layer::Layer<tracing_subscriber[bef8f40d1232f893]::registry::sharded::Registry, tracing_subscriber[bef8f40d1232f893]::fmt::format::DefaultFields, tracing_subscriber[bef8f40d1232f893]::fmt::format::Format, std[94ff872207e7c6e]::io::stdio::stderr> as tracing_subscriber[bef8f40d1232f893]::layer::Layer<tracing_subscriber[bef8f40d1232f893]::registry::sharded::Registry>>::on_event
stderr: at ./third-party/rust/vendor/tracing-subscriber-0.3.18/src/fmt/fmt_layer.rs:949:9
stderr: 14: 0x555b3e1abca4 - <tracing_subscriber[bef8f40d1232f893]::layer::layered::Layered<tracing_subscriber[bef8f40d1232f893]::fmt::fmt_layer::Layer<tracing_subscriber[bef8f40d1232f893]::registry::sharded::Registry, tracing_subscriber[bef8f40d1232f893]::fmt::format::DefaultFields, tracing_subscriber[bef8f40d1232f893]::fmt::format::Format, std[94ff872207e7c6e]::io::stdio::stderr>, tracing_subscriber[bef8f40d1232f893]::registry::sharded::Registry> as tracing_core[48e5f9afc8996d3f]::subscriber::Subscriber>::event
stderr: at ./third-party/rust/vendor/tracing-subscriber-0.3.18/src/layer/layered.rs:153:9
stderr: 15: 0x555b3e1ac23c - <tracing_subscriber[bef8f40d1232f893]::layer::layered::Layered<tracing_subscriber[bef8f40d1232f893]::filter::env::EnvFilter, tracing_subscriber[bef8f40d1232f893]::layer::layered::Layered<tracing_subscriber[bef8f40d1232f893]::fmt::fmt_layer::Layer<tracing_subscriber[bef8f40d1232f893]::registry::sharded::Registry, tracing_subscriber[bef8f40d1232f893]::fmt::format::DefaultFields, tracing_subscriber[bef8f40d1232f893]::fmt::format::Format, std[94ff872207e7c6e]::io::stdio::stderr>, tracing_subscriber[bef8f40d1232f893]::registry::sharded::Registry>> as tracing_core[48e5f9afc8996d3f]::subscriber::Subscriber>::event
stderr: at ./third-party/rust/vendor/tracing-subscriber-0.3.18/src/layer/layered.rs:152:9
stderr: 16: 0x555b3e1ba429 - <tracing_subscriber[bef8f40d1232f893]::fmt::Subscriber<tracing_subscriber[bef8f40d1232f893]::fmt::format::DefaultFields, tracing_subscriber[bef8f40d1232f893]::fmt::format::Format, tracing_subscriber[bef8f40d1232f893]::filter::env::EnvFilter, std[94ff872207e7c6e]::io::stdio::stderr> as tracing_core[48e5f9afc8996d3f]::subscriber::Subscriber>::event
stderr: at ./third-party/rust/vendor/tracing-subscriber-0.3.18/src/fmt/mod.rs:408:9
stderr: 17: 0x7f94a91e4ac2 - <tracing_core[48e5f9afc8996d3f]::dispatcher::Dispatch>::event
stderr: at ./third-party/rust/vendor/tracing-core-0.1.33/src/dispatcher.rs:613:13
stderr: 18: 0x7f94a91e158b - <tracing_core[48e5f9afc8996d3f]::event::Event>::dispatch::{closure#0}
stderr: at ./third-party/rust/vendor/tracing-core-0.1.33/src/event.rs:35:13
stderr: 19: 0x7f94a91e41c8 - tracing_core[48e5f9afc8996d3f]::dispatcher::get_default::<(), <tracing_core[48e5f9afc8996d3f]::event::Event>::dispatch::{closure#0}>
stderr: at ./third-party/rust/vendor/tracing-core-0.1.33/src/dispatcher.rs:389:16
stderr: 20: 0x7f94a91e1561 - <tracing_core[48e5f9afc8996d3f]::event::Event>::dispatch
stderr: at ./third-party/rust/vendor/tracing-core-0.1.33/src/event.rs:34:9
stderr: 21: 0x7f94a1344021 - <zenoh_shm[837e8891168f6622]::shm::unix::SegmentImpl<u64>>::is_dangling_segment::{closure#0}
stderr: at ./third-party/rust/vendor/tracing-0.1.41/src/macros.rs:875:17
stderr: 22: 0x7f94a13406cd - <zenoh_shm[837e8891168f6622]::shm::unix::SegmentImpl<u64>>::is_dangling_segment
stderr: at ./third-party/rust/vendor/zenoh-shm-1.3.2/src/shm/unix.rs:241:13
stderr: 23: 0x7f94a1340191 - <zenoh_shm[837e8891168f6622]::shm::unix::SegmentImpl<u64>>::ensure_not_persistent
stderr: at ./third-party/rust/vendor/zenoh-shm-1.3.2/src/shm/unix.rs:190:12
stderr: 24: 0x7f94a131b7c9 - <zenoh_shm[837e8891168f6622]::shm::Segment<u64>>::ensure_not_persistent
stderr: at ./third-party/rust/vendor/zenoh-shm-1.3.2/src/shm/mod.rs:69:9
stderr: 25: 0x7f94a1360330 - zenoh_shm[837e8891168f6622]::posix_shm::cleanup::platform::cleanup_orphaned_segments_inner
stderr: at ./third-party/rust/vendor/zenoh-shm-1.3.2/src/posix_shm/cleanup.rs:54:21
stderr: 26: 0x7f94a135fb3e - zenoh_shm[837e8891168f6622]::posix_shm::cleanup::platform::cleanup_orphaned_segments
stderr: at ./third-party/rust/vendor/zenoh-shm-1.3.2/src/posix_shm/cleanup.rs:31:25
stderr: 27: 0x7f94a131fcd1 - <zenoh_shm[837e8891168f6622]::cleanup::Cleanup as core[3a29e1362dffb4e6]::ops::drop::Drop>::drop
stderr: at ./third-party/rust/vendor/zenoh-shm-1.3.2/src/cleanup.rs:66:9
stderr: 28: 0x7f94a136de25 - core[3a29e1362dffb4e6]::ptr::drop_in_place::<zenoh_shm[837e8891168f6622]::cleanup::Cleanup>
stderr: at ./xplat/rust/toolchain/sysroot/1.86.0/library/core/src/ptr/mod.rs:523:1
stderr: 29: 0x7f94a1371126 - <<static_init[d36ae8801d3dde6f]::generic_lazy::DropedUnInited<zenoh_shm[837e8891168f6622]::cleanup::Cleanup> as static_init[d36ae8801d3dde6f]::Finaly>::finaly as core[3a29e1362dffb4e6]::ops::function::FnOnce<(&static_init[d36ae8801d3dde6f]::generic_lazy::DropedUnInited<zenoh_shm[837e8891168f6622]::cleanup::Cleanup>,)>>::call_once
stderr: at ./xplat/rust/toolchain/sysroot/1.86.0/library/core/src/ptr/mut_ptr.rs:1492:18
stderr: 30: 0x7f94a13672e0 - static_init[d36ae8801d3dde6f]::lazy_sequentializer::lazy_finalization::<static_init[d36ae8801d3dde6f]::generic_lazy::DropedUnInited<zenoh_shm[837e8891168f6622]::cleanup::Cleanup>, static_init[d36ae8801d3dde6f]::phase_locker::sync::SyncPhaseGuard<static_init[d36ae8801d3dde6f]::generic_lazy::DropedUnInited<zenoh_shm[837e8891168f6622]::cleanup::Cleanup>>, <static_init[d36ae8801d3dde6f]::generic_lazy::DropedUnInited<zenoh_shm[837e8891168f6622]::cleanup::Cleanup> as static_init[d36ae8801d3dde6f]::Finaly>::finaly>
stderr: at ./third-party/rust/vendor/static_init-1.0.3/src/phase_locker/sync.rs:166:19
stderr: 31: 0x7f94a1380c60 - <static_init[d36ae8801d3dde6f]::generic_lazy::GenericLockedLazySeq<static_init[d36ae8801d3dde6f]::generic_lazy::DropedUnInited<zenoh_shm[837e8891168f6622]::cleanup::Cleanup>, static_init[d36ae8801d3dde6f]::exit_sequentializer::exit_manager::ExitSequentializer<zenoh_shm[837e8891168f6622]::cleanup::__StaticInitGeneratorFor_CLEANUP>> as static_init[d36ae8801d3dde6f]::exit_sequentializer::exit_manager::OnExit>::execute
stderr: at ./third-party/rust/vendor/static_init-1.0.3/src/lazy_sequentializer.rs:568:13
stderr: 32: 0x7f94a1d5f2fe - static_init[d36ae8801d3dde6f]::exit_sequentializer::exit_manager::reg::execute_at_exit
stderr: at ./third-party/rust/vendor/static_init-1.0.3/src/exit_sequentializer.rs:68:17
stderr: 33: 0x7f94ac279c73 - _dl_fini
stderr: at ./third-party2/glibc/2.34/src/glibc-2.34/elf/dl-fini.c:138:9
stderr: 34: 0x7f94abe478b8 - __run_exit_handlers
stderr: at ./third-party2/glibc/2.34/src/glibc-2.34/stdlib/exit.c:113:8
stderr: 35: 0x7f94abe479ca - __GI_exit
stderr: at ./third-party2/glibc/2.34/src/glibc-2.34/stdlib/exit.c:143:3
stderr: 36: 0x7f94abe2c65e - __libc_start_call_main
stderr: at ./third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/nptl/libc_start_call_main.h:74:3
stderr: 37: 0x7f94abe2c718 - __libc_start_main_impl
stderr: at ./third-party2/glibc/2.34/src/glibc-2.34/csu/../csu/libc-start.c:409:3
stderr: 38: 0x555b3e1027a1 - _start
stderr: at ./third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/x86_64/start.S:116:0
stderr: 39: 0x0 - <unknown>
stderr: fatal runtime error: failed to initiate panic, error 5
To reproduce
- Run zenohd
- Set RUST_LOG=trace
System info
- Platform: Linux (CentOS)
This is probably due to logging during the drop of CLEANUP, when thread-local storage is no longer available.
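A minimal sketch of this failure mode (not zenoh code; crate names are taken from the backtrace, the read() accessor is assumed for illustration). A #[dynamic(lazy, drop)] static whose Drop impl emits a tracing event runs from static_init's exit handler, after the main thread's TLS has been destroyed, so tracing-subscriber's fmt layer panics when it touches its thread_local! buffer:

use static_init::dynamic;

struct Cleanup;

impl Drop for Cleanup {
    fn drop(&mut self) {
        // Runs from static_init's exit handler (frames 27-32 in the backtrace),
        // i.e. after main() has returned and the main thread's TLS has been
        // torn down; tracing-subscriber's fmt layer then hits the destroyed
        // thread_local! buffer and panics with AccessError.
        tracing::trace!("cleaning up orphaned segments");
    }
}

// Same attribute as zenoh-shm/src/cleanup.rs line 24: the value is created
// lazily and dropped by an exit handler registered by static_init.
#[dynamic(lazy, drop)]
static CLEANUP: Cleanup = Cleanup;

fn main() {
    // Equivalent of RUST_LOG=trace for this sketch.
    tracing_subscriber::fmt()
        .with_max_level(tracing::Level::TRACE)
        .init();

    // Touch the static so it is initialized and its drop gets scheduled
    // (read() assumed to be the locked-lazy accessor, as in the zenoh code).
    let _ = CLEANUP.read();

    tracing::trace!("logging inside main works fine");
    // On process exit, Cleanup::drop fires and the trace! call above can panic.
}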
Hi @alokpr, we are aware of this thread-local storage issue and don't suggest any shutdown during this stage. Unfortunately, I can't reproduce your problem using the steps below.
export RUST_LOG="trace"
# Use rust-toolchain 1.86.0
cargo run --bin zenohd
# Press Ctrl-C to terminate
I noticed that you have the shared memory feature enabled in the log. Can you show us the detailed steps to reproduce this issue?
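For what it's worth, the backtrace only reaches zenoh_shm's cleanup when the daemon is built with shared memory support, so the missing step may simply be the build flag, for example:

# Same steps as above, but with shared memory enabled at build time
# (feature name assumed; it may also depend on the router configuration).
cargo run --bin zenohd --features shared-memory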
What do you mean by "don't suggest any shutdown"? The panic happens automatically during process shutdown due to the use of static_init::dynamic::drop here - https://github.com/eclipse-zenoh/zenoh/blob/037c5466d9bbfb22dfc019bb491c3fe10237a2e0/commons/zenoh-shm/src/cleanup.rs#L24
I will get you repro steps.
Hi @alokpr! I mean we don't suggest any shutdown of the async runtime during the exit stage, or anything else that similarly causes TLS issues. And I understand that this time it is probably caused by our shared-memory cleanup. We are happy to resolve this if you can provide some steps to reproduce it.
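For context, one generic way to make drop-time diagnostics robust against destroyed TLS (purely an illustration, not the actual change in the fix referenced below): keep the exit-time message out of the tracing pipeline, whose fmt layer relies on a thread_local! buffer, and fall back to a plain stderr write.

struct Cleanup;

impl Drop for Cleanup {
    fn drop(&mut self) {
        // eprintln! does not go through tracing-subscriber's thread-local
        // buffers, so it stays usable from exit handlers; code that wants to
        // keep using TLS can probe it first with std's LocalKey::try_with,
        // which returns Err(AccessError) instead of panicking.
        eprintln!("zenoh-shm: cleaning up orphaned segments");
    }
}

fn main() {
    // Dropping at the end of main (or from an exit handler) no longer panics.
    let _cleanup = Cleanup;
}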
Hi @alokpr, we've updated the logging procedure. Could you try #1907?
@YuanYuYuan Thanks for the quick response. It is a bit hard to test unreleased crates in our dev setup right now. I will see if I can repro the issue outside our project and test #1907.
Hello @alokpr, could you please confirm if it was fixed?