zenoh icon indicating copy to clipboard operation
zenoh copied to clipboard

panic during shutdown with RUST_LOG=trace

Open alokpr opened this issue 8 months ago • 6 comments

Describe the bug

With "trace" log-level, zenoh panics with the following callstack:

stderr: thread 'main' panicked at xplat/rust/toolchain/sysroot/1.86.0/library/std/src/thread/local.rs:276:25:
stderr: cannot access a Thread Local Storage value during or after destruction: AccessError
stderr: stack backtrace:
stderr:    0:     0x7f94abd4e48e - <<std[94ff872207e7c6e]::sys::backtrace::BacktraceLock>::print::DisplayBacktrace as core[3a29e1362dffb4e6]::fmt::Display>::fmt
stderr:                                at ./xplat/rust/toolchain/sysroot/1.86.0/library/std/src/../../backtrace/src/backtrace/libunwind.rs:117:9
stderr:    1:     0x7f94abd9bdc3 - core[3a29e1362dffb4e6]::fmt::write
stderr:                                at ./xplat/rust/toolchain/sysroot/1.86.0/library/core/src/fmt/rt.rs:177:76
stderr:    2:     0x7f94abd46fe9 - <std[94ff872207e7c6e]::sys::pal::unix::stdio::Stderr as std[94ff872207e7c6e]::io::Write>::write_fmt
stderr:                                at ./xplat/rust/toolchain/sysroot/1.86.0/library/std/src/io/mod.rs:1890:15
stderr:    3:     0x7f94abd4e333 - <std[94ff872207e7c6e]::sys::backtrace::BacktraceLock>::print
stderr:                                at ./xplat/rust/toolchain/sysroot/1.86.0/library/std/src/sys/backtrace.rs:42:9
stderr:    4:     0x7f94abd93aed - std[94ff872207e7c6e]::panicking::default_hook::{closure#0}
stderr:    5:     0x7f94abd93951 - std[94ff872207e7c6e]::panicking::default_hook
stderr:                                at ./xplat/rust/toolchain/sysroot/1.86.0/library/std/src/panicking.rs:325:9
stderr:    6:     0x7f94abd94177 - std[94ff872207e7c6e]::panicking::rust_panic_with_hook
stderr:                                at ./xplat/rust/toolchain/sysroot/1.86.0/library/std/src/panicking.rs:831:13
stderr:    7:     0x7f94abd4eb4e - std[94ff872207e7c6e]::panicking::begin_panic_handler::{closure#0}
stderr:                                at ./xplat/rust/toolchain/sysroot/1.86.0/library/std/src/panicking.rs:704:13
stderr:    8:     0x7f94abd4e7e9 - std[94ff872207e7c6e]::sys::backtrace::__rust_end_short_backtrace::<std[94ff872207e7c6e]::panicking::begin_panic_handler::{closure#0}, !>
stderr:                                at ./xplat/rust/toolchain/sysroot/1.86.0/library/std/src/sys/backtrace.rs:168:18
stderr:    9:     0x7f94abd93baf - rust_begin_unwind
stderr:                                at ./xplat/rust/toolchain/sysroot/1.86.0/library/std/src/panicking.rs:695:5
stderr:   10:     0x7f94abdb9e10 - core[3a29e1362dffb4e6]::panicking::panic_fmt
stderr:                                at ./xplat/rust/toolchain/sysroot/1.86.0/library/core/src/panicking.rs:75:14
stderr:   11:     0x7f94abd2a523 - std[94ff872207e7c6e]::thread::local::panic_access_error
stderr:                                at ./xplat/rust/toolchain/sysroot/1.86.0/library/std/src/thread/local.rs:232:5
stderr:   12:     0x555b3e1fd956 - <std[94ff872207e7c6e]::thread::local::LocalKey<core[3a29e1362dffb4e6]::cell::RefCell<alloc[2f450b0663b6a2ca]::string::String>>>::with::<<tracing_subscriber[bef8f40d1232f893]::fmt::fmt_layer::Layer<tracing_subscriber[bef8f40d1232f893]::registry::sharded::Registry, tracing_subscriber[bef8f40d1232f893]::fmt::format::DefaultFields, tracing_subscriber[bef8f40d1232f893]::fmt::format::Format, std[94ff872207e7c6e]::io::stdio::stderr> as tracing_subscriber[bef8f40d1232f893]::layer::Layer<tracing_subscriber[bef8f40d1232f893]::registry::sharded::Registry>>::on_event::{closure#0}, ()>
stderr:                                at ./xplat/rust/toolchain/sysroot/1.86.0/library/std/src/thread/local.rs:276:25
stderr:   13:     0x555b3e1f61f8 - <tracing_subscriber[bef8f40d1232f893]::fmt::fmt_layer::Layer<tracing_subscriber[bef8f40d1232f893]::registry::sharded::Registry, tracing_subscriber[bef8f40d1232f893]::fmt::format::DefaultFields, tracing_subscriber[bef8f40d1232f893]::fmt::format::Format, std[94ff872207e7c6e]::io::stdio::stderr> as tracing_subscriber[bef8f40d1232f893]::layer::Layer<tracing_subscriber[bef8f40d1232f893]::registry::sharded::Registry>>::on_event
stderr:                                at ./third-party/rust/vendor/tracing-subscriber-0.3.18/src/fmt/fmt_layer.rs:949:9
stderr:   14:     0x555b3e1abca4 - <tracing_subscriber[bef8f40d1232f893]::layer::layered::Layered<tracing_subscriber[bef8f40d1232f893]::fmt::fmt_layer::Layer<tracing_subscriber[bef8f40d1232f893]::registry::sharded::Registry, tracing_subscriber[bef8f40d1232f893]::fmt::format::DefaultFields, tracing_subscriber[bef8f40d1232f893]::fmt::format::Format, std[94ff872207e7c6e]::io::stdio::stderr>, tracing_subscriber[bef8f40d1232f893]::registry::sharded::Registry> as tracing_core[48e5f9afc8996d3f]::subscriber::Subscriber>::event
stderr:                                at ./third-party/rust/vendor/tracing-subscriber-0.3.18/src/layer/layered.rs:153:9
stderr:   15:     0x555b3e1ac23c - <tracing_subscriber[bef8f40d1232f893]::layer::layered::Layered<tracing_subscriber[bef8f40d1232f893]::filter::env::EnvFilter, tracing_subscriber[bef8f40d1232f893]::layer::layered::Layered<tracing_subscriber[bef8f40d1232f893]::fmt::fmt_layer::Layer<tracing_subscriber[bef8f40d1232f893]::registry::sharded::Registry, tracing_subscriber[bef8f40d1232f893]::fmt::format::DefaultFields, tracing_subscriber[bef8f40d1232f893]::fmt::format::Format, std[94ff872207e7c6e]::io::stdio::stderr>, tracing_subscriber[bef8f40d1232f893]::registry::sharded::Registry>> as tracing_core[48e5f9afc8996d3f]::subscriber::Subscriber>::event
stderr:                                at ./third-party/rust/vendor/tracing-subscriber-0.3.18/src/layer/layered.rs:152:9
stderr:   16:     0x555b3e1ba429 - <tracing_subscriber[bef8f40d1232f893]::fmt::Subscriber<tracing_subscriber[bef8f40d1232f893]::fmt::format::DefaultFields, tracing_subscriber[bef8f40d1232f893]::fmt::format::Format, tracing_subscriber[bef8f40d1232f893]::filter::env::EnvFilter, std[94ff872207e7c6e]::io::stdio::stderr> as tracing_core[48e5f9afc8996d3f]::subscriber::Subscriber>::event
stderr:                                at ./third-party/rust/vendor/tracing-subscriber-0.3.18/src/fmt/mod.rs:408:9
stderr:   17:     0x7f94a91e4ac2 - <tracing_core[48e5f9afc8996d3f]::dispatcher::Dispatch>::event
stderr:                                at ./third-party/rust/vendor/tracing-core-0.1.33/src/dispatcher.rs:613:13
stderr:   18:     0x7f94a91e158b - <tracing_core[48e5f9afc8996d3f]::event::Event>::dispatch::{closure#0}
stderr:                                at ./third-party/rust/vendor/tracing-core-0.1.33/src/event.rs:35:13
stderr:   19:     0x7f94a91e41c8 - tracing_core[48e5f9afc8996d3f]::dispatcher::get_default::<(), <tracing_core[48e5f9afc8996d3f]::event::Event>::dispatch::{closure#0}>
stderr:                                at ./third-party/rust/vendor/tracing-core-0.1.33/src/dispatcher.rs:389:16
stderr:   20:     0x7f94a91e1561 - <tracing_core[48e5f9afc8996d3f]::event::Event>::dispatch
stderr:                                at ./third-party/rust/vendor/tracing-core-0.1.33/src/event.rs:34:9
stderr:   21:     0x7f94a1344021 - <zenoh_shm[837e8891168f6622]::shm::unix::SegmentImpl<u64>>::is_dangling_segment::{closure#0}
stderr:                                at ./third-party/rust/vendor/tracing-0.1.41/src/macros.rs:875:17
stderr:   22:     0x7f94a13406cd - <zenoh_shm[837e8891168f6622]::shm::unix::SegmentImpl<u64>>::is_dangling_segment
stderr:                                at ./third-party/rust/vendor/zenoh-shm-1.3.2/src/shm/unix.rs:241:13
stderr:   23:     0x7f94a1340191 - <zenoh_shm[837e8891168f6622]::shm::unix::SegmentImpl<u64>>::ensure_not_persistent
stderr:                                at ./third-party/rust/vendor/zenoh-shm-1.3.2/src/shm/unix.rs:190:12
stderr:   24:     0x7f94a131b7c9 - <zenoh_shm[837e8891168f6622]::shm::Segment<u64>>::ensure_not_persistent
stderr:                                at ./third-party/rust/vendor/zenoh-shm-1.3.2/src/shm/mod.rs:69:9
stderr:   25:     0x7f94a1360330 - zenoh_shm[837e8891168f6622]::posix_shm::cleanup::platform::cleanup_orphaned_segments_inner
stderr:                                at ./third-party/rust/vendor/zenoh-shm-1.3.2/src/posix_shm/cleanup.rs:54:21
stderr:   26:     0x7f94a135fb3e - zenoh_shm[837e8891168f6622]::posix_shm::cleanup::platform::cleanup_orphaned_segments
stderr:                                at ./third-party/rust/vendor/zenoh-shm-1.3.2/src/posix_shm/cleanup.rs:31:25
stderr:   27:     0x7f94a131fcd1 - <zenoh_shm[837e8891168f6622]::cleanup::Cleanup as core[3a29e1362dffb4e6]::ops::drop::Drop>::drop
stderr:                                at ./third-party/rust/vendor/zenoh-shm-1.3.2/src/cleanup.rs:66:9
stderr:   28:     0x7f94a136de25 - core[3a29e1362dffb4e6]::ptr::drop_in_place::<zenoh_shm[837e8891168f6622]::cleanup::Cleanup>
stderr:                                at ./xplat/rust/toolchain/sysroot/1.86.0/library/core/src/ptr/mod.rs:523:1
stderr:   29:     0x7f94a1371126 - <<static_init[d36ae8801d3dde6f]::generic_lazy::DropedUnInited<zenoh_shm[837e8891168f6622]::cleanup::Cleanup> as static_init[d36ae8801d3dde6f]::Finaly>::finaly as core[3a29e1362dffb4e6]::ops::function::FnOnce<(&static_init[d36ae8801d3dde6f]::generic_lazy::DropedUnInited<zenoh_shm[837e8891168f6622]::cleanup::Cleanup>,)>>::call_once
stderr:                                at ./xplat/rust/toolchain/sysroot/1.86.0/library/core/src/ptr/mut_ptr.rs:1492:18
stderr:   30:     0x7f94a13672e0 - static_init[d36ae8801d3dde6f]::lazy_sequentializer::lazy_finalization::<static_init[d36ae8801d3dde6f]::generic_lazy::DropedUnInited<zenoh_shm[837e8891168f6622]::cleanup::Cleanup>, static_init[d36ae8801d3dde6f]::phase_locker::sync::SyncPhaseGuard<static_init[d36ae8801d3dde6f]::generic_lazy::DropedUnInited<zenoh_shm[837e8891168f6622]::cleanup::Cleanup>>, <static_init[d36ae8801d3dde6f]::generic_lazy::DropedUnInited<zenoh_shm[837e8891168f6622]::cleanup::Cleanup> as static_init[d36ae8801d3dde6f]::Finaly>::finaly>
stderr:                                at ./third-party/rust/vendor/static_init-1.0.3/src/phase_locker/sync.rs:166:19
stderr:   31:     0x7f94a1380c60 - <static_init[d36ae8801d3dde6f]::generic_lazy::GenericLockedLazySeq<static_init[d36ae8801d3dde6f]::generic_lazy::DropedUnInited<zenoh_shm[837e8891168f6622]::cleanup::Cleanup>, static_init[d36ae8801d3dde6f]::exit_sequentializer::exit_manager::ExitSequentializer<zenoh_shm[837e8891168f6622]::cleanup::__StaticInitGeneratorFor_CLEANUP>> as static_init[d36ae8801d3dde6f]::exit_sequentializer::exit_manager::OnExit>::execute
stderr:                                at ./third-party/rust/vendor/static_init-1.0.3/src/lazy_sequentializer.rs:568:13
stderr:   32:     0x7f94a1d5f2fe - static_init[d36ae8801d3dde6f]::exit_sequentializer::exit_manager::reg::execute_at_exit
stderr:                                at ./third-party/rust/vendor/static_init-1.0.3/src/exit_sequentializer.rs:68:17
stderr:   33:     0x7f94ac279c73 - _dl_fini
stderr:                                at ./third-party2/glibc/2.34/src/glibc-2.34/elf/dl-fini.c:138:9
stderr:   34:     0x7f94abe478b8 - __run_exit_handlers
stderr:                                at ./third-party2/glibc/2.34/src/glibc-2.34/stdlib/exit.c:113:8
stderr:   35:     0x7f94abe479ca - __GI_exit
stderr:                                at ./third-party2/glibc/2.34/src/glibc-2.34/stdlib/exit.c:143:3
stderr:   36:     0x7f94abe2c65e - __libc_start_call_main
stderr:                                at ./third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/nptl/libc_start_call_main.h:74:3
stderr:   37:     0x7f94abe2c718 - __libc_start_main_impl
stderr:                                at ./third-party2/glibc/2.34/src/glibc-2.34/csu/../csu/libc-start.c:409:3
stderr:   38:     0x555b3e1027a1 - _start
stderr:                                at ./third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/x86_64/start.S:116:0
stderr:   39:                0x0 - <unknown>
stderr: fatal runtime error: failed to initiate panic, error 5

To reproduce

  1. Run zenohd
  2. RUST_LOG=trace

System info

  • Platform: linux centos

alokpr avatar Apr 15 '25 05:04 alokpr

This is probably due logging during CLEANUP without threadlocal environment

alokpr avatar Apr 15 '25 05:04 alokpr

This is probably due logging during CLEANUP without threadlocal environment

Hi @alokpr, we have been aware of this thread local storage issue and don't suggest any shutdown during this stage. Unfortunately, I can't reproduce your problem using the steps below.

export RUST_LOG="trace"

# Use rust-toolchain 1.86.0
cargo run --bin zenohd

# Press Ctrl-C to terminate

I noticed that you have the shared memory feature enabled in the log. Can you show us the detailed steps to reproduce this issue?

YuanYuYuan avatar Apr 15 '25 07:04 YuanYuYuan

What do you mean by you "don't suggest any shutdown"? The panic happens automatically during process shutdown due to the usage of static_init::dynamic::drop here - https://github.com/eclipse-zenoh/zenoh/blob/037c5466d9bbfb22dfc019bb491c3fe10237a2e0/commons/zenoh-shm/src/cleanup.rs#L24

I will get you repro steps.

alokpr avatar Apr 16 '25 06:04 alokpr

What do you mean by you "don't suggest any shutdown"? The panic happens automatically during process shutdown due to the usage of static_init::dynamic::drop here -

zenoh/commons/zenoh-shm/src/cleanup.rs

Line 24 in 037c546 #[dynamic(lazy, drop)]

I will get you repro steps.

Hi @alokpr! I mean we don't suggest any shutdown of the async runtime during the exit stage or any similarly causing TLS issue. And I understand that this time is probably caused by our shared memory cleanup. We are happy to resolve this if you can provide some steps to reproduce it.

YuanYuYuan avatar Apr 16 '25 07:04 YuanYuYuan

Hi @alokpr, we've updated the logging procedure. Could you try this #1907?

YuanYuYuan avatar Apr 18 '25 07:04 YuanYuYuan

@YuanYuYuan Thanks for the quick response. It is a bit hard to test unreleased crates in our dev setup right now. I will see if I can repro the issue outside our project and test #1907

alokpr avatar Apr 18 '25 16:04 alokpr

Hello @alokpr, could you please confirm if it was fixed?

sashacmc avatar Oct 03 '25 19:10 sashacmc