py-spy icon indicating copy to clipboard operation
py-spy copied to clipboard

Cross-container --native panics: called `Option::unwrap()` on a `None` value', src/libcore/option.rs:378:21

Open evanj opened this issue 5 years ago • 5 comments

Using py-spy 0.3.2 release binary either musl or gnu panics with the following panic backtrace when adding the --native flag to either dump or record a PID running in a different container. Without --native works fine.

Works

  • ./py-spy dump --pid=(OTHER CONTAINER) (correctly shows Python stacks)
  • ./py-spy dump --pid=(SAME CONTAINER) --native (correctly shows Python + native code stacks)
  • gdb attach (OTHER CONTAINER) (shows

Crash

# RUST_BACKTRACE=full ./t/py-spy dump --pid=3511964 --native
Process 3511964: /venv2/bin/python2.7 -m staticbatch.worker --topic-name=gke-email-shadow-test --max-pull-messages=10 --max-workers=10 --sendgrid-sandbox-mode=true --use-datadog-client=true --use-recipe-rules-client=true --use-product-service=true --disable-mtbe=true
Python v2.7.16 (/venv2/bin/python2.7)

thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src/libcore/option.rs:378:21
stack backtrace:
   0:     0x555c589efa74 - backtrace::backtrace::libunwind::trace::h65597d255cb1398b
                               at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
   1:     0x555c589efa74 - backtrace::backtrace::trace_unsynchronized::hd4f479d7150ec4a0
                               at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/mod.rs:66
   2:     0x555c589efa74 - std::sys_common::backtrace::_print_fmt::h015072984a2b172c
                               at src/libstd/sys_common/backtrace.rs:77
   3:     0x555c589efa74 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h6df05d3335f32194
                               at src/libstd/sys_common/backtrace.rs:61
   4:     0x555c58a15f6c - core::fmt::write::h1f444f4312eb6c27
                               at src/libcore/fmt/mod.rs:1028
   5:     0x555c589ece97 - std::io::Write::write_fmt::h8d147888220078ef
                               at src/libstd/io/mod.rs:1412
   6:     0x555c589f1fee - std::sys_common::backtrace::_print::h8a6df0fa81d6af62
                               at src/libstd/sys_common/backtrace.rs:65
   7:     0x555c589f1fee - std::sys_common::backtrace::print::h6f05b4733407e509
                               at src/libstd/sys_common/backtrace.rs:50
   8:     0x555c589f1fee - std::panicking::default_hook::{{closure}}::h0d0a23bd02315dd8
                               at src/libstd/panicking.rs:188
   9:     0x555c589f1ce1 - std::panicking::default_hook::h8d15a9aecb4efac6
                               at src/libstd/panicking.rs:205
  10:     0x555c589f26eb - std::panicking::rust_panic_with_hook::hbe174577402a475d
                               at src/libstd/panicking.rs:464
  11:     0x555c589f228e - std::panicking::continue_panic_fmt::h4d855dad868accf3
                               at src/libstd/panicking.rs:373
  12:     0x555c589f2176 - rust_begin_unwind
                               at src/libstd/panicking.rs:302
  13:     0x555c58a11d5e - core::panicking::panic_fmt::hdeb7979ab6591473
                               at src/libcore/panicking.rs:139
  14:     0x555c58a11caa - core::panicking::panic::hb5daa85c7c72fc62
                               at src/libcore/panicking.rs:70
  15:     0x555c587b6e4c - py_spy::python_spy::PythonSpy::get_stack_traces::hf1034ae8c547c3fe
  16:     0x555c587a5ade - py_spy::dump::print_traces::ha3b7ff6b5d3dc615
  17:     0x555c58774ce9 - py_spy::run_spy_command::h9e10a2ee06576172
  18:     0x555c587781ae - py_spy::main::h65a100b9d2adcb81
  19:     0x555c58795853 - std::rt::lang_start::{{closure}}::hc7c20c46ab3745aa
  20:     0x555c589f2113 - std::rt::lang_start_internal::{{closure}}::h6ea535ec5c50fc3e
                               at src/libstd/rt.rs:48
  21:     0x555c589f2113 - std::panicking::try::do_call::h631c6408dfccc6f5
                               at src/libstd/panicking.rs:287
  22:     0x555c589fa01a - __rust_maybe_catch_panic
                               at src/libpanic_unwind/lib.rs:78
  23:     0x555c589f2bcd - std::panicking::try::hab539b2d1255d635
                               at src/libstd/panicking.rs:265
  24:     0x555c589f2bcd - std::panic::catch_unwind::hd5e0a26424bd7f34
                               at src/libstd/panic.rs:396
  25:     0x555c589f2bcd - std::rt::lang_start_internal::h3bdc4c7d98181bf9
                               at src/libstd/rt.rs:47
  26:     0x555c58779d72 - main
  27:     0x7f831245d2e1 - __libc_start_main
  28:     0x555c5870f3c9 - _start
  29:                0x0 - <unknown>

Version Details

$ ./py-spy --version
py-spy 0.3.2

I'm running this on Google "Container OS" instance (running on a Google Kubernetes Engine cluster). My py-spy process is in a privileged container (the "toolbox" environment), and I'm trying to dump an unprivileged process in another container. These containers are almost certainly using different library versions etc, which I know sometimes causes debugging problems.

gdb output

GDB manages to do the right thing but prints warnings:

warning: Expected absolute pathname for libpthread in the inferior, but got target:/lib/x86_64-linux-gnu/libpthread.so.0.
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
warning: Target and debugger are in different PID namespaces; thread lists and other data are likely unreliable.  Connect to gdbserver inside the container.
warning: Expected absolute pathname for libpthread in the inferior, but got target:/lib/x86_64-linux-gnu/libpthread.so.0.
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.

evanj avatar Jan 15 '20 20:01 evanj

I'm encountering a very similar error when trying to profile subprocesses. I'm running on Fedora, without any sorts of containers.

The command I used to run py-spy is RUST_BACKTRACE=full py-spy record -o profile -f speedscope -s -- python -m <my module>. The error does not occur when I omit the -s flag, or when I simply do not spawn any subprocesses in my Python program. The error happened for each of the 0.3.x versions. I'm using Python 3.8.1 and joblib with its default loky backend to spawn subprocesses.

thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src/libcore/option.rs:378:21
stack backtrace:
   0:           0x6f2324 - backtrace::backtrace::libunwind::trace::h0d3b97a6b64193be
                               at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
   1:           0x6f2324 - backtrace::backtrace::trace_unsynchronized::hb78a971d8547111e
                               at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/mod.rs:66
   2:           0x6f2324 - std::sys_common::backtrace::_print_fmt::ha1f41287e6ef2374
                               at src/libstd/sys_common/backtrace.rs:77
   3:           0x6f2324 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h1130808f9197ccb5
                               at src/libstd/sys_common/backtrace.rs:61
   4:           0x72542c - core::fmt::write::h4fa8b44d73117031
                               at src/libcore/fmt/mod.rs:1028
   5:           0x6ef747 - std::io::Write::write_fmt::hc08a67176e074177
                               at src/libstd/io/mod.rs:1412
   6:           0x6f489e - std::sys_common::backtrace::_print::h33a300037b4dc1ed
                               at src/libstd/sys_common/backtrace.rs:65
   7:           0x6f489e - std::sys_common::backtrace::print::h2ceebd24f74a6808
                               at src/libstd/sys_common/backtrace.rs:50
   8:           0x6f489e - std::panicking::default_hook::{{closure}}::h599b8688602f475a
                               at src/libstd/panicking.rs:188
   9:           0x6f4591 - std::panicking::default_hook::h0085905ac97017e0
                               at src/libstd/panicking.rs:205
  10:           0x6f4f9b - std::panicking::rust_panic_with_hook::h780f8e9bbe4fc091
                               at src/libstd/panicking.rs:464
  11:           0x6f4b3e - std::panicking::continue_panic_fmt::h1444a364e5f24a1a
                               at src/libstd/panicking.rs:373
  12:           0x6f4a26 - rust_begin_unwind
                               at src/libstd/panicking.rs:302
  13:           0x72121e - core::panicking::panic_fmt::h760360c16b67cc83
                               at src/libcore/panicking.rs:139
  14:           0x72116a - core::panicking::panic::ha556f2c3753f3759
                               at src/libcore/panicking.rs:70
  15:           0x4694d6 - py_spy::run_spy_command::he9e192defef1e373
  16:           0x46a82f - py_spy::main::he2a48b4fd3934d5e
  17:           0x487343 - std::rt::lang_start::{{closure}}::h47dbd41d95b3d679
  18:           0x6f49c3 - std::rt::lang_start_internal::{{closure}}::hdb12a3ead60a8960
                               at src/libstd/rt.rs:48
  19:           0x6f49c3 - std::panicking::try::do_call::h75828266d21559c3
                               at src/libstd/panicking.rs:287
  20:           0x6fba4a - __rust_maybe_catch_panic
                               at src/libpanic_unwind/lib.rs:78
  21:           0x6f531b - std::panicking::try::h2533a62d10a8f001
                               at src/libstd/panicking.rs:265
  22:           0x6f531b - std::panic::catch_unwind::hd9ecf47e7cd9602d
                               at src/libstd/panic.rs:396
  23:           0x6f531b - std::rt::lang_start_internal::h93ee4050c8419278
                               at src/libstd/rt.rs:47
  24:           0x46b7f2 - main
[2020-01-18T14:54:32.595985592Z ERROR py_spy::sampler] Failed to get process tree: poisoned lock: another task failed inside
[2020-01-18T14:54:32.596018311Z ERROR py_spy::sampler] Failed to get process tree: poisoned lock: another task failed inside
[2020-01-18T14:54:32.596020850Z ERROR py_spy::sampler] Failed to get process tree: poisoned lock: another task failed inside
[2020-01-18T14:54:32.596023870Z ERROR py_spy::sampler] Failed to get process tree: poisoned lock: another task failed inside
[2020-01-18T14:54:32.596029040Z ERROR py_spy::sampler] Failed to get process tree: poisoned lock: another task failed inside
[2020-01-18T14:54:32.596035250Z ERROR py_spy::sampler] Failed to get process tree: poisoned lock: another task failed inside
[2020-01-18T14:54:32.596040449Z ERROR py_spy::sampler] Failed to get process tree: poisoned lock: another task failed inside
[2020-01-18T14:54:32.596048259Z ERROR py_spy::sampler] Failed to get process tree: poisoned lock: another task failed inside
[2020-01-18T14:54:32.596053479Z ERROR py_spy::sampler] Failed to get process tree: poisoned lock: another task failed inside
[2020-01-18T14:54:32.596059928Z ERROR py_spy::sampler] Failed to get process tree: poisoned lock: another task failed inside
[2020-01-18T14:54:32.596065998Z ERROR py_spy::sampler] Failed to get process tree: poisoned lock: another task failed inside
[2020-01-18T14:54:32.596071658Z ERROR py_spy::sampler] Failed to get process tree: poisoned lock: another task failed inside
[2020-01-18T14:54:32.596073807Z ERROR py_spy::sampler] Failed to get process tree: poisoned lock: another task failed inside
[2020-01-18T14:54:32.596079167Z ERROR py_spy::sampler] Failed to get process tree: poisoned lock: another task failed inside
[2020-01-18T14:54:32.596084807Z ERROR py_spy::sampler] Failed to acquire lock: poisoned lock: another task failed inside
[2020-01-18T14:54:32.596101556Z ERROR py_spy::sampler] Failed to get process tree: poisoned lock: another task failed inside
[2020-01-18T14:54:32.596104356Z ERROR py_spy::sampler] Failed to get process tree: poisoned lock: another task failed inside
[2020-01-18T14:54:32.596106575Z ERROR py_spy::sampler] Failed to get process tree: poisoned lock: another task failed inside
[2020-01-18T14:54:32.596108865Z ERROR py_spy::sampler] Failed to get process tree: poisoned lock: another task failed inside
[2020-01-18T14:54:32.596112185Z ERROR py_spy::sampler] Failed to get process tree: poisoned lock: another task failed inside
[2020-01-18T14:54:32.618826380Z ERROR py_spy::sampler] Failed to get process tree: poisoned lock: another task failed inside
[2020-01-18T14:54:32.628870934Z ERROR py_spy::sampler] Failed to get process tree: poisoned lock: another task failed inside

timwie avatar Jan 18 '20 15:01 timwie

Same thing with subprocesses! :(

s-ilyas avatar Jan 21 '20 19:01 s-ilyas

We're using libunwind-ptrace to get the native component of the stack trace - and it doesn't look like its working across containers =( In the short term I'll add some code to warn that cross-container native profiling isn't supported yet - but to fix properly will probably involve diving into the libunwind code and will take some time

benfred avatar Feb 09 '20 18:02 benfred

Hi @benfred, just curious if there are any updates regarding cross-container native profiling? I also run into Failed to get process tree: poisoned lock: another task failed inside error with the --subprocesses and --native flags. Thank you!

atskae avatar Oct 09 '20 16:10 atskae

👋 Hello! Is this still an active issue?

metasyn avatar Feb 01 '24 00:02 metasyn