py-spy icon indicating copy to clipboard operation
py-spy copied to clipboard

Errors sampling process using os.exec*

Open hermancollin opened this issue 3 years ago • 5 comments

Following my comment on #243, I explained how to reproduce the Error: No child process (os error 10) issue. I digged a bit deeper and realized most of the samples from my traces were unusable because a large amount of samples disappear. When using py-spy record over the programs I am profiling, I encountered this issue with basically every SCT python script invokation and the profiler outputted something like this at the end:

py-spy> Wrote flamegraph data to 'test.svg'. Samples: 15 Errors: 629

Obviously, most of the samples are lost in these cases. However, I never encountered this when running py-spy record directly over a .py file. These bad traces only happen when I am profiling a script that takes advantage of Python's console-scripts entry point, in the setup.py file.

hermancollin avatar Jun 09 '21 22:06 hermancollin

Using RUST_LOG=warn:

» RUST_LOG=warn py-spy record -o test.svg -- sct_check_dependencies -short                  1 ↵ herman@tank
[2021-06-09T22:05:48.102470115Z WARN  py_spy::python_spy] Interpreter address from interp_head symbol is invalid 000055f771de26c0
py-spy> Sampling process 100 times a second. Press Control-C to exit.

[2021-06-09T22:05:48.253545145Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.254633903Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.258845533Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.272319643Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.272779210Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.278885932Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.309055502Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.333192050Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.341197105Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.361254618Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.364358560Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.385848214Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.389786467Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.404397293Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.409291473Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.409513969Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.430549140Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.449357225Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.449765453Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.453872970Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.470404562Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.472034581Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.472985324Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.488131601Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.488350670Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.494125840Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)

--
Spinal Cord Toolbox (git-master-a685d7f0d8032f9df685a08d22947947ac416e71*)

sct_check_dependencies -short
--

[2021-06-09T22:05:48.501538400Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.502834813Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
SCT info:
- version: git-master-a685d7f0d8032f9df685a08d22947947ac416e71*
- path: /home/herman/Documents/NEUROPOLY_21/spinalcordtoolbox
[2021-06-09T22:05:48.523265225Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
OS: linux (Linux-5.4.106-rt54-MANJARO-x86_64-with-arch-Manjaro-Linux)
CPU cores: Available: 4, Used by ITK functions: 4
RAM: Total: 15895MB, Used: 5543MB, Available: 5143MB
[2021-06-09T22:05:48.526914090Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)
[2021-06-09T22:05:48.528860746Z WARN  py_spy] Failed to get stack trace from 269800: Bad address (os error 14)

py-spy> Stopped sampling because process exitted
py-spy> Wrote flamegraph data to 'test.svg'. Samples: 2 Errors: 31

hermancollin avatar Jun 09 '21 22:06 hermancollin

I think this is what's happening here: the scripts I am profiling are executed in the following way:

  1. I call the CLI command, e.g. tool_from_a_toolbox
  2. The entry point for the tool_from_a_toolbox script is a launcher.py file common to all tools from this particular toolbox.
  3. This launcher.py script calls os.execvpe(), which takes as input the program you want to execute, its arguments and finally the environment parameter. This function executes a new program and replaces the current process without returning. From os.execvpe doc:

For execle(), execlpe(), execve(), and execvpe() (note that these all end in “e”), the env parameter must be a mapping which is used to define the environment variables for the new process (these are used instead of the current process’ environment); the functions execl(), execlp(), execv(), and execvp() all cause the new process to inherit the environment of the current process.

Maybe py-spy has some trouble attaching to this new process? I should mention the same issue happens when also tracking subprocesses.

hermancollin avatar Jun 09 '21 22:06 hermancollin

Updating this issue for anyone who would encounter the same problem (trying to profile a python software built with a single launcher for every entry points).

The way to solve this is straightforward. Because py-spy record can't attach to the new process generated by os.execvpe() (or any os.exec() variants), we can directly insert the profiler invokation within the function arguments, e.g. if we had the following:

os.execvpe(cmd[0], cmd[0:], env)

simply do this:

cmd = [PATH_TO_PY_SPY] + PY_SPY_ARGS.split() + cmd
os.execvpe(cmd[0], cmd[0:], env)

This works like a charm. On my end this is solved.

hermancollin avatar Jun 10 '21 16:06 hermancollin

This looks like a bug - I think the problem is that we're getting python interpreter information from the python process when py-spy starts (including addresses etc) and those all change after the os.exec call.

I can replicate it with a script like

import os
import sys
import time

if __name__ == "__main__":
    # give py-spy a chance to connect
    time.sleep(1)

    # replace this process with another python interpreter
    long_sleep_path = os.path.join(os.path.realpath(os.path.dirname(__file__)), "longsleep.py")
    os.execvp(sys.executable, [sys.executable, long_sleep_path])

(in the tests directory).

Another workaround is probably to launch the process without py-spy, and then connect to the PID once the python process is started (which isn't ideal).

benfred avatar Jun 10 '21 18:06 benfred

@benfred FWI apparently, on POSIX (but not on Windows), the constructor for the subprocess module in Python uses os.execvp()-like behavior (reference), so this issue might be more common than I initially thought.

hermancollin avatar Jun 11 '21 12:06 hermancollin