
Multiprocessing Deadlock with v0.4.0

Open mlucool opened this issue 1 year ago • 4 comments

Hi,

The following causes py-spy to hang intermittently, both with --subprocesses and without. When we inspected the hung process, one thread appeared to be waiting to recv from a channel.

With this reproducer, one of the samples eventually causes py-spy to hang forever.

$ cat py_spy_crash.py
#!/usr/bin/env python3
from concurrent.futures import ProcessPoolExecutor
from scipy.signal import lfilter
for ix in range(2):
    with ProcessPoolExecutor(1) as executor:
        executor.submit(lfilter, [1.0], [1.0], [1.0]).result()
    print(f"done with {ix=}", flush=True)
$ for i in `seq 100`; do time -p py-spy record -o /dev/null --subprocesses -- ./py_spy_crash.py; done

The hang also reproduces without --subprocesses:

$ for i in `seq 100`; do time -p py-spy record -o /dev/null -- ./py_spy_crash.py; done
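To count hangs without watching the terminal, the loops above can be wrapped in a small harness that gives each run a wall-clock limit. This is a minimal sketch, not part of py-spy: count_hangs and REPRO_CMD are hypothetical names, and the 60-second default is an arbitrary assumption chosen to be far longer than a normal run of the reproducer.

```python
import subprocess

# Hypothetical command mirroring the --subprocesses loop; remove
# "--subprocesses" to mirror the second loop instead.
REPRO_CMD = ["py-spy", "record", "-o", "/dev/null", "--subprocesses",
             "--", "./py_spy_crash.py"]


def count_hangs(cmd, runs=100, timeout=60.0):
    """Run `cmd` `runs` times and return how many runs exceeded `timeout` seconds.

    subprocess.run kills the command when the timeout expires, so a hung run
    is counted instead of blocking the loop forever.
    """
    hangs = 0
    for _ in range(runs):
        try:
            subprocess.run(cmd, timeout=timeout,
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL)
        except subprocess.TimeoutExpired:
            hangs += 1
    return hangs
```

Calling count_hangs(REPRO_CMD) would report how many of the 100 runs hung. Killing py-spy on timeout may leave the profiled script running; the sketch does not clean that up.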

mlucool, Dec 04 '24 00:12

I believe I'm also seeing this, although for me the multiprocessing happens within a native extension. Is there any workaround?

wbthomason, Apr 08 '25 21:04

Hey all, I'm interested in working on this. I think this is related to the locking strategy used by py-spy, which involves checking the state of each thread before locking it. Looking at the call stack when this hangs, it seems like self.process.lock() never returns, and instead waitpid is left waiting for the state of the thread to change (after the thread has already died).
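The comment above describes waitpid blocking on a state change that never arrives. py-spy is written in Rust and this is not its code; the sketch below only illustrates, on an ordinary child process, how an unbounded wait can be bounded by polling waitpid with WNOHANG against a deadline. wait_with_deadline and its poll interval are hypothetical choices, and WNOHANG is Unix-only.

```python
import os
import time


def wait_with_deadline(pid, timeout, poll_interval=0.01):
    """Poll waitpid(pid, WNOHANG) until `pid` changes state or `timeout` elapses.

    Returns the raw wait status, or None if nothing happened before the
    deadline, so the caller can give up instead of blocking forever.
    """
    deadline = time.monotonic() + timeout
    while True:
        waited_pid, status = os.waitpid(pid, os.WNOHANG)
        if waited_pid == pid:
            return status  # the child changed state (e.g. exited or was killed)
        # waited_pid == 0: the child exists but has no state change to report yet.
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

A blocking os.waitpid(pid, 0) would wait indefinitely in the same situation; the polling loop trades a little latency for a guaranteed upper bound.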

One thing we could do is run the check-before-lock pattern in a separate thread. If it takes too long, we simply time out, send an interrupt signal to kill the check-before-lock thread, and discard those stack traces. That is what already happens if a single thread dies between entering the loop and reading its status.
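The proposal above can be sketched in Python as follows, assuming a hypothetical check_then_lock callable that performs the check-before-lock step and returns a non-None sample; run_with_timeout is likewise a hypothetical name. Python cannot kill or interrupt a thread from outside, so instead of sending a signal this version simply abandons a worker that misses the deadline.

```python
import queue
import threading


def run_with_timeout(check_then_lock, timeout):
    """Run check_then_lock() on a daemon thread, waiting at most `timeout` seconds.

    Returns its result, or None if it did not finish in time (the caller should
    then discard the sample). Exceptions raised by check_then_lock are re-raised.
    """
    results = queue.Queue(maxsize=1)

    def worker():
        try:
            results.put((True, check_then_lock()))
        except BaseException as exc:  # forward any failure to the caller
            results.put((False, exc))

    # daemon=True: a worker stuck past the deadline won't block interpreter exit.
    threading.Thread(target=worker, daemon=True).start()
    try:
        ok, value = results.get(timeout=timeout)
    except queue.Empty:
        return None  # timed out: abandon the worker and drop this sample
    if not ok:
        raise value
    return value
```

Returning None tells the caller to discard the sample, mirroring what happens when a thread dies mid-sample. The proposal for py-spy interrupts the stuck thread with a signal, whereas in this sketch an abandoned worker stays blocked for the life of the process.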

I'll work on putting together a PR, unless the maintainers prefer another approach.

peytondmurray, Aug 13 '25 15:08

Hi, can we get an update on the fix?

singharpit94, Sep 09 '25 02:09

@singharpit94 Thanks for the reminder about this - can you try out https://github.com/benfred/py-spy/pull/802 and see if that works for you?

Clone the repo locally, check out the branch, and install it with pip install . from the repository root. Alternatively, if it is easier, you can run pip install git+https://github.com/peytondmurray/py-spy@732-timeout-acquire-lock to install from my branch directly.

peytondmurray, Sep 10 '25 06:09