Failure to unwind stack in an after-fork handler

Open • godlygeek opened this issue 2 years ago • 1 comment

After causing a deadlock in an after-fork handler installed with pthread_atfork, an attempt to unwind that process with pystack remote --native-all fails with:

Engine error: basic_string::_S_construct null not valid

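That's happening because dwfl_getthread_frames is not finding any frames, and is also not setting dwfl_errno to something non-zero. With no error recorded, we presumably end up with a null error-message pointer, and building a std::string from it throws the exception above. Interestingly, this doesn't reproduce with eu-stack, so we are probably doing something wrong on our side.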

#101 fixes the failure mode that we get here, but we should figure out why unwinding is failing, as both gdb and eu-stack succeed.

godlygeek • Jun 07 '23 03:06

Oof, I see what we've got wrong:

$ pystack -v remote --native-all 6611 2>&1 | egrep 'tid|thread'
INFO(process_remote): Trying to stop thread 6611
INFO(process_remote): Waiting for thread 6611 to be stopped
INFO(process_remote): Fetching Python threads
INFO(process_remote): Constructing new Python thread with tid 6610
INFO(process_remote): Detaching from thread 6611

That 6610 on the second-to-last line is the parent process's pid/tid, not the child's. Since the process is still inside fork at this point, we seem to be reading a structure that still holds the old pid/tid rather than the new one assigned after the fork. Unwinding then fails because we're asking libdw to unwind a thread that doesn't exist in this process.
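
To make the mismatch concrete, here is a minimal sketch of a sanity check that would catch it: before unwinding a tid, confirm that the tid actually belongs to the target process. This is not pystack's code. The helper names thread_ids and tid_belongs_to_process are hypothetical, and the check assumes Linux, where each thread of a process has a directory under /proc/<pid>/task.

```python
import os


def thread_ids(pid):
    """Return the set of kernel thread ids (tids) that belong to `pid`.

    Assumption: Linux, where /proc/<pid>/task holds one directory per thread.
    """
    task_dir = f"/proc/{pid}/task"
    return {int(name) for name in os.listdir(task_dir) if name.isdigit()}


def tid_belongs_to_process(pid, tid):
    """Return True if `tid` is a live thread of `pid`, False otherwise."""
    try:
        return tid in thread_ids(pid)
    except FileNotFoundError:
        # The process itself no longer exists.
        return False
```

With the values from the log above, tid_belongs_to_process(6611, 6610) should return False if the diagnosis is right, since 6610 is a thread of the parent and not of the forked child.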

godlygeek • Jun 07 '23 03:06