pytest-xdist icon indicating copy to clipboard operation
pytest-xdist copied to clipboard

Support `--pdb` again when `--pdbcls` is also specified

Open ralokt opened this issue 2 years ago • 5 comments

This ties in with #863 - debugging isn't made impossible by xdist, it's just that default PDB doesn't work. Remote debuggers do, and it would be nice if

  • Just --pdb resulted in an error (but also a nudge to use a remote debugger and specify it using --pdbcls)
  • --pdb in combination with --pdbcls did not result in an error anymore

ralokt avatar Jan 10 '23 13:01 ralokt

Unless you provide an example that works im inclined to close this as not supportable

This isn't just about the class, but about how it is integrated

RonnyPfannschmidt avatar Jan 10 '23 16:01 RonnyPfannschmidt

I am now using python-web-pdb, and the following work out of the box:

  • pytest --pdb --pdbcls=web_pdb:WebPdb <my test> (ie no xdist)
  • import web_pdb; web_pdb.set_trace() in my test, with xdist

ralokt avatar Jan 10 '23 16:01 ralokt

When I remove the two places where xdist currently disables usepdb, and combine the two approaches, it crashes because the debugger logs that it is starting up. However - I can tell you this because I can actually debug this failure from web_pdb!

image

That seems like an eminently solvable problem?

ralokt avatar Jan 10 '23 16:01 ralokt

I approve of the notion,

I'd like to see a new option to allow xdist pdb in combination with a pdb class

What happens when multiple processes hit pdb and what happens when a ssh gateway is used?

RonnyPfannschmidt avatar Jan 10 '23 17:01 RonnyPfannschmidt

Re SSH gateway: I have no idea, never used that feature, but I see no reason for it to be much different. The idea to use a channel for communication with the debugger that isn't also used for IPC should work just as well, in theory - unless perhaps there is some application-level timeout at play.

Re multiple processes failing, I see two likely possibilities depending on the debugger:

  • The first test to fail would bind to the default port used by the debugger in question, subsequent failing tests would fail to bind to that port and those workers would crash
  • All workers would bind to a different random port, but then knowing what those ports are might be an issue if it turns out that we can't let the debuggers do I/O

As someone who has spent hours of desperation finding remote debugging as a valid approach (because of a heisenbug that only happens when xdist is used to distribute tests, and only on that one machine with 128 physical cores where we use 128 workers), either of these would still have been a massive improvement over the status quo.

End of the day, 99% of the time it's better to use xdist to find failures, and then re-run those tests without for debugging - but for those cases where that isn't enough, and you actually need to see what is happening in a worker - even an experimental solution with huge caveats is preferable to no support at all. At the very least I would like the option to shoot myself in the foot here.

ralokt avatar Jan 11 '23 09:01 ralokt