Tests for debugging imports
Encoding some expected behavior in tests.
CC @echarles
Thx for working on this @vidartf Quick headup that CI is failing with the current state of this PR.
=========================== short test summary info ============================
FAILED ipykernel/tests/test_debugger.py::test_step_into_lib[kernel0] - Assert...
FAILED ipykernel/tests/test_debugger.py::test_step_into_end - Failed: DID NOT...
Yes. My statement is that master has a bug 😉
@fabioz: Do you have any input on my dirty monkey patching of your very nice library? 😅
So, this is an initial draft for filtering frames (proof of concept if you'd like):
Monkey patches pydevd's global debugger instance so that frames from __tracebackhide__ == "__ipython_bottom__" and below are excluded from both stack traces and tracing (aka stepping). This should prevent users stepping out into the IPython/ipykernel internals between statements, and after cells.
Points of note and/or discussion:
- Doesn't yet handle display hooks I think.
- Path/module filters are not sufficient, as the same file/frames can appear on both sides of the IPython stack "bottom".
- For the same reason, caching of the filter can not be used.
- Monkey patching is not ideal, especially since we also access some internals that are probably not considered part of the public API.
- Are threads / forking important here?
- Note that using
traitlets.validate()for the test is useful, since it also shows up below the "bottom" of the IPython stack (at the time of writing). Maybe using a more central IPython function would be more reliable?
Regarding the monkey-patching, I'm a bit wary of it as it's bound to break as the debugger moves forward (if this functionality is really needed, this should probably be in the debugger itself).
get_file_type is also pretty important performance-wise, so, this change will probably make the debugger significantly slower if the results can't be cached since you're covering for a use-case such as:
-> bottom_frame (ignore)
-> frame: func_a (ignore)
-> __tracebackhide__ == "__ipython_bottom__" (ignore)
-> frame: func_a (show)
-> frame: func_b (show)
And going up the stack every time the file type is needed is bound to be slow...
Is this use-case common where a frame from code func_a would appear both before and after the __ipython_bottom__ (is it really worthy to support it)? Can you provide the actual use case where this is needed?
@fabioz Thanks for the input. As mentioned, this is mainly proof-of-concept at this stage, and anything that can be done to clean it up will be very welcome. Making it not rely on internal implementation details of pydevd is definitely one of these things.
Is this use-case common where a frame from code
func_awould appear both before and after the__ipython_bottom__(is it really worthy to support it)? Can you provide the actual use case where this is needed?
Anything that IPython or ipykernel uses or calls out to while processing its message loop are things to consider. Some packages/libs I've observed in logs include: traitlets, asyncio, concurrent, zmq, threading, iostream, jupyter_client, json, hmac, date_utils, logging, queue, tornado, selectors, enum, signal, inspect, tokenize, contextlib, os, tempfile, codeop, dis, decorator, etc. These were only the ones I could find from quickly scanning the logs after running the tests here. I'm sure there will be others as well. Not being able to debug these (or where the ability to trace these are based on a race) seems a deal-breaker. That said, the goal would of course be to:
- Ensure that no performance degradation occurs for the cases where the debug target is not an ipykernel, or where the user does not want to debug libraries (just_my_code = True).
- Find more optimized ways of implementing this. E.g. for
set_trace_for_frame_and_parents, if we couldbreakthe loop on the__ipython_bottom__frame, we would avoid enumerating those frames at all. If we can do the same for other things that loop over frames for filtering, we could avoid the need to touchget_file_typeat all. However, at the moment I'm not familiar enough with the pydevd source to make any good calls about that, so I brute-forced it inget_file_typesince it is a central location for most things that filter frames. Any input from you would be very helpful in figuring out what would be possible here!
Not being able to debug these (or where the ability to trace these are based on a race) seems a deal-breaker.
Additionally: If these frames show up above the "bottom" frame for the first time, than a step command will suddenly trace out into these outer frames, or they will confusingly show up in the stack trace. These behaviors might be much more detrimental to the user experience than not being able to trace these libs.
Humm, if you know the points when you want to start/stop tracing and want absolute control, maybe using get_global_debugger().enable_tracing() / get_global_debugger().disable_tracing() could be a better approach...
I couldn't find any reference to __ipython_bottom__ (apart from the usage where it's checked in the debugger). Where would that be defined?
p.s.: the enable/disable tracing would still leave items showing in the stack frames, but those could be removed in the wrapper that sends the info to the client.
Interesting. I can see a context manager applying this before calling into user code. Some questions/concerns come to mind though:
- Would we need to be careful to reapply the same tracers? I.e. would the trace state need to be stored between calls to
disableand the nextenable? Would there be concerns of this going out of sync? Also, are there not race concerns with debug messages coming in between the code executions (i.e. the trace will be enabled while below the "bottom" frame)? - IPython have some support for async code in cells. I'm not very familiar with this, but I'm concerned whether this could mean that we might end up in the IPython loop again while awaiting something in the cell? I would have to look into this. I also don't know how the stack would look like at that point.
I couldn't find any reference to
__ipython_bottom__(apart from the usage where it's checked in the debugger). Where would that be defined?
https://github.com/ipython/ipython/blob/de8729a984272b023768a8b85295ae06146da71d/IPython/core/interactiveshell.py#L3347
The main place that __ipython_bottom__ is used is in the PDB extension for IPython: https://github.com/ipython/ipython/blob/de8729a984272b023768a8b85295ae06146da71d/IPython/core/debugger.py
- Would we need to be careful to reapply the same tracers? I.e. would the trace state need to be stored between calls to
disableand the nextenable? Would there be concerns of this going out of sync? Also, are there not race concerns with debug messages coming in between the code executions (i.e. the trace will be enabled while below the "bottom" frame)?
This shouldn't be a problem. The tracing enabling/disabling is supported by the debugger and it should make sure everything works properly (if there's any issue that'd be a bug to be fixed).
- IPython have some support for async code in cells. I'm not very familiar with this, but I'm concerned whether this could mean that we might end up in the IPython loop again while awaiting something in the cell? I would have to look into this. I also don't know how the stack would look like at that point.
The async is indeed trickier because whenever you enter the async part you want to enable the debugger and when you get back to the loop you want to disable it (this may be a blocker... I'm not sure if there's a way to implement that).
After thinking a bit more about it I think that your original approach of having a root frame from which the debugging happens (relying on __tracebackhide__ == "__ipython_bottom__") may really be the best approach and it does cover for the asynchronous use case properly.
I have to thinker a bit more about how to implement that efficiently though...
@vidartf I've added some thoughts on the plans to tackle this in the debugger (see: https://github.com/microsoft/debugpy/issues/869#issuecomment-1119864916).
After I finish tackling that issue, I'll give you a heads up so that you can update this PR to make use of those new arguments in the attach/launch requests.