How to intercept or disable Jedi completely for specific objects/classes?
EDIT: sorry, I thought afterwards that this was an IPython problem because of a messed-up environment, so I recreated the issue at IPython. In fact, with jedi uninstalled the completion is fast, so this does have to do with Jedi; the IPython issue is now closed and I am back here... It's too early 😬
I love Jedi, but in some of my projects, where I rely heavily on dynamic data, it often chokes and makes the interactive experience totally unusable.
Unfortunately I don't have a minimal working example but at least one which you can try easily after doing this in a virtualenv:
pip install km3io km3net_testdata jedi==0.18.0
(in IPython)
>>> import km3io
>>> from km3net_testdata import data_path
>>> f = km3io.OfflineReader(data_path("offline/km3net_offline.root"))
>>> f.events.tracks.<TAB>
After hitting <TAB> on f.events.tracks, the whole interactive session is stuck for more than 15 seconds, apparently because Jedi tries to figure out return types and whatnot, which are dynamically generated, and I think this triggers a lot of things behind the scenes.
Notice that this is a very small sample file; we usually deal with files of many GB, and there the session needs to be killed.
I thought that defining __dir__ would intercept this inspection (see https://github.com/KM3NeT/km3io/blob/4a56adea7224b5c806d703ff4d7a88037e1a02b9/km3io/rootio.py#L331); at least that would make sense. So I am wondering whether this is a bug, whether it is intended, or whether there is another way to prevent Jedi from doing anything.
I also tried to follow the code with the debugger but it's very difficult...
In any case, the main question is: is there a way to completely prevent Jedi from introspecting an object and simply accept a list of attributes to be displayed statically?
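A blunt stopgap, sketched here under the assumption that the installed IPython exposes the Completer.use_jedi switch, is to turn Jedi off for the whole session. This does not answer the per-object question, since it disables Jedi for every object. The helper name disable_jedi_in_ipython is made up for illustration.

```python
def disable_jedi_in_ipython():
    """Turn off Jedi for the running IPython session, if there is one.

    Returns True if the switch was flipped, False otherwise.
    """
    try:
        from IPython import get_ipython
    except ImportError:
        return False  # IPython is not installed at all

    shell = get_ipython()
    if shell is None:
        return False  # running in plain Python, not inside IPython

    # Assumption: the completer object carries a `use_jedi` flag.
    shell.Completer.use_jedi = False
    return True


disabled = disable_jedi_in_ipython()
```

Inside an IPython session, typing %config Completer.use_jedi = False should have the same effect.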
How long does dir(f.events.tracks) take if you do not invoke Jedi? It would also be interesting to know whether it's slower the second time you call it.
That's very fast:
>>> %time dir(f.events.tracks)
CPU times: user 3.29 ms, sys: 32 µs, total: 3.32 ms
Wall time: 3.37 ms
['E',
'dir_x',
'dir_y',
'dir_z',
'fitinf',
'id',
'len',
'lik',
'pos_x',
'pos_y',
'pos_z',
'rec_stages',
'rec_type',
't']
EDIT: it takes approximately the same time for consecutive calls (~3ms)
Maybe I forgot to mention it, but the main problem is that hitting TAB always takes the same time. If it were just a 30sec hang on the very first invocation, we might be able to live with it, but it's at least 30sec every time (and far longer for larger files).
Here, I have a very small sample file and access all the attributes from dir(f.events.tracks) explicitly. As you can see, the first run takes ~250ms (it does some caching there) and the second ~66ms.
>>> %%time
... for attr in dir(f.events.tracks):
...     value = getattr(f.events.tracks, attr)
...     print(type(value), value)
...
<class 'awkward.highlevel.Array'> [[99.1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
<class 'awkward.highlevel.Array'> [[0.0369, 0.0369, 0.0369, 0.0351, 0.033, ... 0.242, -0.181, 0.409, 0.49, -0.296]]
<class 'awkward.highlevel.Array'> [[-0.487, -0.487, -0.487, -0.485, -0.491, ... 0.903, -0.638, 0.627, 0.447, -0.594]]
<class 'awkward.highlevel.Array'> [[-0.873, -0.873, -0.873, -0.874, -0.871, ... -0.749, -0.663, -0.749, -0.749]]
<class 'awkward.highlevel.Array'> [[[0.00496, 0.00342, -295, 142, 99.1, 1.8e+308, 4.24e-12, 10, ... [], [], [], []]]
<class 'awkward.highlevel.Array'> [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, ... 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56]]
<class 'awkward.highlevel.Array'> [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
<class 'awkward.highlevel.Array'> [[295, 295, 295, 292, 291, 291, 289, ... 34.9, 34.9, 34.7, 33.9, 33.9, 33.8, 33.8]]
<class 'awkward.highlevel.Array'> [[446, 446, 448, 448, 448, 448, 448, 448, ... 448, 451, 453, 448, 455, 457, 447]]
<class 'awkward.highlevel.Array'> [[615, 615, 585, 583, 583, 583, 583, 583, ... 568, 567, 574, 568, 565, 568, 570]]
<class 'awkward.highlevel.Array'> [[125, 125, 70.7, 67.2, 68.4, 68.3, 68.3, ... 135, 135, 128, 134, 132, 127, 133]]
<class 'awkward.highlevel.Array'> [[[1, 3, 5, 4], [1, 3, 5], [1, 3], [1, 3], [1, ... 1], [1], [1], [1], [1], [1], [1]]]
<class 'awkward.highlevel.Array'> [[4000, 4000, 4000, 4000, 4000, 4000, 4000, ... 4000, 4000, 4000, 4000, 4000, 4000]]
<class 'awkward.highlevel.Array'> [[7.03e+07, 7.03e+07, 7.03e+07, 7.03e+07, ... 5.5e+07, 5.5e+07, 5.5e+07, 5.5e+07]]
CPU times: user 180 ms, sys: 12.4 ms, total: 192 ms
Wall time: 247 ms
>>> %%time
... for attr in dir(f.events.tracks):
...     value = getattr(f.events.tracks, attr)
...     print(type(value), value)
...
<class 'awkward.highlevel.Array'> [[99.1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
<class 'awkward.highlevel.Array'> [[0.0369, 0.0369, 0.0369, 0.0351, 0.033, ... 0.242, -0.181, 0.409, 0.49, -0.296]]
<class 'awkward.highlevel.Array'> [[-0.487, -0.487, -0.487, -0.485, -0.491, ... 0.903, -0.638, 0.627, 0.447, -0.594]]
<class 'awkward.highlevel.Array'> [[-0.873, -0.873, -0.873, -0.874, -0.871, ... -0.749, -0.663, -0.749, -0.749]]
<class 'awkward.highlevel.Array'> [[[0.00496, 0.00342, -295, 142, 99.1, 1.8e+308, 4.24e-12, 10, ... [], [], [], []]]
<class 'awkward.highlevel.Array'> [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, ... 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56]]
<class 'awkward.highlevel.Array'> [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
<class 'awkward.highlevel.Array'> [[295, 295, 295, 292, 291, 291, 289, ... 34.9, 34.9, 34.7, 33.9, 33.9, 33.8, 33.8]]
<class 'awkward.highlevel.Array'> [[446, 446, 448, 448, 448, 448, 448, 448, ... 448, 451, 453, 448, 455, 457, 447]]
<class 'awkward.highlevel.Array'> [[615, 615, 585, 583, 583, 583, 583, 583, ... 568, 567, 574, 568, 565, 568, 570]]
<class 'awkward.highlevel.Array'> [[125, 125, 70.7, 67.2, 68.4, 68.3, 68.3, ... 135, 135, 128, 134, 132, 127, 133]]
<class 'awkward.highlevel.Array'> [[[1, 3, 5, 4], [1, 3, 5], [1, 3], [1, 3], [1, ... 1], [1], [1], [1], [1], [1], [1]]]
<class 'awkward.highlevel.Array'> [[4000, 4000, 4000, 4000, 4000, 4000, 4000, ... 4000, 4000, 4000, 4000, 4000, 4000]]
<class 'awkward.highlevel.Array'> [[7.03e+07, 7.03e+07, 7.03e+07, 7.03e+07, ... 5.5e+07, 5.5e+07, 5.5e+07, 5.5e+07]]
CPU times: user 60.3 ms, sys: 3.05 ms, total: 63.3 ms
Wall time: 66.2 ms
When invoking the Jedi autocompletion for f.events.tracks. through IPython, I get ~20sec, and as you can see it's even slower, around 23sec, for nested attributes (e.g. for f.events.tracks.dir_x.):
>>> import IPython
>>> ipy = IPython.get_ipython()
>>> %time ipy.Completer.all_completions("f.events.tracks.")
CPU times: user 17.7 s, sys: 2.03 s, total: 19.7 s
Wall time: 21 s
['f.events.tracks.dir_x',
'f.events.tracks.dir_y',
'f.events.tracks.dir_z',
'f.events.tracks.E',
'f.events.tracks.fitinf',
'f.events.tracks.id',
'f.events.tracks.len',
'f.events.tracks.lik',
'f.events.tracks.pos_x',
'f.events.tracks.pos_y',
'f.events.tracks.pos_z',
'f.events.tracks.rec_stages',
'f.events.tracks.rec_type',
'f.events.tracks.t']
>>> %time ipy.Completer.all_completions("f.events.tracks.")
CPU times: user 18.6 s, sys: 2.16 s, total: 20.7 s
Wall time: 22.6 s
['f.events.tracks.dir_x',
'f.events.tracks.dir_y',
'f.events.tracks.dir_z',
'f.events.tracks.E',
'f.events.tracks.fitinf',
'f.events.tracks.id',
'f.events.tracks.len',
'f.events.tracks.lik',
'f.events.tracks.pos_x',
'f.events.tracks.pos_y',
'f.events.tracks.pos_z',
'f.events.tracks.rec_stages',
'f.events.tracks.rec_type',
'f.events.tracks.t']
>>> %time ipy.Completer.all_completions("f.events.tracks.dir_x.")
CPU times: user 21.6 s, sys: 2.32 s, total: 23.9 s
Wall time: 26.4 s
[]
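To separate Jedi's share of the time from IPython's, the same completion could also be requested from Jedi directly. This is a minimal sketch, assuming the jedi.Interpreter(code, namespaces).complete() API of the Jedi 0.18 series; the helper name time_jedi_completion is hypothetical and not part of either library.

```python
import time


def time_jedi_completion(source, namespace):
    """Return (completion names, elapsed seconds) for completing `source`.

    `namespace` is a dict of live objects, like the interactive namespace
    that IPython hands to Jedi.
    """
    import jedi  # imported lazily so the helper can be defined without Jedi

    start = time.perf_counter()
    completions = jedi.Interpreter(source, [namespace]).complete()
    elapsed = time.perf_counter() - start
    return [c.name for c in completions], elapsed


# Intended use in the session above (needs km3io and the test file):
# names, seconds = time_jedi_completion("f.events.tracks.", {"f": f})
```

If this call alone takes about as long as the IPython timings, the time is being spent inside Jedi rather than in IPython's completer.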
If I uninstall jedi completely, I get ~800ms for the first invocation and ~4ms for consecutive calls to the TAB completion:
>>> %time ipy.Completer.all_completions("f.events.tracks.")
CPU times: user 777 ms, sys: 21.2 ms, total: 798 ms
Wall time: 828 ms
['f.events.tracks.E',
'f.events.tracks.arrays',
'f.events.tracks.dir_x',
'f.events.tracks.dir_y',
'f.events.tracks.dir_z',
'f.events.tracks.fitinf',
'f.events.tracks.id',
'f.events.tracks.len',
'f.events.tracks.lik',
'f.events.tracks.ndim',
'f.events.tracks.pos_x',
'f.events.tracks.pos_y',
'f.events.tracks.pos_z',
'f.events.tracks.rec_stages',
'f.events.tracks.rec_type',
'f.events.tracks.t']
>>> %time ipy.Completer.all_completions("f.events.tracks.")
CPU times: user 4.28 ms, sys: 680 µs, total: 4.96 ms
Wall time: 4.49 ms
['f.events.tracks.E',
'f.events.tracks.arrays',
'f.events.tracks.dir_x',
'f.events.tracks.dir_y',
'f.events.tracks.dir_z',
'f.events.tracks.fitinf',
'f.events.tracks.id',
'f.events.tracks.len',
'f.events.tracks.lik',
'f.events.tracks.ndim',
'f.events.tracks.pos_x',
'f.events.tracks.pos_y',
'f.events.tracks.pos_z',
'f.events.tracks.rec_stages',
'f.events.tracks.rec_type',
'f.events.tracks.t']
Since the attributes in this case are all awkward.highlevel.Array, I think it might be a good idea to ping @jpivarski (the author of awkward); maybe he has had similar experiences or has some idea?
Can you maybe call jedi.set_debug_function() and post the output here?
Yes, it's quite huge... jedi.log.zip
ak.Array (in ak.highlevel) has an explicit __dir__ method that looks up the fields of record arrays such as these.
https://github.com/scikit-hep/awkward-1.0/blob/5aaf42a60a49a3643e0466d25dcddfe7e6aa395c/src/awkward/highlevel.py#L1123-L1137
It doesn't have an explicit _ipython_key_completions_ method: https://github.com/scikit-hep/awkward-1.0/search?q=ipython_key_completions
I would have thought that IPython would simply call __dir__ if it needs tab-completions, but maybe it doesn't? The difference between milliseconds and 20 seconds sounds like it's loading data, which is not what you want. If adding an explicit _ipython_key_completions_ method (simply calling __dir__) fixes this, I'm (a) surprised at IPython and (b) willing to accept it as a PR.
@jpivarski Jedi is not using _ipython_key_completions_. That method is only there for people who use use_jedi=False (which is not the default).
The problem is probably that Jedi uses getattr to find the types of all completion results.
@tamasgal Could you maybe try running this and report back?
>>> %time [getattr(f.events.tracks, n, None) for n in dir(f.events.tracks)]
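The reason for that test can be illustrated with a toy case. The class below is a hypothetical stand-in, not km3io's or awkward's actual code: its __dir__ only lists names, while every getattr on a listed name does some "work" (counted in loads, standing in for reading a branch from disk). A completer that calls getattr on each name therefore pays a cost that dir() alone never does.

```python
class LazyBranches:
    """Toy object whose attributes are materialised only on access."""

    def __init__(self, names):
        self._names = list(names)
        self.loads = 0  # number of times an attribute had to be built

    def __dir__(self):
        # Cheap: returns the names only and never touches the data.
        return self._names

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith("_") or name not in self._names:
            raise AttributeError(name)
        self.loads += 1  # stands in for an expensive read
        return [0.0, 0.0, 0.0]


tracks = LazyBranches(["E", "pos_x", "t"])

names = dir(tracks)              # what a purely name-based completer needs
loads_after_dir = tracks.loads   # still 0: nothing was materialised

values = [getattr(tracks, n, None) for n in names]
loads_after_getattr = tracks.loads  # one load per listed attribute
```

If each of those loads were slow, the getattr loop would dominate the completion time even though dir() stays fast.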
Yes, sure! (By the way, the code is also runnable for you, since the test file comes packaged if you do pip install km3io km3net_testdata.)
Here it is, looks fast enough for me:
>>> import km3io
>>> from km3net_testdata import data_path
>>> f = km3io.OfflineReader(data_path("offline/km3net_offline.root"))
>>> %time [getattr(f.events.tracks, n, None) for n in dir(f.events.tracks)]
CPU times: user 84.8 ms, sys: 308 µs, total: 85.1 ms
Wall time: 97.2 ms
[<Array [[99.1, 0, 0, 0, 0, ... 0, 0, 0, 0, 0]] type='10 * var * float64'>,
<Array [[0.0369, 0.0369, ... 0.49, -0.296]] type='10 * var * float64'>,
<Array [[-0.487, -0.487, ... 0.447, -0.594]] type='10 * var * float64'>,
<Array [[-0.873, -0.873, ... -0.749, -0.749]] type='10 * var * float64'>,
<Array [[[0.00496, 0.00342, -295, ... [], []]] type='10 * var * var * float64'>,
<Array [[1, 2, 3, 4, 5, ... 53, 54, 55, 56]] type='10 * var * int32'>,
<Array [[0, 0, 0, 0, 0, 0, ... 0, 0, 0, 0, 0]] type='10 * var * float64'>,
<Array [[295, 295, 295, ... 33.9, 33.8, 33.8]] type='10 * var * float64'>,
<Array [[446, 446, 448, ... 455, 457, 447]] type='10 * var * float64'>,
<Array [[615, 615, 585, ... 565, 568, 570]] type='10 * var * float64'>,
<Array [[125, 125, 70.7, ... 132, 127, 133]] type='10 * var * float64'>,
<Array [[[1, 3, 5, 4], [1, ... 1], [1], [1]]] type='10 * var * var * int64'>,
<Array [[4000, 4000, 4000, ... 4000, 4000]] type='10 * var * int32'>,
<Array [[7.03e+07, 7.03e+07, ... 5.5e+07]] type='10 * var * float64'>]