
How to intercept or disable Jedi completely for specific objects/classes?

Open tamasgal opened this issue 3 years ago • 9 comments

EDIT: Sorry, I initially thought this was an IPython issue because of a messed-up environment, so I recreated the issue at IPython. In fact, with Jedi uninstalled the completion is fast, so this has to do with Jedi. The IPython issue is now closed and I am back here... It's too early 😬

I love Jedi, but in some of my projects, where I rely on heavily dynamic data, it often chokes and makes the interactive experience totally unusable.

Unfortunately I don't have a minimal working example, but here is one you can try easily after running this in a virtualenv:

pip install km3io km3net_testdata jedi==0.18.0

(in IPython)

>>> import km3io
>>> from km3net_testdata import data_path
>>> f = km3io.OfflineReader(data_path("offline/km3net_offline.root"))
>>> f.events.tracks.<TAB>

After hitting <TAB> on f.events.tracks., the whole interactive session is stuck for more than 15 seconds. Apparently Jedi tries to figure out return types and whatnot, which are dynamically generated, and I think that triggers a lot of things behind the scenes.

Note that this is a very small sample file. We usually deal with files of many GB, and there the session needs to be killed.

I thought that defining __dir__ would intercept this inspection (see https://github.com/KM3NeT/km3io/blob/4a56adea7224b5c806d703ff4d7a88037e1a02b9/km3io/rootio.py#L331); at least that would make sense. So I am wondering whether this is a bug or intended behaviour, or whether there is another way to prevent Jedi from doing anything.

I also tried to follow the code with the debugger but it's very difficult...

In any case, the main question is: is there a way to completely prevent Jedi from introspecting an object and simply accept a list of attributes to be displayed statically?

tamasgal avatar May 04 '21 07:05 tamasgal

How long does dir(f.events.tracks) take if you do not invoke Jedi? It would also be interesting if it's slower the second time you call it.

davidhalter avatar May 07 '21 09:05 davidhalter

That's very fast:

>>> %time dir(f.events.tracks)
CPU times: user 3.29 ms, sys: 32 µs, total: 3.32 ms
Wall time: 3.37 ms
['E',
 'dir_x',
 'dir_y',
 'dir_z',
 'fitinf',
 'id',
 'len',
 'lik',
 'pos_x',
 'pos_y',
 'pos_z',
 'rec_stages',
 'rec_type',
 't']

EDIT: it takes approximately the same time for consecutive calls (~3ms)

tamasgal avatar May 07 '21 09:05 tamasgal

Maybe I forgot to mention it, but the main problem is also that hitting TAB takes the same time on every call. If it were just a 30 sec hang for the very first invocation, we might be able to live with it, but it's at least 30 sec every time (and way longer for larger files).

tamasgal avatar May 07 '21 09:05 tamasgal

Here, I have a very small sample file and call all the attributes from dir(f.events.tracks) explicitly. As you can see, the first call takes ~250ms (it does some caching there) and the second ~66ms.

>>> %%time
... for attr in dir(f.events.tracks):
...     value = getattr(f.events.tracks, attr)
...     print(type(value), value)
...
<class 'awkward.highlevel.Array'> [[99.1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
<class 'awkward.highlevel.Array'> [[0.0369, 0.0369, 0.0369, 0.0351, 0.033, ... 0.242, -0.181, 0.409, 0.49, -0.296]]
<class 'awkward.highlevel.Array'> [[-0.487, -0.487, -0.487, -0.485, -0.491, ... 0.903, -0.638, 0.627, 0.447, -0.594]]
<class 'awkward.highlevel.Array'> [[-0.873, -0.873, -0.873, -0.874, -0.871, ... -0.749, -0.663, -0.749, -0.749]]
<class 'awkward.highlevel.Array'> [[[0.00496, 0.00342, -295, 142, 99.1, 1.8e+308, 4.24e-12, 10, ... [], [], [], []]]
<class 'awkward.highlevel.Array'> [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, ... 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56]]
<class 'awkward.highlevel.Array'> [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
<class 'awkward.highlevel.Array'> [[295, 295, 295, 292, 291, 291, 289, ... 34.9, 34.9, 34.7, 33.9, 33.9, 33.8, 33.8]]
<class 'awkward.highlevel.Array'> [[446, 446, 448, 448, 448, 448, 448, 448, ... 448, 451, 453, 448, 455, 457, 447]]
<class 'awkward.highlevel.Array'> [[615, 615, 585, 583, 583, 583, 583, 583, ... 568, 567, 574, 568, 565, 568, 570]]
<class 'awkward.highlevel.Array'> [[125, 125, 70.7, 67.2, 68.4, 68.3, 68.3, ... 135, 135, 128, 134, 132, 127, 133]]
<class 'awkward.highlevel.Array'> [[[1, 3, 5, 4], [1, 3, 5], [1, 3], [1, 3], [1, ... 1], [1], [1], [1], [1], [1], [1]]]
<class 'awkward.highlevel.Array'> [[4000, 4000, 4000, 4000, 4000, 4000, 4000, ... 4000, 4000, 4000, 4000, 4000, 4000]]
<class 'awkward.highlevel.Array'> [[7.03e+07, 7.03e+07, 7.03e+07, 7.03e+07, ... 5.5e+07, 5.5e+07, 5.5e+07, 5.5e+07]]
CPU times: user 180 ms, sys: 12.4 ms, total: 192 ms
Wall time: 247 ms

>>> %%time
... for attr in dir(f.events.tracks):
...     value = getattr(f.events.tracks, attr)
...     print(type(value), value)
...
<class 'awkward.highlevel.Array'> [[99.1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
<class 'awkward.highlevel.Array'> [[0.0369, 0.0369, 0.0369, 0.0351, 0.033, ... 0.242, -0.181, 0.409, 0.49, -0.296]]
<class 'awkward.highlevel.Array'> [[-0.487, -0.487, -0.487, -0.485, -0.491, ... 0.903, -0.638, 0.627, 0.447, -0.594]]
<class 'awkward.highlevel.Array'> [[-0.873, -0.873, -0.873, -0.874, -0.871, ... -0.749, -0.663, -0.749, -0.749]]
<class 'awkward.highlevel.Array'> [[[0.00496, 0.00342, -295, 142, 99.1, 1.8e+308, 4.24e-12, 10, ... [], [], [], []]]
<class 'awkward.highlevel.Array'> [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, ... 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56]]
<class 'awkward.highlevel.Array'> [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
<class 'awkward.highlevel.Array'> [[295, 295, 295, 292, 291, 291, 289, ... 34.9, 34.9, 34.7, 33.9, 33.9, 33.8, 33.8]]
<class 'awkward.highlevel.Array'> [[446, 446, 448, 448, 448, 448, 448, 448, ... 448, 451, 453, 448, 455, 457, 447]]
<class 'awkward.highlevel.Array'> [[615, 615, 585, 583, 583, 583, 583, 583, ... 568, 567, 574, 568, 565, 568, 570]]
<class 'awkward.highlevel.Array'> [[125, 125, 70.7, 67.2, 68.4, 68.3, 68.3, ... 135, 135, 128, 134, 132, 127, 133]]
<class 'awkward.highlevel.Array'> [[[1, 3, 5, 4], [1, 3, 5], [1, 3], [1, 3], [1, ... 1], [1], [1], [1], [1], [1], [1]]]
<class 'awkward.highlevel.Array'> [[4000, 4000, 4000, 4000, 4000, 4000, 4000, ... 4000, 4000, 4000, 4000, 4000, 4000]]
<class 'awkward.highlevel.Array'> [[7.03e+07, 7.03e+07, 7.03e+07, 7.03e+07, ... 5.5e+07, 5.5e+07, 5.5e+07, 5.5e+07]]
CPU times: user 60.3 ms, sys: 3.05 ms, total: 63.3 ms
Wall time: 66.2 ms

When invoking the Jedi autocompletion for f.events.tracks. through IPython, each call takes ~20 s of CPU time, and it is even ~24 s (26 s wall time) for nested attributes (e.g. f.events.tracks.dir_x.):

>>> import IPython

>>> ipy = IPython.get_ipython()

>>> %time ipy.Completer.all_completions("f.events.tracks.")
CPU times: user 17.7 s, sys: 2.03 s, total: 19.7 s
Wall time: 21 s
['f.events.tracks.dir_x',
 'f.events.tracks.dir_y',
 'f.events.tracks.dir_z',
 'f.events.tracks.E',
 'f.events.tracks.fitinf',
 'f.events.tracks.id',
 'f.events.tracks.len',
 'f.events.tracks.lik',
 'f.events.tracks.pos_x',
 'f.events.tracks.pos_y',
 'f.events.tracks.pos_z',
 'f.events.tracks.rec_stages',
 'f.events.tracks.rec_type',
 'f.events.tracks.t']

>>> %time ipy.Completer.all_completions("f.events.tracks.")
CPU times: user 18.6 s, sys: 2.16 s, total: 20.7 s
Wall time: 22.6 s
['f.events.tracks.dir_x',
 'f.events.tracks.dir_y',
 'f.events.tracks.dir_z',
 'f.events.tracks.E',
 'f.events.tracks.fitinf',
 'f.events.tracks.id',
 'f.events.tracks.len',
 'f.events.tracks.lik',
 'f.events.tracks.pos_x',
 'f.events.tracks.pos_y',
 'f.events.tracks.pos_z',
 'f.events.tracks.rec_stages',
 'f.events.tracks.rec_type',
 'f.events.tracks.t']

>>> %time ipy.Completer.all_completions("f.events.tracks.dir_x.")
CPU times: user 21.6 s, sys: 2.32 s, total: 23.9 s
Wall time: 26.4 s
[]

If I uninstall Jedi completely, I get ~800 ms for the first invocation and ~4 ms for subsequent calls to the TAB completion:

>>> %time ipy.Completer.all_completions("f.events.tracks.")
CPU times: user 777 ms, sys: 21.2 ms, total: 798 ms
Wall time: 828 ms
['f.events.tracks.E',
 'f.events.tracks.arrays',
 'f.events.tracks.dir_x',
 'f.events.tracks.dir_y',
 'f.events.tracks.dir_z',
 'f.events.tracks.fitinf',
 'f.events.tracks.id',
 'f.events.tracks.len',
 'f.events.tracks.lik',
 'f.events.tracks.ndim',
 'f.events.tracks.pos_x',
 'f.events.tracks.pos_y',
 'f.events.tracks.pos_z',
 'f.events.tracks.rec_stages',
 'f.events.tracks.rec_type',
 'f.events.tracks.t']

>>> %time ipy.Completer.all_completions("f.events.tracks.")
CPU times: user 4.28 ms, sys: 680 µs, total: 4.96 ms
Wall time: 4.49 ms
['f.events.tracks.E',
 'f.events.tracks.arrays',
 'f.events.tracks.dir_x',
 'f.events.tracks.dir_y',
 'f.events.tracks.dir_z',
 'f.events.tracks.fitinf',
 'f.events.tracks.id',
 'f.events.tracks.len',
 'f.events.tracks.lik',
 'f.events.tracks.ndim',
 'f.events.tracks.pos_x',
 'f.events.tracks.pos_y',
 'f.events.tracks.pos_z',
 'f.events.tracks.rec_stages',
 'f.events.tracks.rec_type',
 'f.events.tracks.t']
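
Uninstalling is not the only way to take Jedi out of the picture for a comparison like this. IPython's completer exposes a use_jedi setting, which can be set with `%config Completer.use_jedi = False` or from code. The following is a minimal sketch of the latter; the helper name disable_jedi_completion is made up for illustration, and it switches Jedi off globally rather than per object, so it is a blunt workaround and not an answer to the per-class question.

```python
def disable_jedi_completion():
    """Turn off Jedi in the running IPython session, if there is one.

    Returns True when the switch was flipped and False when no IPython
    session is active (or IPython is not installed at all).
    """
    try:
        from IPython import get_ipython
    except ImportError:
        return False

    shell = get_ipython()  # None when not running inside IPython
    if shell is None:
        return False

    # use_jedi is a configurable trait on IPython's completer object.
    shell.Completer.use_jedi = False
    return True


switched = disable_jedi_completion()
print("Jedi disabled for this session:", switched)
```

Run inside the affected session, this should make TAB fall back to IPython's own completer, which is the code path timed above.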

Since the attributes in this case are all awkward.highlevel.Array, I think it might be a good idea to ping @jpivarski (the author of awkward), maybe he has similar experiences or some idea?

tamasgal avatar May 07 '21 09:05 tamasgal

Can you maybe call jedi.set_debug_function() and post the output here?

davidhalter avatar May 07 '21 12:05 davidhalter

Yes, it's quite huge... jedi.log.zip

tamasgal avatar May 07 '21 12:05 tamasgal

ak.Array (in ak.highlevel) has an explicit __dir__ method that looks up the fields of record arrays such as these.

https://github.com/scikit-hep/awkward-1.0/blob/5aaf42a60a49a3643e0466d25dcddfe7e6aa395c/src/awkward/highlevel.py#L1123-L1137

It doesn't have an explicit _ipython_key_completions_ method: https://github.com/scikit-hep/awkward-1.0/search?q=ipython_key_completions

I would have thought that IPython would simply call __dir__ if it needs tab-completions, but maybe it doesn't? The difference between milliseconds and 20 seconds sounds like it's loading data, which is not what you want. If adding an explicit _ipython_key_completions_ method (simply calling __dir__) fixes this, I'm (a) surprised at IPython and (b) willing to accept it as a PR.
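
For reference, the change being proposed would look roughly like the sketch below. StaticFields is a hypothetical stand-in, not the actual ak.Array code. Note that IPython documents _ipython_key_completions_ as the hook for completing keys in subscript expressions such as obj["<TAB>, so this sketch only shows how the two methods can share one name list; it makes no claim about what Jedi does with either of them.

```python
class StaticFields:
    """Hypothetical container that advertises a fixed list of field names."""

    def __init__(self, fields):
        self._fields = tuple(fields)

    def __dir__(self):
        # Return names only; no field data is touched here.
        return list(self._fields)

    def _ipython_key_completions_(self):
        # Reuse the same cheap name list for IPython's key completion.
        return self.__dir__()


tracks = StaticFields(["E", "dir_x", "pos_x", "t"])
print(dir(tracks))                         # names, sorted by dir()
print(tracks._ipython_key_completions_())  # same names, in declaration order
```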

jpivarski avatar May 07 '21 14:05 jpivarski

@jpivarski Jedi is not using _ipython_key_completions_. That method is only there for people who set use_jedi=False (which is not the default).

The problem is probably that Jedi uses getattr to find the types of all completion results.
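
To make the distinction concrete, here is a small self-contained sketch of why dir() can be cheap while getattr on every listed name is not. LazyTracks is a hypothetical class with a sleep standing in for reading data; it is not km3io or awkward code, and it does not reproduce what Jedi does internally.

```python
import time


class LazyTracks:
    """Hypothetical object whose attributes are expensive to materialise."""

    _FIELDS = ("E", "dir_x", "pos_x", "t")

    def __init__(self, delay=0.01):
        self._delay = delay
        self.loads = 0  # number of times a field was actually loaded

    def __dir__(self):
        # Listing names does not load anything.
        return list(self._FIELDS)

    def __getattr__(self, name):
        # Only called for names that normal attribute lookup cannot find.
        if name not in self._FIELDS:
            raise AttributeError(name)
        self.loads += 1
        time.sleep(self._delay)  # stand-in for reading data from disk
        return [0.0]


tracks = LazyTracks()

names = dir(tracks)
loads_after_dir = tracks.loads  # dir() alone triggers no loads

values = [getattr(tracks, n, None) for n in names]
loads_after_getattr = tracks.loads  # one load per listed name

print(loads_after_dir, loads_after_getattr)
```

If the per-attribute cost is small, as in this toy class, the getattr loop stays fast as well, so a quick timing of that loop on the real object is a reasonable way to test the hypothesis.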

@tamasgal Could you maybe try running this and report back?

>>> %time [getattr(f.events.tracks, n, None) for n in dir(f.events.tracks)]

davidhalter avatar May 24 '21 17:05 davidhalter

Yes, sure! (By the way, you can also run the code yourself, since the test file comes packaged if you do pip install km3io km3net_testdata.)

Here it is, looks fast enough for me:

>>> import km3io

>>> from km3net_testdata import data_path

>>> f = km3io.OfflineReader(data_path("offline/km3net_offline.root"))

>>> %time [getattr(f.events.tracks, n, None) for n in dir(f.events.tracks)]
CPU times: user 84.8 ms, sys: 308 µs, total: 85.1 ms
Wall time: 97.2 ms
[<Array [[99.1, 0, 0, 0, 0, ... 0, 0, 0, 0, 0]] type='10 * var * float64'>,
 <Array [[0.0369, 0.0369, ... 0.49, -0.296]] type='10 * var * float64'>,
 <Array [[-0.487, -0.487, ... 0.447, -0.594]] type='10 * var * float64'>,
 <Array [[-0.873, -0.873, ... -0.749, -0.749]] type='10 * var * float64'>,
 <Array [[[0.00496, 0.00342, -295, ... [], []]] type='10 * var * var * float64'>,
 <Array [[1, 2, 3, 4, 5, ... 53, 54, 55, 56]] type='10 * var * int32'>,
 <Array [[0, 0, 0, 0, 0, 0, ... 0, 0, 0, 0, 0]] type='10 * var * float64'>,
 <Array [[295, 295, 295, ... 33.9, 33.8, 33.8]] type='10 * var * float64'>,
 <Array [[446, 446, 448, ... 455, 457, 447]] type='10 * var * float64'>,
 <Array [[615, 615, 585, ... 565, 568, 570]] type='10 * var * float64'>,
 <Array [[125, 125, 70.7, ... 132, 127, 133]] type='10 * var * float64'>,
 <Array [[[1, 3, 5, 4], [1, ... 1], [1], [1]]] type='10 * var * var * int64'>,
 <Array [[4000, 4000, 4000, ... 4000, 4000]] type='10 * var * int32'>,
 <Array [[7.03e+07, 7.03e+07, ... 5.5e+07]] type='10 * var * float64'>]

tamasgal avatar May 25 '21 15:05 tamasgal