astroid
astroid copied to clipboard
Astroid calls to ancestors are uncached and slow for templates and generics in ``ClassDef.ancestors``
tl;dr I have been trying to figure out why pylint
is slow for us and profiling points to astroid
. I'd appreciate some pointers on how I could get a clearer signal on what is going on.
Versions
astroid 2.6.5
pylint 2.9.6
python 3.7.9
Description
We run pylint
against individual packages/modules in our codebase and the performance is highly variable. I've been trying to figure out why some runs are slow with yappi
, similar to how it was done in #610.
One such module of ours has about 1000 lines of code and has become infeasibly slow on the versions listed above (I gave up after waiting 1 hour with the profiler on). I managed to get it to complete a profile in ~30 minutes by limiting the run to a single 200 line file. It does complete in under 5 minutes on our current versions: pylint
2.7.4 with astroid
2.5.2 -- I haven't captured a profile on these versions yet, but I could if that would be helpful.
I'm not able to share the code since it is proprietary, and the profile may contain sensitive information. ~I may attempt to scrub the profile and share the dot graph if that would be of interest.~ Dot graph shared below.
Profling result
The hottest node from the callgrind
was this by quite some margin:
Name.infer astroid/node_classes.py:325 57.62% (16.12%) 82185683×
I believe that refers to this code: https://github.com/PyCQA/astroid/blob/v2.6.5/astroid/node_classes.py#L325-L379
This node has quite a lot of edges making it hard to see exactly what is going on but I hope this an interesting enough starting point for us to figure out what the cause of the bottleneck could be?
There's quite a few other details I'm not sure whether to include:
- to get a useful profile, this was done with
--jobs=1
: it does run faster with--jobs=0
but it's still quite slow. I more or less copied this https://github.com/nickdrozd/pylint/commit/dad69557b85b2b2d062cc5c6395c203cd08d5709 (thanks @nickdrozd!) - I believe our
rcfile
is relatively boring apart from disabling quite a lot of pylint's checkers - Other options we're using
--persistent=n --unsafe-load-any-extension=y
Thank you for opening the issue. It makes sense that the inference would be what is costing a lot on astroid. We'd need to dig deeper and analyse what inferences exactly are taking more time than they should and probably in what pylint checker.
Coule you try to single out deep recursion call, if possible ?
I tidied up the graph, hope this will help:
Looking at your graph it looks like ClassDef.ancestors is taking quite a lot of time. Is there a lot of inheritance in your code ? Maybe this is something what we needs to cache.
Looking at your graph it looks like ClassDef.ancestors is taking quite a lot of time. Is there a lot of inheritance in your code ? Maybe this is something what we needs to cache.
Yes, and the inherited classes are also using Templates / Generics which I think is another known slow thing?
I changed the name of the issue even if this might not be the only problem with the slowness of astroid on your code. This is the one issue that jump to the eye and we already had trouble with this before. Thank you for providing a graphs this is really helpful.
That red box for Name.infer
is an opportunity for improvement.
I was also going to ask: is there a way I could figure out the minimum set of pylint rules to disable in this context to keep my runs fast in the meantime?
There is no easy way right now. We can't profile each checker independently (and they are often intertwined) There is an issue opened in pylint to provide the tools for that: https://github.com/PyCQA/pylint/issues/4067.
@Pierre-Sassoulas I started a rough sketch of caching ClassDef.ancestors
in #1120 which still fails two tests, but is this roughly the right idea?
Yeah, I think it's smoothly the right idea 😄