
Improving inference caching (unnecessary context clones?)

Open brycepg opened this issue 6 years ago • 2 comments

From investigating the code, I'm getting the impression that the context's cache_generator isn't working very well for caching, and that there may be some low-hanging fruit for improving performance.

Possible improvements:

  • Allow partial caching. Currently an inference value is only cached once the inference generator is completely exhausted. Since there are many call sites that only retrieve the first inferred value, the cache is never populated for them (see the sketch after this list).
  • Have context.push attempt to find inference results in the cache. This could potentially reduce the number of context.clone calls required to stop inference from failing due to already-traveled values in the context path.
  • Centralize caching. Currently caching is tied to the context. Since pylint performs many disparate inferences while visiting the AST, this could yield a significant speedup. One possible downside is a larger memory-for-speed tradeoff.
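For context, the consumption pattern that defeats the current cache looks roughly like this (a minimal sketch against astroid's public API; the module text is arbitrary):

import astroid

node = astroid.extract_node("len([1, 2, 3])")

# Typical pylint-style usage: only the first inferred value is consumed,
# so the generator is never exhausted and nothing ends up in the cache.
first_value = next(node.infer())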

brycepg · Mar 30 '18

These all sound good to me, but I would move the third point into its own issue, as I don't see it as trivial low-hanging fruit. We'd also have to monitor that these changes don't increase memory usage too much, but with your dataset this should be relatively easy to figure out.

PCManticore · Apr 02 '18

Allow partial caching. Currently an inference value is only cached once the inference generator is completely exhausted. Since there are many call sites that only retrieve the first inferred value, the cache is never populated for them.

I looked into this. I don't see a way to do it without eagerly evaluating _infer() and moving away from the paradigm of generators.

infer1 = node.infer()  # node is any astroid node
next(infer1)           # yields the first result, which we cache
infer2 = node.infer()
next(infer2)           # yields the first result from the cache
next(infer2)           # uh oh: the second result was never computed

In that example, the second next(infer2) needs a result from a generator that was never advanced that far. As far as I know, the usual ways to do that with itertools.islice still involve evaluating the entire iterator, trading away the performance savings.
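(For illustration only, not astroid code: one way to replay already-computed values without eagerly exhausting the generator is a tee-style memoizing wrapper, sketched below. It sidesteps islice, but the underlying concern still applies: the suspended generator, and whatever inference context it holds, has to stay alive for as long as the cache entry does.)

class CachedGenerator:
    """Replay cached values to each consumer, then resume the live generator."""

    def __init__(self, generator):
        self._generator = generator
        self._cache = []

    def __iter__(self):
        index = 0
        while True:
            if index < len(self._cache):
                # A previous consumer already computed this value: replay it.
                yield self._cache[index]
            else:
                # Advance the shared underlying generator by exactly one value.
                try:
                    value = next(self._generator)
                except StopIteration:
                    return
                self._cache.append(value)
                yield value
            index += 1

With a wrapper like this, the infer2 example above would work: the second consumer replays the cached first result and only then advances the shared generator to compute the second one.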

Also, I'm thinking that energy might be better spent moving away from the next(node.infer()) pattern anyway; see #136.

Have context.push attempt to find inference results in the cache. This could potentially reduce the number of context.clone calls required to stop inference from failing due to already-traveled values in the context path.

This is intriguing. Especially after #1009, and now #2257, I'm sure there are a lot of unnecessary context clones all over the place. It would be good to attack this together with #670.
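(For readers unfamiliar with the internals: InferenceContext.push is the cycle guard that decides whether a node may be inferred again on the current path. A hypothetical sketch of its shape, and of where a cache lookup could slot in — this is not the actual astroid source:)

def push(self, node):
    """Return True if the node was already visited on this inference path."""
    key = (node, self.lookupname)
    if key in self.path:
        return True  # cycle detected: abort this branch instead of recursing
    # Proposed: before committing to a fresh (possibly cloned) context,
    # check whether a result for this key already exists in the inference cache.
    self.path.add(key)
    return False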

Centralize caching. Currently caching is tied to the context. Since pylint performs many disparate inferences while visiting the AST, this could yield a significant speedup. One possible downside is a larger memory-for-speed tradeoff.

A global inference cache was added in #1009. #2257 is a follow-up to improve it.
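(Schematically, a centralized cache of that kind keys on the node plus the context fields that can change the result, roughly as below. This is a simplified sketch, not the exact implementation from #1009; _infer is astroid's internal per-node inference method.)

_inference_cache = {}

def cached_infer(node, context):
    # The key must capture everything that can alter the inference result;
    # keying on the node alone would conflate different call/bound contexts.
    key = (node, context.lookupname, context.callcontext, context.boundnode)
    if key not in _inference_cache:
        _inference_cache[key] = list(node._infer(context))
    return iter(_inference_cache[key])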


Renaming this to focus on the second point.

jacobtylerwalls · Jul 19 '23