
Make PyLint faster by providing a way to reuse ASTs and avoid startup time.

Open fabioz opened this issue 7 years ago • 6 comments

The main issue here is that PyLint is slow compared to IDEs. I believe a large part of that is that PyLint always has to start from zero and reparse every file needed to analyze a module, whereas an IDE can keep an instance open and reuse the contents of modules that haven't changed on disk (which is most of the time, as the user is usually working in a single module that uses many other modules).

For example, if a module has 400 dependencies, all of those dependencies may need to be parsed by PyLint before it can do its analysis, always starting from scratch.

So, ideally there'd be a way to start a PyLint process, ask it to analyze a module (by giving it a module name and contents, which may be dirty in the IDE), keep that same process open, and later ask it to analyze the same module again with different contents (it could then reuse the cached modules for any file that hasn't changed on disk). Keeping a single process running would also remove the startup time, which otherwise has to be paid on each new invocation.
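As a rough illustration of the reuse described above, here is a minimal sketch of an in-memory cache that a long-lived process could hold. It uses the standard library `ast` module as a stand-in for astroid, and the `ASTCache` class, its `get` method, and the `parse_count` counter are hypothetical names, not part of PyLint. It assumes that a file whose modification time and size are unchanged has unchanged contents.

```python
import ast
import os
from typing import Dict, Optional, Tuple


class ASTCache:
    """Hypothetical per-process cache of parsed modules, keyed by file path."""

    def __init__(self) -> None:
        # path -> ((mtime_ns, size), parsed tree)
        self._entries: Dict[str, Tuple[Tuple[int, int], ast.Module]] = {}
        self.parse_count = 0  # number of real parses, for illustration only

    def get(self, path: str, dirty_source: Optional[str] = None) -> ast.Module:
        if dirty_source is not None:
            # Unsaved editor contents: always parse, never store in the cache.
            self.parse_count += 1
            return ast.parse(dirty_source, filename=path)

        stat = os.stat(path)
        stamp = (stat.st_mtime_ns, stat.st_size)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == stamp:
            # File unchanged on disk since the last parse: reuse the tree.
            return cached[1]

        with open(path, "rb") as handle:
            tree = ast.parse(handle.read(), filename=path)
        self.parse_count += 1
        self._entries[path] = (stamp, tree)
        return tree
```

In this sketch only the module being edited is passed as `dirty_source`; its dependencies go through the cached path and are reparsed only when their stamp changes.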

If someone would be willing to add such a behavior to PyLint, I'd be available to integrate such improvements in PyDev ;)

Even people using it from the command line could benefit if a client in the command line could do that work (i.e.: start the server if it's not there and ask it to analyze a file and upon a new invocation, connect to the existing server to analyze another file).

I'm not sure how hard it'd be on the PyLint side for that to happen, but I thought asking for the feature couldn't hurt ;)

fabioz avatar Apr 13 '17 16:04 fabioz

There's a fundamental tradeoff between the quality of static analysis you get and the time it takes to get it. The algorithms used for real-time analysis in IDEs differ from the analysis that a command-line or continuous-integration tool like pylint offers, and in turn pylint has to be faster than other algorithms that can take weeks to run. There already exist libraries that do real-time static analysis for Python; PyCharm is the example that comes to mind off the top of my head. Pylint fills a different niche.

ceridwen avatar Apr 18 '17 15:04 ceridwen

Yep, PyDev also gives you real-time code analysis. Nevertheless, a good number of people choose to standardize on PyLint, and it's common to use it through an IDE or editor frontend (see https://pylint.readthedocs.io/en/latest/user_guide/ide-integration.html).

If no PyLint developer wants to tackle this because of time constraints or the effort required, I think it's OK to close the issue. Still, note that I'm not asking to drop the quality of the static analysis, nor questioning the time it takes to run the actual analysis. I'm only asking to optimize something that has a noticeable time penalty and could probably be reduced to close to zero (since no reparses would be needed unless a file actually changed), in a use case I believe is quite common. It could probably make that part much faster on the command line too, if there were an option for the command line to use a client/server approach as well.

fabioz avatar Apr 18 '17 15:04 fabioz

Have we ever profiled what the bottleneck is when pylint launches?
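One way to start answering that is the standard library's `cProfile`. The sketch below profiles a stand-in parse phase; `build_trees` and `profile_call` are hypothetical helpers, not pylint code, and the numbers they produce say nothing about pylint itself.

```python
import ast
import cProfile
import io
import pstats


def build_trees(sources):
    """Stand-in for the parse phase; this is not pylint code."""
    return [ast.parse(text) for text in sources]


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the ten entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()
```

For a real run, something like `python -m cProfile -o out.prof -m pylint package` (with `package` standing in for the code being checked) should produce a profile of a full pylint invocation that `pstats` can load.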

If a significant part of a run goes into building an AST, rebuilding it into astroid form, applying all the brain hints, and so on, and only then starting to walk those ASTs, then having the built trees pickled and stored in some form of persistent cache on the filesystem may actually be beneficial. In my experience only a small subset of modules is modified between Pylint runs. The stdlib, 3rd-party imports, and most of the codebase under test won't change, but we still rebuild the AST for them on each run.

We'd just need to figure out a good way to track and invalidate those caches when needed. But obviously the first step would be to profile.
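A minimal sketch of such a persistent cache, under loud assumptions: it pickles standard library `ast` trees, not astroid trees (whether astroid nodes pickle cleanly is not established here), and `cache_key` and `load_tree` are hypothetical names. Invalidation is handled by keying each entry on a hash of the source bytes plus the Python version, so a changed file simply maps to a new entry.

```python
import ast
import hashlib
import pickle
import sys
from pathlib import Path


def cache_key(source: bytes) -> str:
    """Key on the interpreter version and file contents, not on the path."""
    version = "%d.%d" % sys.version_info[:2]
    digest = hashlib.sha256(version.encode() + b"\0" + source)
    return digest.hexdigest()


def load_tree(source: bytes, cache_dir: Path) -> ast.Module:
    """Return the parsed tree, reading it from the disk cache when possible."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry = cache_dir / (cache_key(source) + ".pickle")
    if entry.exists():
        # Cache hit: the same bytes were parsed by the same Python version.
        with entry.open("rb") as handle:
            return pickle.load(handle)
    tree = ast.parse(source)
    with entry.open("wb") as handle:
        pickle.dump(tree, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return tree
```

Hashing contents avoids trusting timestamps, at the cost of reading every file on each run. Stale entries are never deleted in this sketch, and unpickling is only safe if the cache directory is trusted.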

rogalski avatar Apr 20 '17 18:04 rogalski

Hi, I also noticed that pylint 1.7.0 is much slower than pylint 1.6.5. At least in my project, analysis with pylint 1.6.5 takes about 1m17s, and with pylint 1.7.0 it takes 4m20s against the same codebase.

lemcheyWhite avatar Apr 25 '17 09:04 lemcheyWhite

Please open a separate issue @lemcheyWhite

PCManticore avatar Apr 25 '17 09:04 PCManticore

A promising experiment with disk caching was described in https://github.com/pylint-dev/pylint/issues/2912#issuecomment-981712889.

The ability to "pin" the ASTs of 3rd-party modules is described earlier in the same issue.

jacobtylerwalls avatar Jan 24 '24 14:01 jacobtylerwalls