pylint icon indicating copy to clipboard operation
pylint copied to clipboard

Initial implementation of pylintd

Open matusvalo opened this issue 4 years ago • 18 comments

Type of Changes

Type
:bug: Bug fix
:sparkles: New feature
:hammer: Refactoring
:scroll: Docs

Description

This PR introduces basic implementation of pylintd - pylint deamon running in the background. It is similar implementation as mypyd [1] or blackd [2]. The implementation is highly inspired by blackd approach. pylintd is implemented as simple http server providing simple API for passing files for linting and returning back the result.

The speedup of mainly gained in following areas:

  1. avoiding python initialization overhead. But this is only gained when http client is not written in python
  2. avoiding pylint initialization overhead.
  3. reusing already cached AST trees of already parsed modules in https://github.com/PyCQA/astroid/blob/47e860f7d5c73d53de77e7c450535e709e5ef99e/astroid/manager.py#L78. This is most prominent speedup when repeating linting of file does not trigger parsing other modules since they remain in cache. Re-linting of the same file is usual use case for editors what is the target area of pylintd.

Examples:

Following examples shows comparison between running pylint and pylintd against 2 large files in pylint. The test was executed as follows:

  1. fresh instance of pylintd was executed
  2. pylint command was executed against the file
  3. pylintd was asked to lint the file
  4. pylintd was asked again to lint the file without restarting pylintd (this shows speedup when astroid manager cache is filled)
$ cat test.sh
curl -v -H "X-File-Name:$1" -H accept:application/X-pylint-parseable localhost:8000

$ wc -l pylint/checkers/classes.py
    2415 pylint/checkers/classes.py

$ time pylint pylint/checkers/classes.py

--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)


real	0m5.146s
user	0m4.857s
sys	0m0.210s

$ time bash test.sh pylint/checkers/classes.py          # 1st run

--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)


real	0m3.920s
user	0m0.008s
sys	0m0.013s

$ time bash test.sh pylint/checkers/classes.py          # 2nd run

--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)


real	0m1.053s
user	0m0.008s
sys	0m0.011s


wc -l pylint/checkers/typecheck.py
    2026 pylint/checkers/typecheck.py

$ time pylint pylint/checkers/typecheck.py

------------------------------------
Your code has been rated at 10.00/10


real	0m5.243s
user	0m4.837s
sys	0m0.222s

$ time bash test.sh pylint/checkers/typecheck.py       # 1st run

--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)


real	0m4.171s
user	0m0.008s
sys	0m0.013s

$ time bash test.sh pylint/checkers/typecheck.py       # 2nd run

--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)


real	0m1.000s
user	0m0.009s
sys	0m0.013s

[1] https://mypy.readthedocs.io/en/stable/mypy_daemon.html [2] https://black.readthedocs.io/en/stable/usage_and_configuration/black_as_a_server.html

This PR can partially help closing #4814

matusvalo avatar Nov 25 '21 21:11 matusvalo

Of course this PR is not finished. It is more as Proof of concept.

matusvalo avatar Nov 25 '21 21:11 matusvalo

Pull Request Test Coverage Report for Build 1505341505

  • 13 of 13 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.005%) to 93.51%

Totals Coverage Status
Change from base Build 1505246806: 0.005%
Covered Lines: 14006
Relevant Lines: 14978

💛 - Coveralls

coveralls avatar Nov 25 '21 21:11 coveralls

https://github.com/PyCQA/astroid/issues/792 might be a problem if we want to be able to have an efficient pylintd.

Pierre-Sassoulas avatar Nov 26 '21 12:11 Pierre-Sassoulas

https://github.com/PyCQA/astroid/issues/792 might be a problem if we want to be able to have an efficient pylintd.

I agree but using separate service can enable users for some workarounds like restarting pylintd service on regular basis. (I know not ideal...)

matusvalo avatar Nov 26 '21 14:11 matusvalo

@Pierre-Sassoulas our experience with running pylint as a daemon suggests that leaks definitely will be a problem. Given that any leaking node retains the full AST, so memory consumption get uncomfortably high very quickly when linting large files.

superbobry avatar Nov 26 '21 14:11 superbobry

I have a question. How we can proceed with this PR (except unittests and fixing the failed CI)? I am willing to continue with this PR but I would like to understand what is missing in order to finalise this PR.

I would like to keep it as simple as possible right now maybe it can be added as experimental feature. Afterwards I can create additional issues/PRs with further improvements of pylintd.

matusvalo avatar Nov 26 '21 19:11 matusvalo

I mentioned it in the issue already. I think we should not move forward with it, at this time. Maybe at some point we can pick it up again. https://github.com/PyCQA/pylint/issues/4814#issuecomment-980727214

To be fair I don't know how that would be used or how it would affect pylint's user. I'm not using blackd myself so I don't really know what to expect and how to use a pylint deamon. If needs some automated tests for sure, maybe discussion with potential users ?

If the startup time isn't too slow, most users (read IDEs, ...) will simply choose the easiest path which is likely launching it direct. They have that implemented already for other tools, no need to change it.

Tbh aside from a few outliers I haven't seen a use case for it yet.

It's probably aimed toward those with a big codebase that launch analysis on single file often ? Might be useful for pre-commit too as pre-commit launch multiple pylint in parallel so if we use pylintd we skip the startup time ? Does the pre-commit use case makes sense to you @matusvalo ?

For pre-commit it would be much better to use pylints own multiprocessing.

cdce8p avatar Nov 27 '21 17:11 cdce8p

I agree just partially . Calling pylintd using curl with empty cache gives ~24% speedup - 0m5.146s vs 0m3.920s which I consider significant - see my test above. Regarding usage of pylintd, editors can spawn pylintd and use it's HTTP API instead of calling pylint command.

On the other hand, I think we need also input from developers of editors/plugins whether it make sense or they would use such API.

matusvalo avatar Dec 01 '21 20:12 matusvalo

Regarding usage of pylintd, editors can spawn pylintd and use it's HTTP API instead of calling pylint command. I think we need also input from developers of editors/plugins whether it make sense or they would use such API.

Seems like the potential user are plugin editor then. A 25% perf gain is huge imo. But we'll have to handle caching properly in astroid or it won't last long though. I think we could move this MR to 3.0 alpha version and communicate with pylint-pycharm and vscode about it in order to see what they need and how it should be implemented. Do you want to lead the communication and specification effort @matusvalo ?

Pierre-Sassoulas avatar Dec 03 '21 13:12 Pierre-Sassoulas

Do you want to lead the communication and specification effort @matusvalo ?

Yes, I can take this activity on my shoulders. But unfortunately, I was playing more with the implementation and I found out that the leak is pretty serious (~1 GB allocated memory after 10 runs of linting the same file). Moreover it seems that even the performance decreases over time :-/. I suppose, we need to progress with PyCQA/astroid#792 in order to have functional pylintd.

Btw. just side not around the usability of the pylintd - I just came across that blackd has already some plugins [1]. This is kind of usage of pylintd I imagine.

[1] https://plugins.jetbrains.com/plugin/14321-blackconnect

matusvalo avatar Dec 13 '21 07:12 matusvalo

Do you want to lead the communication and specification effort @matusvalo ?

Yes, I can take this activity on my shoulders. But unfortunately, I was playing more with the implementation and I found out that the leak is pretty serious (~1 GB allocated memory after 10 runs of linting the same file). Moreover it seems that even the performance decreases over time :-/. I suppose, we need to progress with PyCQA/astroid#792 in order to have functional pylintd.

Not sure it's a good idea to start the conversation now. It's not yet clear if or when we can fix astroid. It also isn't a priority at the moment, IMO rightly so. There are other areas that need attention first.

If you never the less want to start the conversation, please make it clear that this is only experimental (at the moment) and still a long way of (maybe even years).

cdce8p avatar Dec 13 '21 15:12 cdce8p

Not sure it's a good idea to start the conversation now.

I agree. I have mentioned it in previous message. At least memory leak should be solved first.

matusvalo avatar Dec 13 '21 19:12 matusvalo

Hey @matusvalo just mentioning it might be worth it to get a new benchmark on memory usage after https://github.com/PyCQA/astroid/pull/1521. If you use MANAGER.clear_cache() between runs, is the memory usage now more reasonable?

jacobtylerwalls avatar May 10 '22 13:05 jacobtylerwalls

@jacobtylerwalls unfortunately even after the fix (testing on main), there is a leak - consider following example:

from pylint.lint import Run
from pylint.lint import pylinter

while True:
    Run(["pylint/checkers/variables.py"], exit=False)
    pylinter.MANAGER.clear_cache()

Following script after ~1 minute allocates around 1 GB of memory and keep raising:

image

Moreover, it removes the performance benefit of avoiding rescanning cached files again and again - hence it yields no or zero perf. improvement.

matusvalo avatar Sep 05 '22 21:09 matusvalo

What if you explicitly call garbage collection?

jacobtylerwalls avatar Sep 05 '22 22:09 jacobtylerwalls

Unfortunately running gc does not help:

from pylint.lint import Run
from pylint.lint import pylinter
import gc

while True:
    Run(["pylint/checkers/variables.py"], exit=False)
    gc.collect()
    pylinter.MANAGER.clear_cache()

image

matusvalo avatar Sep 06 '22 05:09 matusvalo

The memory leaks in pylint-dev/astroid#792 have been solved (except for some very small ones, perhaps), but because it relies on clearing the cache to do so, that kind of violates the spirit of this PR, which is to keep the cache.

We probably want something like caching the astroid trees for third-party modules to disk, or having another separate cache for them. We could do that by checking if the third-party module defines a version constant.

jacobtylerwalls avatar Jul 04 '23 12:07 jacobtylerwalls

The memory leaks in pylint-dev/astroid#792 have been solved (except for some very small ones, perhaps), but because it relies on clearing the cache to do so, that kind of violates the spirit of this PR, which is to keep the cache.

We probably want something like caching the astroid trees for third-party modules to disk, or having another separate cache for them. We could do that by checking if the third-party module defines a version constant.

Do we know how mypy does this? Because I feel they also know when you have modified third party code yourself, which I think would be a requirement for a good catch miss/hit.

DanielNoord avatar Jul 04 '23 13:07 DanielNoord