Initial implementation of pylintd
Type of Changes
| Type | |
|---|---|
| :bug: Bug fix | |
| :sparkles: New feature | ✓ |
| :hammer: Refactoring | |
| :scroll: Docs | |
Description
This PR introduces a basic implementation of pylintd, a pylint daemon running in the background. It is a similar implementation to mypyd [1] or blackd [2], and is highly inspired by the blackd approach. pylintd is implemented as a simple HTTP server providing a simple API for passing files for linting and returning the result.
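For illustration, here is a minimal sketch of what such a daemon could look like. This is only an illustrative sketch, not the code in this PR; the X-File-Name header mirrors the curl example further below.

```python
# Hypothetical sketch of a pylintd-style daemon (NOT the code in this PR):
# a tiny HTTP server that lints the file named in the X-File-Name header
# and returns pylint's text report in the response body.
import io
from http.server import BaseHTTPRequestHandler, HTTPServer

from pylint.lint import Run
from pylint.reporters.text import TextReporter


class PylintdHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        file_name = self.headers.get("X-File-Name")
        if not file_name:
            self.send_error(400, "Missing X-File-Name header")
            return
        output = io.StringIO()
        # exit=False keeps the process alive between requests, so astroid's
        # module-level cache survives and repeated lints get faster.
        Run([file_name], reporter=TextReporter(output), exit=False)
        body = output.getvalue().encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("localhost", 8000), PylintdHandler).serve_forever()
```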
The speedup is mainly gained in the following areas:
- Avoiding Python initialization overhead. This is only gained when the HTTP client is not written in Python.
- Avoiding pylint initialization overhead.
- Reusing the cached AST trees of already parsed modules in https://github.com/PyCQA/astroid/blob/47e860f7d5c73d53de77e7c450535e709e5ef99e/astroid/manager.py#L78 (see the short illustration after this list). This is the most prominent speedup: repeated linting of a file does not trigger parsing of other modules, since they remain in the cache. Re-linting the same file is the usual use case for editors, which is the target area of pylintd.
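The cache reuse can be shown directly with astroid. This is an illustrative snippet only, not part of this PR; AstroidManager shares its cache between instances.

```python
# Illustration of the astroid manager cache that a long-running pylintd
# would keep warm between requests.
from astroid.manager import AstroidManager

manager = AstroidManager()  # shares its state, including astroid_cache
module = manager.ast_from_file("pylint/checkers/classes.py")

# The parsed tree is now stored under its module name; a later request for
# the same module is served from this dict instead of being re-parsed.
print(module.name in manager.astroid_cache)  # True
```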
Examples:
The following examples show a comparison between running pylint and pylintd against two large files in pylint. The test was executed as follows (a Python client equivalent to test.sh is sketched after the transcript):
- A fresh instance of pylintd was started.
- The pylint command was executed against the file.
- pylintd was asked to lint the file.
- pylintd was asked to lint the file again without restarting pylintd (this shows the speedup when the astroid manager cache is filled).
```console
$ cat test.sh
curl -v -H "X-File-Name:$1" -H accept:application/X-pylint-parseable localhost:8000
$ wc -l pylint/checkers/classes.py
2415 pylint/checkers/classes.py
$ time pylint pylint/checkers/classes.py
--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)
real 0m5.146s
user 0m4.857s
sys 0m0.210s
$ time bash test.sh pylint/checkers/classes.py # 1st run
--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)
real 0m3.920s
user 0m0.008s
sys 0m0.013s
$ time bash test.sh pylint/checkers/classes.py # 2nd run
--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)
real 0m1.053s
user 0m0.008s
sys 0m0.011s
$ wc -l pylint/checkers/typecheck.py
2026 pylint/checkers/typecheck.py
$ time pylint pylint/checkers/typecheck.py
------------------------------------
Your code has been rated at 10.00/10
real 0m5.243s
user 0m4.837s
sys 0m0.222s
$ time bash test.sh pylint/checkers/typecheck.py # 1st run
--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)
real 0m4.171s
user 0m0.008s
sys 0m0.013s
$ time bash test.sh pylint/checkers/typecheck.py # 2nd run
--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)
real 0m1.000s
user 0m0.009s
sys 0m0.013s
```
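For editors that would rather not shell out to curl, the same request can be made from Python. This is a hypothetical client, assuming the host, port, and headers used by test.sh above.

```python
# Hypothetical Python equivalent of test.sh: send the file name to pylintd
# via the X-File-Name header and print the returned report.
import urllib.request


def lint_via_pylintd(file_name: str, host: str = "localhost", port: int = 8000) -> str:
    request = urllib.request.Request(
        f"http://{host}:{port}",
        headers={
            "X-File-Name": file_name,
            "Accept": "application/X-pylint-parseable",
        },
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode()


print(lint_via_pylintd("pylint/checkers/classes.py"))
```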
[1] https://mypy.readthedocs.io/en/stable/mypy_daemon.html
[2] https://black.readthedocs.io/en/stable/usage_and_configuration/black_as_a_server.html
This PR can partially help close #4814.
Of course, this PR is not finished; it is more of a proof of concept.
Pull Request Test Coverage Report for Build 1505341505
- 13 of 13 (100.0%) changed or added relevant lines in 1 file are covered.
- No unchanged relevant lines lost coverage.
- Overall coverage increased (+0.005%) to 93.51%
| Totals | |
|---|---|
| Change from base Build 1505246806: | 0.005% |
| Covered Lines: | 14006 |
| Relevant Lines: | 14978 |
💛 - Coveralls
https://github.com/PyCQA/astroid/issues/792 might be a problem if we want to be able to have an efficient pylintd.
I agree, but using a separate service enables some workarounds for users, like restarting the pylintd service on a regular basis. (I know, not ideal...)
@Pierre-Sassoulas our experience with running pylint as a daemon suggests that leaks definitely will be a problem. Given that any leaking node retains the full AST, memory consumption gets uncomfortably high very quickly when linting large files.
I have a question: how can we proceed with this PR (apart from unit tests and fixing the failed CI)? I am willing to continue with it, but I would like to understand what is missing in order to finalise it.
I would like to keep it as simple as possible right now; maybe it can be added as an experimental feature. Afterwards I can create additional issues/PRs with further improvements to pylintd.
I mentioned it in the issue already: I think we should not move forward with it at this time. Maybe at some point we can pick it up again. https://github.com/PyCQA/pylint/issues/4814#issuecomment-980727214
To be fair, I don't know how that would be used or how it would affect pylint's users. I'm not using blackd myself, so I don't really know what to expect from a pylint daemon or how to use it. It needs some automated tests for sure, and maybe a discussion with potential users?
If the startup time isn't too slow, most users (read: IDEs, ...) will simply choose the easiest path, which is likely launching it directly. They have that implemented already for other tools; no need to change it.
Tbh aside from a few outliers I haven't seen a use case for it yet.
It's probably aimed toward those with a big codebase who often launch analysis on a single file? It might be useful for pre-commit too, as pre-commit launches multiple pylint processes in parallel, so if we use pylintd we skip the startup time? Does the pre-commit use case make sense to you @matusvalo?
For pre-commit it would be much better to use pylint's own multiprocessing.
I agree only partially. Calling pylintd using curl with an empty cache gives a ~24% speedup (0m5.146s vs 0m3.920s), which I consider significant; see my test above. Regarding the usage of pylintd, editors can spawn pylintd and use its HTTP API instead of calling the pylint command.
On the other hand, I think we also need input from developers of editors/plugins on whether it makes sense and whether they would use such an API.
Regarding the usage of pylintd, editors can spawn pylintd and use its HTTP API instead of calling the pylint command. I think we also need input from developers of editors/plugins on whether it makes sense and whether they would use such an API.
Seems like the potential users are editor plugins then. A 25% perf gain is huge imo, but we'll have to handle caching properly in astroid or it won't last long. I think we could move this MR to the 3.0 alpha version and communicate with pylint-pycharm and vscode about it in order to see what they need and how it should be implemented. Do you want to lead the communication and specification effort @matusvalo?
Do you want to lead the communication and specification effort @matusvalo ?
Yes, I can take this on. But unfortunately, while playing more with the implementation, I found out that the leak is pretty serious (~1 GB of allocated memory after 10 runs of linting the same file). Moreover, it seems that even the performance decreases over time :-/. I suppose we need to make progress on PyCQA/astroid#792 in order to have a functional pylintd.
Btw, just a side note about the usability of pylintd: I just came across the fact that blackd already has some plugins [1]. This is the kind of usage of pylintd I imagine.
[1] https://plugins.jetbrains.com/plugin/14321-blackconnect
Not sure it's a good idea to start the conversation now. It's not yet clear if or when we can fix astroid. It also isn't a priority at the moment, IMO rightly so. There are other areas that need attention first.
If you nevertheless want to start the conversation, please make it clear that this is only experimental (at the moment) and still a long way off (maybe even years).
Not sure it's a good idea to start the conversation now.
I agree; I mentioned it in a previous message. At least the memory leak should be solved first.
Hey @matusvalo just mentioning it might be worth it to get a new benchmark on memory usage after https://github.com/PyCQA/astroid/pull/1521. If you use MANAGER.clear_cache() between runs, is the memory usage now more reasonable?
@jacobtylerwalls unfortunately, even after the fix (testing on main), there is a leak; consider the following example:
```python
from pylint.lint import Run
from pylint.lint import pylinter

while True:
    Run(["pylint/checkers/variables.py"], exit=False)
    pylinter.MANAGER.clear_cache()
```
The script above allocates around 1 GB of memory after ~1 minute, and usage keeps rising.
Moreover, clearing the cache removes the performance benefit of not re-parsing cached files again and again, so it yields no performance improvement.
What if you explicitly call garbage collection?
Unfortunately running gc does not help:
```python
from pylint.lint import Run
from pylint.lint import pylinter
import gc

while True:
    Run(["pylint/checkers/variables.py"], exit=False)
    gc.collect()
    pylinter.MANAGER.clear_cache()
```
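For reference, a simple way to quantify the growth (not from this thread) is to print the process's peak resident set size after each run; note that ru_maxrss is reported in kilobytes on Linux.

```python
# Hypothetical instrumentation of the reproduction above: print the peak
# RSS after each run (POSIX only; kilobytes on Linux).
import resource

from pylint.lint import Run
from pylint.lint import pylinter

for i in range(10):
    Run(["pylint/checkers/variables.py"], exit=False)
    pylinter.MANAGER.clear_cache()
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"run {i}: peak RSS {peak_kb} kB")
```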

The memory leaks in pylint-dev/astroid#792 have been solved (except for some very small ones, perhaps), but because it relies on clearing the cache to do so, that kind of violates the spirit of this PR, which is to keep the cache.
We probably want something like caching the astroid trees for third-party modules to disk, or having another separate cache for them. We could do that by checking if the third-party module defines a version constant.
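One hedged sketch of the "version constant" idea: key the on-disk cache entry for a third-party module by the version it advertises, so the entry is invalidated whenever the installed version changes. The helper name and key format below are made up for illustration.

```python
# Hypothetical cache-key scheme for persisting astroid trees of third-party
# modules to disk: invalidate the entry when the module's version changes.
import importlib


def third_party_cache_key(module_name: str) -> str:
    module = importlib.import_module(module_name)
    version = getattr(module, "__version__", "unknown")
    return f"{module_name}=={version}"


print(third_party_cache_key("astroid"))  # e.g. "astroid==2.12.0"
```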
Do we know how mypy does this? Because I feel they also know when you have modified third-party code yourself, which I think would be a requirement for good cache miss/hit behaviour.