pylint
pylint copied to clipboard
Better user experience when starting to use pylint on legacy codebase
Current problem
I'm trying to adopt pylint (or update to a new version, or enable more checks), but I'm getting >10000 errors in my codebase!
If pylint supported baseline functionality this would be extremely easy.
Desired solution
A baseline like Qodana, basedmypy etc
Additional context / Workaround
The usage of darker provides a good solution to this case.
I don't know what Qodana and basedmypy are (and could not quite understand from a 1-minute read).
What do you mean by "baseline functionality"? Are you interested in a pylint configuration file that would be minimal and hence report less warnings, at least to let you introduce pylint progressively in your code base?
I think that it's the best way to introduce pylint on any sizeable project. However, the checkers that you would enable in this basic, limited configuration are going to depend on your style, really. I am not sure that all contributors of Pylint use the same configuration file in their projects and could settle on such a "minimal" configuration. (It also depends on other overlapping tools that may be in use, such as isort.)
I would suggest picking up a pylint configuration file from an existing project (for example this file from pylint itself) and run pylint. If the output is overwhelming (and it's likely to be!), add checkers to the "disable=" list. Rinse and repeat until you have a reasonable amount of warnings. Fix your code, and then enable back checkers you're interested in, one by one. Fix your code, enable more checkers, fix your code again, and so on, until you have enabled all checkers and disabled only the ones you're really not interested in.
@dbaty Thanks for the response!
Basically baseline saves the output of a run to a file and that file is committed as part of the project, when the tool is run again the errors are matched against the baseline and filtered out of the report.
test.py:
1
> pylint test.py
************* Module test
test.py:1:0: C0114: Missing module docstring (missing-module-docstring)
test.py:1:0: W0104: Statement seems to have no effect (pointless-statement)
----------------------------------------------------------------------
Your code has been rated at -10.00/10 (previous run: -10.00/10, +0.00)
> pylint --write-baseline test.py
************* Module test
test.py:1:0: C0114: Missing module docstring (missing-module-docstring)
test.py:1:0: W0104: Statement seems to have no effect (pointless-statement)
----------------------------------------------------------------------
The errors from this run have been saved to the baseline.
> pylint test.py
Looks good to me, no errors!
In my opinion, baseline is the best way to adopt strictness incrementally, because it basically means that the inspections will only apply to new/updated code and not spam you with countless noise.
How sensitive should this be? We could use difflib to naively compare baseline with new output. But that would present baseline messages as new in a file that has its line numbers changed.
We could ignore line numbers in the diff, which would not raise baseline errors as long as they are in the same relative order.
Is this a good general approach or is there an existing pylint feature we could leverage? LinterStats doesn't contain enough data to be a baseline right?
is there an existing pylint feature we could leverage?
There is a way to generate expected pylint output for a python file in the functional tests (ex here: the couple abstract_method.py and abstract_method.txt). We then load it in a set to compare to what we have as pylint's output. We could create a "baseline directory" with the same architecture than the project containing the file as .txt functional files then leverage the existing functional test code to raise only the difference. (Maybe update the baseline automatically if something was fixed).
Code reference:
Since this involves storing detailed results between runs, could we also store a hash of linted files and not rerun pylint on files that have not changed? Or skip all the checkers that do not look at imports.
skip all the checkers that do not look at imports.
It's a good idea but we'd need first to separate between checkers that are dependents on other files and those who do not. For example line-too-long, or while-used is independent but duplicate-code is not and must be relaunched if a single file changed even without any imports. Most [citation required] checkers would need to be relaunched if the version of a dependency changed, (something we do not control). I think the cache invalidation would be really problematic for most checks [citation required], but if we limit this to some simple checks and add a "cacheable" flag to message it could work.
There is a tool called pylint-ignore that implements this baseline approach. It creates a database of all messages on the first run. Then, on subsequent runs these messages will be ignored.
Personally, I prefer my own tool pylint-silent, which generates # pylint: disable comments for all your pylint messages. The advantage of this, is that you will notice the disabled pylint messages in your code. This gives you an opportunity to fix these issue exactly when you are anyway changing the relevant code.
https://github.com/mbarkhau/pylint-ignore seems like a very well polished solution to this problem. I opened an issue to discuss it with its maintainer : https://github.com/mbarkhau/pylint-ignore/issues/13
Clearly this is a common and annoying problem! Everybody has their own way of coping with it I guess.
My approach is the gather a list of all the warnings that are emitted and disable all of them in the config file. Then I can go through and enable (by cutting the disable) the warnings one by one.
So what I would personally like is an option to run Pylint and get back a list of all the warnings that would be emitted, rather than a list of every instance of every warning as is normally the case.
@nickdrozd how about output a generated config file that disables all of the currently existing warnings.
@KotlinIsland That would work great for my use case at least!
Building on both suggestions, "looks like to me" that
...
disable=
docstring-first-line-empty, # C0199 - 2 instances
too-many-try-statements, # A1234 - 45 instances
....
the output would look like that?
Let's not forget that pylint-ignore exists. The creator think that long term it should be a pylint feature. It seems an enormous design work was already done about this problem. Imo the solution is simply to integrate it in pylint after fixing the major problem we find in it (or at first document it or add the tool directly in pylint without real integration).
If you care about this issue, please try pylint-ignore so we can have a better idea about the limitations. (I know I also should do that, but I'm using pylint .|grep -o "([a-z\-]*)$"|cut -d "(" -f2|sort|uniq|sed s/\)/,/ which is good enough to make me lazy about reading how to use pylint-ignore).
I tried out pylint-ignore. It keeps the disabled messages in a file pylint-ignore.md along with a bunch of extra context. I don't love having yet another config file floating around and having to download yet another dependency, but other than that the tool works well.
As a side note, the pylint-ignore repo contains some really fantastic pro-Pylint propaganda!
@udifuchs I tried out pylint-silent too, and I was impressed by its speed and accuracy. It seems like it could be very useful for something, but I don't think it's the right tool for my personal approach, at least, because all that noise gets redirected right into the code itself.
I would prefer something along the lines of what @KotlinIsland and @stdedos said: a config file with all warnings disabled. That way I can go through one by one and address each warning in turn without getting overwhelmed.
It keeps the disabled messages in a file pylint-ignore.md along with a bunch of extra context. I don't love having yet another config file floating around and having to download yet another dependency, but other than that the tool works well.
We can fix this if we integrate it in pylint by putting the extra context in the pylintrc / pyproject.toml. pylint-silent could be documented for those who prefer to add disable directly in the code.
It keeps the disabled messages in a file pylint-ignore.md along with a bunch of extra context. I don't love having yet another config file floating around and having to download yet another dependency, but other than that the tool works well.
We can fix this if we integrate it in pylint by putting the extra context in the pylintrc / pyproject.toml. pylint-silent could be documented for those who prefer to add disable directly in the code.
:pray: I wouldn't want to have the pylint-ignore.md inside pyproject.toml. It's way too massive 😅
Also, for some reason (valid, I am sure), the creator decided to have full unified diff + "all other managerial info" there.
I'm all for the "one true file" approach, and this file is too massive to be merged.
@stdedos: Also, for some reason (valid, I am sure), the creator decided to have full unified diff + "all other managerial info" there.
If you look closely, it's not actually a diff, it's just a copy of the file contents, prefixed with line numbers. This is so pylint-ignore can continue to match up old lint messages with new ones where only the line number has changed.
Whatever gets ignored/silenced, either gets marked in the code (as pylint-silence does), or goes somewhere else. If it's going to go in pyproject.toml, it'll have to be a really compact format. Other than that, it's going to have to be a separate file, such as pylint-ignore.md.
Idea for a minimal toml format
[tool.pylint.ignore]
"W0511-fixme" = [
"src/myproject/filename.py:fe2431a0"
]
Here the fe2431 is a hash of the contents of the line. This might make it possible to match up old messages with new ones, even when the line number changes in the pylint message.
I wouldn't want to have the pylint-ignore.md inside pyproject.toml. It's way too massive
Good point.
I'm starting to think that adding disable in the code like pylint-silent does is the easier approach. Making the pylint-ignore conf file small, will be a lot of work (the hash is a good idea but it's not going to be easy to implement nor cheap to maintain). Same thing with what I proposed earlier by abusing the way we have expected output for functional tests
Maybe we could propose multiple automated approach using what already exist in pylint based on what is done manually right now already:
- automated disable per project (disable added to general conf)
- automated disable per-file (disable added at the top of files)
- automated disable per-line (basically pylint-silent)
This would probably satisfy the various granularity need while not forcing us to add another system on top of what already exists.
If we want to automate even more we could include a threshold, if it's more then X line disable for a file then we disable at the file level directly. If it's more than Y file disable for the project, then we disable the message at the project level.
any update on this?
pylint-ignore's maintainer has pinned pylint<2.13 due to breaking changes
and said he won't be fixing them in https://github.com/mbarkhau/pylint-ignore/issues/13#issuecomment-1175516151
I would really like to use this solution without having to downgrade my pylint
Is there anything I can do to help (either in this repo or in pylint-ignore)?
Have a look at darker
Our existing advice from the doc is:
During adoption, especially in a legacy project where pylint was never enforced, it's best to start with the --errors-only flag, then disable convention and refactor messages with --disable=C,R and progressively re-evaluate and re-enable messages as your priorities evolve.
I think pylint-ignore is "too clever" and thus would be hard to maintain. The big conf file would also be a barrier for adoption.
The last message is an actionable design proposal (add disable for existing message but with a threshold so it's disabled at the right level: locally, file, or project), so it could be a start. Do you have an opinion @DanielNoord ?
If you want to contribute that would be greatly appreciated @nimrodV81 .
Feels like we might indeed have this as an option, but not sure how difficult it would be to implement.
I don't think including this in pylint is the right way to do it at this point, there's just too much to know about the internal. One way to do it would be to create an independent one-time use tool that would get the json report from pylint . --revursive=y --enable=all --enable-all-extentions --output-format=json as input. It would load this result, populate the pyproject.toml disable for messages that were raised in more than x files, filter the message that were disabled like this from the json, then populate the beginning of the file where a message was raised more than x time, filter the message that were disabled like this from the json, then disable per line for all the messages that were raised less than x time per file. The user can then apply black and check that pylint is still not raising manually. Or disable-next could be used strategically so line are of reasonable length. Let me know what you think @nimrodV81
thank you @Pierre-Sassoulas for your fast replies
I'm not keen on solving this by adding countless pylint: disbale comments in our repo
I'll reach out to @mbarkhau to understand the limit to upgrade beyond pylint 2.12 and try to contribute to https://github.com/mbarkhau/pylint-ignore directly
What is your threshold for 'countless' ? I think a balance can be found between messages not raised when they should because of disable that are too broad and having too much disable littering the code. I'm trying to find what would be acceptable :)
@Pierre-Sassoulas my current report has ~2000 issues after I've configures pylintrc for our needs, so very few false positives the codebase size is ~55K lines of code across ~350 files
I mean, what is an acceptable number of disable to add inline ? 5 per file, 20 per file, 50 for the whole repository ? Is adding a # pylint: disable=a,b,c,... at the top of every file with raised message in the repo ok ? Or maybe in 10 files, 50 files, 10% of the files ? Is adding the message at the end of the existing disable in the pylintrc like line-too-long, # baseline, need fixing when appropriate, ok ?
@Pierre-Sassoulas first, thank you for taking the time to discuss with me I need to think about what I would consider acceptable, my initial intuition was 0 in any case, I think inline disables is the way to go. disabling at the top of the file is cleaner but will result in false negatives in future code changes
You should check out darker, it's kinda exactly what you want, it runs a linter on only the changed parts of code, so you can adopt pylint 100% incrementally.