pylint icon indicating copy to clipboard operation
pylint copied to clipboard

Add the ``files`` option

Open DanielNoord opened this issue 1 year ago • 24 comments

  • [x] Write a good description on what the PR does.
  • [x] Create a news fragment with towncrier create <IssueNumber>.<type> which will be included in the changelog. <type> can be one of: new_check, removed_check, extension, false_positive, false_negative, bugfix, other, internal. If necessary you can write details or offer examples on how the new change is supposed to work.
  • [x] If you used multiple emails or multiple names when contributing, add your mails and preferred name in script/.contributors_aliases.json

Type of Changes

Type
:sparkles: New feature

Description

EDIT: Doesn't close ... https://github.com/PyCQA/pylint/issues/5701, see discussion down below.

The change itself turned out to be very minimal. We probably need some testing of this though as I might have disregarded edge-cases. A beta for 2.16 would be good.

DanielNoord avatar Sep 19 '22 20:09 DanielNoord

Pull Request Test Coverage Report for Build 3787353582

  • 6 of 6 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.0002%) to 95.447%

Totals Coverage Status
Change from base Build 3787323093: 0.0002%
Covered Lines: 17673
Relevant Lines: 18516

💛 - Coveralls

coveralls avatar Sep 19 '22 20:09 coveralls

🤖 According to the primer, this change has no effect on the checked open source code. 🤖🎉

This comment was generated for commit abab1ccd6063f969d8ab88c0bf58fe9ebd337439

github-actions[bot] avatar Sep 19 '22 21:09 github-actions[bot]

Hm, I don't think this adds too much complexity? Other tools work similarly. For example, mypy and pytest.

For normal users nothing changes as they can keep using the CLI like they used to. We're just exposing the internal aggregation of all left over arguments into a files option to the users to set directly.

DanielNoord avatar Sep 20 '22 06:09 DanielNoord

We saw when migrating the configuration to pyproject.toml that we have a lot more verbosity than other tool. Being configurable is a strength. But sometime there's a clear choice that we need to make to make the tool better especially when we choose what the default should be. The result of the vote on #5701 is currently more than 5 against one in favor of having a default value that permit to do pylint without adding a . (and probably also without specifying a new option in the pylintrc which is harder than adding a dot).

Pierre-Sassoulas avatar Sep 22 '22 20:09 Pierre-Sassoulas

I understand that the vote is in favour of the other option, but in this case I think as maintainers we should make a different choice than our users:

  1. This is the only way to do this in a backwards compatible way
  2. Other tools in de Python ecosystem behave similarly
  3. It offers additional benefits over making pylint > pylint .. You can easily exclude documentation folders, your setup.py and other configuration files. If we automatically lint the source directory you would need to add additional ignores to counteract that.
  4. Edit and added later: See https://github.com/PyCQA/pylint/issues/5701#issuecomment-1118213017. We also don't need to worry about excluding certain directories. That's just up to the user and their files option.

The option (imo) is not new, it was only unexposed and for the cost of having to set it once users can get all the benefits of the original proposal but sooner and without having to add additional ignores.

DanielNoord avatar Sep 22 '22 21:09 DanielNoord

This is the only way to do this in a backwards compatible way

There's breaking change and "breaking change". Changing the default value for pylint only remove the help message when calling pylint. This is not going to break millions of pipelines. It's a usability issue. We're doing it as a breaking change in 3.0 because we're doing things properly but in reality it's not that much of an issue.

Other tools in de Python ecosystem behave similarly

Some other use the current directory as a sensible default, this is an argument we had in #5701 and the issue was here to settle it. Beside "some tool does it" is not an argument in itself, what's the underlying reason to do it ? I think if a sensible default is possible then the sensible default should be used.

It offers additional benefits over making pylint > pylint .. You can easily exclude documentation folders, your setup.py and other configuration files. If we automatically lint the source directory you would need to add additional ignores to counteract that.

There's a million way to do that if you're the kind of person that have a pylintrc and that actually read the doc to find this option. (pre-commit exclude, not using the recursive option, find piped in pylint, git ls specifying each files, using ignore/ignore pattern... etc). We could even add the files option to do it as long as it's not required so that pylint lint everything recursively. But the default value need to be sensible, having a wall of text covering multiple screens when you launch the software for the first time and everyone copy pasting pylint $(git ls-files '*.py') or find . -type f -name "*.py" | xargs pylint everywhere is not a good user experience at all, and users actually agree with this 5 to 1.

Edit and added later: See https://github.com/PyCQA/pylint/issues/5701#issuecomment-1118213017. We also don't need to worry about excluding certain directories. That's just up to the user and their files option.

#6471 was fixed, it's really easy to ignore venvs now.

Pierre-Sassoulas avatar Sep 23 '22 07:09 Pierre-Sassoulas

There's breaking change and "breaking change". Changing the default value for pylint only remove the help message when calling pylint. This is not going to break millions of pipelines. It's a usability issue. We're doing it as a breaking change in 3.0 because we're doing things properly but in reality it's not that much of an issue.

It will break any documentation/guide that says you can use pylint to get all options. We have always considered documentation in our backwards incompatible changes as well.

Some other use the current directory as a sensible default, this is an argument we had in #5701 and the issue was here to settle it. Beside "some tool does it" is not an argument in itself, what's the underlying reason to do it ? I think if a sensible default is possible then the sensible default should be used.

Yeah, but that issue has given almost no arguments for doing so in response to my arguments against it. The only one who actually engaged with me showed how easy it would be to use this proposal. Another comment showed how the user is linting 'backend/*.py' which won't be solved by linting . but could be solved by files=. And the other comments are about other issues or ways around current behaviour.

Nobody has responded to my argument that pylint shouldn't try to guess which files it should and should not lint and instead expose that internal decision very easily to its users. (Imo) the size of this PR shows that we don't add any new complexity: the complexity is already there but we didn't allow the user to interact with it. If anything it might also be easier to debug https://github.com/PyCQA/pylint/issues/7003 and similar issues as we would have an actual config option to debug, instead of a list of strings that is getting passed around by various functions.

But the default value need to be sensible

And I believe it is not sensible to have pylint guess which files it should and should not lint. For example, I don't think linting setup.py or doc/conf.py is sensible, but I also don't think we should exclude them by default. The tool shouldn't try and make such decisions, that's up to the user.

Some other use the current directory as a sensible default,

Which tools do? I checked in our own pre-commit and autoflake, pyupgrade, isort, black, flake8, rstcheck, mypy and pydocstringformatter all require a files option. There is no tool in our pre-commit that doesn't. Why would we break with that pattern? They probably had similar reasons to have this be their default behaviour.

#6471 was fixed, it's really easy to ignore venvs now.

Yeah, but my argument is that doing 1) pylint + having to set your ignores or 2) setting files has the same effect in terms of effort for the user. Making us default to the current directory doesn't solve anything for that use case.

DanielNoord avatar Sep 23 '22 08:09 DanielNoord

  1. pylint + having to set your ignores or 2) setting files has the same effect in terms of effort for the user. Making us default to the current directory doesn't solve anything for that use case.

I'm not against having a files option, it would be a good addition. What I'm against not setting the default value to current directory in 3.0 and reverting a vote with a clear result done over a long period of time and 40+ vote.

it will break any documentation/guide that says you can use pylint to get all options.

Using --help or -h is pretty ubiquitous in every CLI tool under the sun, this is not such a problematic breaking change.

Yeah, but that issue has given almost no arguments for doing so in response to my arguments against it.

You can't expect everyone to argue about it, they took the time to vote though. I did not want to argue about it for hours which is why I opened #5701 in the first place. I don't think what we should go against the result when what people want is that clear. (even if I wasn't strongly in favor of defaulting to the current directory myself).

If you want I can answer in the original issue for more visibility but:

  1. It's not that large a breaking change, we're talking about the help being displayed. A massive help, nothing like the small "let's get you started" help of other tool:
Usage: black [OPTIONS] SRC ...

One of 'SRC' or 'code' is required.
  1. We can make the assumption that the user want to lint the current directory without being too wrong (as per popular demand)
  2. Settings the "files" option or . --recursive y is not done once, it's done for every invocation of pylint in a project or directory that is not already configured, every time someone want to use pylint somewhere else than the root directory on project that are already configured (until we release the MR searching for configuration file in the root directory).

And I believe it is not sensible to have pylint guess which files it should and should not lint. For example, I don't think linting setup.py or doc/conf.py is sensible, but I also don't think we should exclude them by default. The tool shouldn't try and make such decisions, that's up to the user.

I don't think we should guess. If the user do not specify anything it means they want to lint everything recursively in the current directory. This is the overwhelming result of the vote on the issue when we ask about it. This is apparent in the maybe 20+ duplicate issues that complain that pylint "misses" files because there's no __init__.py in a module or whatever (recursive search was one of the most up-voted issue for years). If users want to ignore setup.py or doc/conf.py they can ignore it there's multiple option for that.

Which tools do? I checked in our own pre-commit and autoflake, pyupgrade, isort, black, flake8, rstcheck, mypy and pydocstringformatter all require a files option. There is no tool in our pre-commit that doesn't. Why would we break with that pattern? They probably had similar reasons to have this be their default behaviour.

flake8 (the tool the closest to pylint), pytest, pyright, ... I'm not going to search for all the tools. Small tool like pwd or ls take the current directory as default as I said in the original issue.

Pierre-Sassoulas avatar Sep 23 '22 18:09 Pierre-Sassoulas

You can't expect everyone to argue about it, they took the time to vote though.

I would expect at least somebody to respond.

  1. We can make the assumption that the user want to lint the current directory without being too wrong (as per popular demand)

I understand it gives some ease of use to users, but I think we can also agree that a passerby user doesn't consider all the edge cases that could be introduced with behaviour that works fine for them. That's where we as maintainers have a role to (sometimes) deviate from popular demand.

  1. Settings the "files" option or . --recursive y is not done once, it's done for every invocation of pylint in a project or directory that is not already configured, every time someone want to use pylint somewhere else than the root directory on project that are already configured (until we release the MR searching for configuration file in the root directory).

I don't understand your point here. Are you suggesting pylint shouldn't be pylint . but pylint . --recursive=y? If so, my original point still stands: why guess for users what they want and force them to put doc/conf.py and setup.py in their ignores list if we could also give them one setting that takes care of it all?

I don't think we should guess. If the user do not specify anything it means they want to lint everything recursively in the current directory. This is the overwhelming result of the vote on the issue when we ask about it. This is apparent in the maybe 20+ duplicate issues that complain that pylint "misses" files because there's no __init__.py in a module or whatever (recursive search was one of the most up-voted issue for years). If users want to ignore setup.py or doc/conf.py they can ignore it there's multiple option for that.

I made this point in a previous comment/paragraph as well: the fact that some users express their desire for something isn't a 100% justification to do something. If it was, we would integrate pylint-django, pylint-pydantic and every other plugin into our core library? I mean if I were to ask: "Would you prefer not having to install any plugins but have them run automatically?" why would anybody vote no. The point I'm making is that in this case we should consider the potential edge cases and frustration the popular demand could cause and if there is a way to satisfy the popular demand without too much hassle. I believe this option does that.

flake8 (the tool the closest to pylint), pytest, pyright, ... I'm not going to search for all the tools. Small tool like pwd or ls take the current directory as default as I said in the original issue.

As far as I can see running flake8 directly in our source directory just goes into an indefinite hang but that could be my own environment messing something up. pytest is indeed the only tool that does this consistently. Then again, all recently developed tools don't. I think it is a good thing to follow patterns established by tools that are widely used. I'm not sure it is a good metric but both black and mypy have more stars than pylint and flake8 combined. It doesn't seem weird to follow the patterns they establish.

DanielNoord avatar Sep 23 '22 21:09 DanielNoord

I don't understand your point here. Are you suggesting pylint shouldn't be pylint . but pylint . --recursive=y?

Yes, because most of the time the current directory is not a package and we would have an error otherwise.

As far as I can see running flake8 directly in our source directory just goes into an indefinite hang but that could be my own environment messing something up.

You probably have a venv with a lot of code in it, switch to a directory with less code.

I'm not sure it is a good metric but both black and mypy have more stars than pylint and flake8 combined. It doesn't seem weird to follow the patterns they establish.

black is not even comparable, I wouldn't even think of adding pylint or flake8 without the code being first auto-formatted with isort/black.

mypy being used more means it does something better than pylint and flake8. But it does not mean they do everything better in every possible aspect. What makes mypy more successful is not "having a files options", but if I had to guess proper control flow, being maintained by Guido himself, having 100 times less false positives as a result, fixing a fundamental need in a dynamic language, and being less opinionated as its warning are about error only not about warning convention, and refactor.

passerby user doesn't consider all the edge cases that could be introduced with behaviour that works fine for them That's where we as maintainers have a role to (sometimes) deviate from popular demand.

I'm a maintainer and I don't see any worrying edge cases. I also don't like having to type . --recursive y or create a conf every time I want to casually launch pylint. I don't think we should be dogmatic about "breaking changes" to the detriment of usability.

If you have a full packaging environment with setup.py doc/conf.py and a virtualenv it means you're a power user and you're going to be able to use ignore/files or a thousand other way to specify what should be linted. You're also probably using a script to launch pylint in continuous integration or whatever and never typing anything thus you're not affected by the default value. Defaulting to the current directory help when you're a beginner trying pylint on your small scripts or someone who want to lint a particular directory inside a git repository where the pylintrc is not available. pylint adoption is hurt by the fact that it ~~is a pain in the ass to use~~ needs a full configuration before being even usable and being so configurable you don't even know where to start (which is a critic that is everywhere when talking about pylint). Having sensible default and usability improvement is a way to be used more.

Users expect to have this value as default, we already disagreed about this previously so we launched a vote and now we have a result. It's also not that big of a deal, can we move on and focus on something impactful please ?

Pierre-Sassoulas avatar Sep 24 '22 12:09 Pierre-Sassoulas

Do we still want to add this option irrespective of #5701? For power users it might make sense?

DanielNoord avatar Sep 29 '22 12:09 DanielNoord

I think that was actually the only change necessary. Rest seems like it can be merged as is?

DanielNoord avatar Sep 29 '22 13:09 DanielNoord

🤖 According to the primer, this change has no effect on the checked open source code. 🤖🎉

This comment was generated for commit 304a4c3c0cd9a29da2af7203002be9cab228a1d2

github-actions[bot] avatar Sep 29 '22 14:09 github-actions[bot]

🤖 According to the primer, this change has no effect on the checked open source code. 🤖🎉

This comment was generated for commit 75fdce013dde327f78e3bdaf88e5e5b88dbca371

github-actions[bot] avatar Dec 27 '22 12:12 github-actions[bot]

I've let this go stale, I'm very sorry. I wonder how we can make a simple coherent system alongside the globbing added in #8290 / #8312. I have the feeling this is all other the place and designing is necessary (but I haven't thought about the problem properly.) What do you think ?

Pierre-Sassoulas avatar Feb 26 '23 14:02 Pierre-Sassoulas

I feel like this is worth it either way? I don't really see how this is connected although I could change this config to use a glob instead of csv. Do you want me to do that?

DanielNoord avatar Feb 26 '23 17:02 DanielNoord

I think glob is nicer than csv. But mainly I think we have a ton of option to select/unselect and I basically agree with #8312 (comment), we should refactor / simplify this part of the documentation and make sure options are consistent together.

Hmm, I just remembered that this can't use glob as this is also being used for all current usages. Basically, this PR allows us to fix longer standing issues that all derive from the fact that we don't have a real files option.

Perhaps it is indeed better to fix this in 3.0 with a more general refactor of the way to supply file names. I'll have a look if I can make a proposal for it.

DanielNoord avatar Feb 26 '23 19:02 DanielNoord

Yeah, a lot of problem are due to the unclear way to select file for pylint. Things like https://github.com/PyCQA/pylint/issues/8319 ? pylint $(git ls-files '*.py') should not be different than pylint .--recursive=y, right ? I agree that making the minimal options that handle all reasonable use cases the same way and remove all the concurrent options that accumulated over the years sounds really helpful in the long term. Maybe we can take exactly what ruff flake8 black or mypy do ? mypy is MIT and python we could even take their actual code directly 😅

Pierre-Sassoulas avatar Feb 26 '23 20:02 Pierre-Sassoulas

@Pierre-Sassoulas This recently came up again: we don't have a files option and thus don't display any error or help message about it.

Should this go in 3.0? And if so, how do we want to approach this issue? It's quite hard to figure out a good definition of "files" that is backwards compatible with how users can do it until now.

DanielNoord avatar Aug 05 '23 16:08 DanielNoord

I think having a default to the current directory will fix the issue of the wall of help text when giving no arguments at all. Regarding the files options itself we could add it, but imo we need to look at how it's done in other tools (Ruff in particular), and do something consistent if we break or add options to discover files.

Pierre-Sassoulas avatar Aug 05 '23 17:08 Pierre-Sassoulas

I want to unblock this but I don't know what's blocking it anymore ?

Pierre-Sassoulas avatar Dec 06 '23 20:12 Pierre-Sassoulas

I'm not sure what you meant with the last comment. Is this approach correct or are you looking to do something different?

DanielNoord avatar Dec 06 '23 22:12 DanielNoord

Yes, we definitely need to have an explicit destination for the currently nebulous/positional file args (right now it's hard to even know what is parsed and where to add a default value). I think linter.config.files always containing the files to process is the main benefit here. And being able to define the files in the conf is an added feature on top of it.

Okay, but the tests passed previously. Is there anything missing from this other than a rebase?

DanielNoord avatar Dec 10 '23 21:12 DanielNoord

Okay, but the tests passed previously. Is there anything missing from this other than a rebase?

I didn't see any blocker thus my searching for what's blocking 😄 Let's rebase first.

Reading the previous review (https://github.com/pylint-dev/pylint/pull/7496#pullrequestreview-1314720625), I don't think we had a reflection concerning csv vs globbing vs regex and what we have in pylint currently to make it consistent internally and also taking inspiration from how other tools like mypy/ruff are doing it afaik. What's the rational for the current design ? To be fair I think those options (regex/glob/csv) have grown in pylint each time someone asked to be able to do something -- organically. I wouldn't be able to tell you what is pylint doctrine for discovering files or filtering/affecting files with options. Should we offer all three methods in every options ? What type of file discovery should files implement ? In CLI I think it's intuitive that it's bash regexes that we need, but not so much in a configuration file. Should we make those consistent in 4.0 ? I'm making those reviews at the end of long days and really rarely have the time to do in depth / consistency design research. (I've not even re-read the whole thread to get back into the context this time).

Pierre-Sassoulas avatar Dec 10 '23 22:12 Pierre-Sassoulas