pylint icon indicating copy to clipboard operation
pylint copied to clipboard

HELP WANTED: Help copy content from ``pylint-errors`` over to our own documentation

Open DanielNoord opened this issue 2 years ago • 75 comments

A longstanding issue of pylint has been that there is little documentation about some of the messages and why they are emitted. For some messages the short message that is displayed is not enough to make explicit what needs to be changed or what is considered wrong. With the closure of #5527 and the merge of #5934 we have now set up a system that allows us to do so! We have also received the okay from @vald-phoenix to base this additional documentation on their wonderful pylint-errors project.

However, now we need some community help to write this documentation. If you'd like to help out and help future users of pylint understand what pylint is telling them, please continue reading.

Two examples of how the new documentation is supposed to look like can be seen [here](https://pylint.pycqa.org/en/latest/messages/error/yield-inside-async-function.html) and [here](https://pylint.pycqa.org/en/latest/messages/convention/empty-docstring.html).

How to create such pages? The process is quite straightforward. Within the doc/data/messages directory you can find various subdirectories. These are the letters of the alphabet. Within these directories you will find directories for each message that pylint can emit. These directories contain the data from which these pages are build.

To help build a new page:

  1. Create a new directory for the message you are working on. For example doc/data/messages/a/abstract-class-instantiated.
  2. Look at the page for the message in the pylint-errors repo. For example here. If the message is not documented yet you will need to come up with examples yourself.
  3. Create a good.py document in which you include a short code example that shows of good code that won't trigger the message. You can base this on pylint-error.
  4. Create a bad.py document in which you include a short bad code example. On the line that is supposed to emit the message please add a # [abstract-class-instantiated] comment. See these examples.
  5. If necessary create a pylintrc file. Here you can include configuration options for pylint like loading of optional checkers. See doc/data/messages/c/confusing-consecutive-elif/pylintrc for example.
  6. If necessary create a details.rst file. Here you can place additional details in .rst format that will be displayed on the page. Note that some of the details included on the pylint-error pages are already included in our Message or Description sections.
  7. If necessary create a related.rst file. Here you can place .rst style links to external sources that are relevant for the message. Generally we do not include the links to Test Cases and the Issue Tracker that pylint-error provides as they can break easily.
  8. If you're basing yourself on the work in the pylint-errors repo it is appreciated if you add the following line to your commits: Co-authored-by: Vladyslav Krylasov <[email protected]> (or any of the other contributors to that project you're basing yourself on!
  9. To make sure the code examples are correct, you can run pytest doc/test_messages_documentation.py.
  10. Create a PR. A changelog entry is not necessary :)

Two examples of how to do this can be found in #5934.

To make this work somewhat manageable please do not add more than 5 messages in one PR, as otherwise it will be impossible to review the changes correctly.

The big list of messages without any documentation: As of the 2023-09-09.

  • [x] i/invalid-character-carriage-return #9042
  • [x] i/invalid-character-nul #9042
  • [ ] t/typevar-name-incorrect-variance (need a more detailed explanation by a core maintainer)
  • [ ] c/c-extension-no-member (This one is going to be hard to trigger, not for beginner)

Related pull requests: #5992, #5994, #5995, #6016, #5993, #5996, #6150, #5997, #6026, #6086, #6000, #6088, #6021, #6002, #6090, #6164, #6166, #6020, #6104, #6103, #6167, #6105, #6106, #6107, #6156, #6157, #6131, #6009, #6139, #6083, #6092, #6200, #6203, #6204, #6237, #6159, #6243, #6250, #6115, #6248, #6249, #6339, #6341, #6342, #6343, #6340, #6344, #6345, #6262, #6263, #6264, #6265, #6080, #6079, #6206, #6245, #6205, #6158, #6114, #6114, #6201, #6160, #6149, #6238, #6152, #6153, #6133, #6244, #6230, #6232, #6151, #6247, #6132, #6239, #6240, #6472, #6699, #6679, #6460, #6697, #6630, #6637, #6698, #6472, #6586, #6541, #6472, #6669, #6618, #6690, #6619, #6620, #6621, #6659, #6576, #6574, #6540, #6700, #6590, #6701, #6591, #6583, #6562, #6585 #9042

https://github.com/pylint-dev/pylint/issues/7897

(List of related pull request is not exhaustive)

DanielNoord avatar Mar 23 '22 19:03 DanielNoord

I agree that having community support for this would be awesome, but I think we're not ready for that yet. In my opinion we should do a script to do the grunt work of migrating from pylint-error. Then proper reviews of the current examples when we integrate them and creation of missing examples would be super helpful.

Here's what I think we need to do following #5527, we really need community help only at a few steps:

  • [ ] Merging git history of pylnt-error (or simply copy pasting) and moving all the relevant content in doc/data/messages/original (maintainer job)
  • [ ] Create a script to migrate pylint-errors format to our structure (maintainer job)
  • [ ] Add and check messages by manageable batch until there is nothing left of the "original" pylint-errors files
  • [ ] Add functional tests for good.py/bad.py if they exists (maintainer job)
  • [ ] Add content when pylint-error did not have any

Pierre-Sassoulas avatar Mar 23 '22 19:03 Pierre-Sassoulas

Here's what I think we need to do following #5527, we really need community help only at a few steps:

  • [ ] Merging git history of pylnt-error (or simply copy pasting) and moving all the relevant content in doc/data/messages/original (maintainer job)
  • [ ] Create a script to migrate pylint-errors format to our structure (maintainer job)
  • [ ] Add and check messages by manageable batch until there is nothing left of the "original" pylint-errors files
  • [ ] Add functional tests for good.py/bad.py if they exists (maintainer job)
  • [ ] Add content when pylint-error did not have any

Sorry, perhaps I should have discussed with you prior.

However, I don't think this is feasible. During the transfer of the two test issues I already noticed a mistake in the examples for pylint-errors (a bad code example that was actually good code), so just migrating all data doesn't really work. Besides, the work necessary to migrate their data structure to ours isn't really worth the effort (imo). For example, the data that is in their Rationale sections is a combination of the message that gets emitted by pylint, the message description that is in the pylint code and some of their own comments. Often the data that originally came from pylint will be outdated so we also can't automate the separation of these types of data into details.rst, related.rst and not useful.

Basically, if we want to migrate the data we would need to put in so much effort that it is probably much better to do it by hand and just do one review for each message instead of doing multiple across many different steps. Having no code example is better than copying incorrect code examples I think, so we would also need to check the data that gets migrated in any automated step...

That's why I added the line about adding co-authorship: I do want to credit them for any inspiration we use in our documentation.

DanielNoord avatar Mar 23 '22 19:03 DanielNoord

I already noticed a mistake in the examples for pylint-errors

Yes I noticed that, this is why I proposed to review chunk by chunk. If the migration script is complicated / not worth it, it make sense to do it manually. Did you check the work that need to be done to automate this, or is this a wet finger kind of thing ? I could try a proof of concept to see if it works otherwise (once 2.13 is released of course).

Pierre-Sassoulas avatar Mar 23 '22 21:03 Pierre-Sassoulas

I already noticed a mistake in the examples for pylint-errors

Yes I noticed that, this is why I proposed to review chunk by chunk. If the migration script is complicated / not worth it, it make sense to do it manually. Did you check the work that need to be done to automate this, or is this a wet finger kind of thing ? I could try a proof of concept to see if it works otherwise (once 2.13 is released of course).

No this is wet finger. But if we're going to review by chunks anyway, why then not do this manually? I mean, anybody who is looking to contribute is free to use a script to make their PR, but I think it makes more sense to do one review for each message for a "final" page than do two or three (initial migration, update of test case, add missing content).

DanielNoord avatar Mar 23 '22 21:03 DanielNoord

I think it makes more sense to do one review for each message for a "final" page than do two or three

Sure, multiple messages in the same PR would be messy, let's do "chunk of one" 😄

anybody who is looking to contribute is free to use a script to make their PR

Maybe it could even be our script and save them a lot of time. I'm going to create a proof of concept to migrate one message from pylint-error to our new doc system (so contributor just have to review/fix and create a PR) and see how it goes. You did the algorithm yourself in the description, I'm going to follow that. The "co-authored by" part in particular look like it would need to be automated to be accurate because it's not that easy to check the git history.

Pierre-Sassoulas avatar Mar 24 '22 06:03 Pierre-Sassoulas

Maybe it could even be our script and save them a lot of time. I'm going to create a proof of concept to migrate one message from pylint-error to our new doc system (so contributor just have to review/fix and create a PR) and see how it goes. You did the algorithm yourself in the description, I'm going to follow that. The "co-authored by" part in particular look like it would need to be automated to be accurate because it's not that easy to check the git history.

Co-authors show in the GitHub interface, luckily 😄 One thing we should check and which is more difficult is that the directories have the correct names, you easily miss that on Github. Let me know if I can update the OP to add a script people can use.

DanielNoord avatar Mar 24 '22 07:03 DanielNoord

Thing like astroid-error are internal and there's nothing to do for the user. Maybe we should keep a list of such messages and treat them differently ?

Pierre-Sassoulas avatar Mar 27 '22 13:03 Pierre-Sassoulas

Thing like astroid-error are internal and there's nothing to do for the user. Maybe we should keep a list of such messages and treat them differently ?

I think we could still add something in details.rst for these messages. After this process has been done we can then make a check that each message should have at least one files in its data folder.

DanielNoord avatar Mar 28 '22 08:03 DanielNoord

Question for assign-to-new-keyword: This message only targets async and await and is only emitted for Python versions < 3.7. Should we support message specific configuration files as for the functional tests?

DudeNr33 avatar Apr 01 '22 07:04 DudeNr33

Ah, yeah I hoped to avoid that... I think we will drop that message in a couple of weeks so perhaps not worth to add documentation for it.

We only run the check on Python 3.8 though. Of the top of my head I don't know any messages that do/don't emit for 3.8 but do for others. But we should probably exclude those? Perhaps not with configuration files but with a simple list in the extension which can be updated when we switch the base python for our CI to another version?

Running the documentation check on all versions seems overkill.

DanielNoord avatar Apr 01 '22 07:04 DanielNoord

If we'd be willing to add support for configuration files, we could maybe make use of the py-version option. That way we can still only run the tests with one Python version. And we could also catch some bugs where the py-version is not taken into account (assign-to-new-keyword would be one, although I agree that this particular message is probably not worth working if it will be dropped soon).

DudeNr33 avatar Apr 01 '22 08:04 DudeNr33

If we'd be willing to add support for configuration files, we could maybe make use of the py-version option.

That way we can still only run the tests with one Python version.

And we could also catch some bugs where the py-version is not taken into account (assign-to-new-keyword would be one, although I agree that this particular message is probably not worth working if it will be dropped soon).

That's good idea. I like that! Do you want to work on this?

DanielNoord avatar Apr 01 '22 08:04 DanielNoord

Yes, I will take a look.

DudeNr33 avatar Apr 01 '22 08:04 DudeNr33

Hello, I make a script that converts error documentation from pylint-errors to bad.py, good.py, related.rst, and details.rst in the appropriate folder, But it's very bad because I am a beginner, for most case, it will work but if the documentation missing some element it will just print error on some file but the majority of the time it will work (not guaranteed 😵‍💫 )

import pathlib
import re
import os

location = ".\\errors\\classes"


pattern_identifier = re.compile(r"((\#{2}\s)(\w\d{4})\s\((.*?)\))")  # detect ## E0001 (error-name-spam)
code_block = re.compile(r"((```python\n)([\w\W]*?)(```))")  # detect bad and good code between ``` (triple backtick)
rationale = re.compile(
    r"(\#{3}\sRationale:\n\n)([\w\W]*?)(\n\#{3}\sRelated\sresources:\n)"
) # This just detect description so not necessary anymore
related = re.compile(
    r"((\[((\w)*?)\])(\(((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z]){2,6}([a-zA-Z0-9\.\&\/\?\:@\-_=#])*\)))"
) # This detect Testcase and Issue Tracker link

for path in pathlib.Path(location).iterdir():
    if path.is_file():
        current_file = open(path, "r")
        test_string = current_file.read()
        current_file.close()

        result_identifier = pattern_identifier.search(test_string)
        result_code = code_block.findall(test_string)
        result_rationale = rationale.search(test_string)
        result_related = related.findall(test_string)

        folder_name = result_identifier.group(4)

        folder_initial = folder_name[0]

        if not os.path.exists(f".\\result\\{folder_initial}"):
            os.makedirs(f".\\result\\{folder_initial}")

        if not os.path.exists(f".\\result\\{folder_initial}\\{folder_name}"):
            os.makedirs(f".\\result\\{folder_initial}\\{folder_name}")

        # Write good.py code
        write_good = open(
            f".\\result\\{folder_initial}\\{folder_name}\\good.py",
            "w",
            encoding="utf-8",
        )
        if result_code == []:
            write_good.write("Not Tested")
            print(f"Error on {folder_name} - good.py")
            write_good.close()
        else:
            write_good.write(result_code[1][2])
            write_good.close()

        # Write bad.py code
        write_bad = open(
            f".\\result\\{folder_initial}\\{folder_name}\\bad.py",
            "w",
            encoding="utf-8",
        )
        if result_code == []:
            write_bad.write("Not Tested")
            print(f"Error on {folder_name} - bad.py")
            write_bad.close()
        else:
            write_bad.write(result_code[0][2])
            write_bad.close()

        # Write details.rst
        write_detail = open(
            f".\\result\\{folder_initial}\\{folder_name}\\details.rst",
            "w",
            encoding="utf-8",
        )
        if result_rationale == []:
            write_detail.write("Not Tested")
            print(f"Error on {folder_name} - details.rst")
            write_detail.close()
        else:
            write_detail.write(result_rationale.group(2))
            write_detail.close()

        # Write related.rst
        write_related = open(
            f".\\result\\{folder_initial}\\{folder_name}\\related.rst",
            "w",
            encoding="utf-8",
        )
        if result_related == []:
            write_related.write("Not Tested")
            print(f"Error on {folder_name} - related.rst")
            write_related.close()
        else:
            edit = result_related[0][4]
            new_link = edit.replace("(", "<")
            new_link2 = new_link.replace(")", ">")
            write_related.write(
                f"`{result_related[0][2]} {new_link2}`_\n"
                f"`Issue Tracker <https://github.com/PyCQA/pylint/issues?q=is%3Aissue+%22{result_identifier.group(4)}%22+OR+%22{result_identifier.group(3)}%22>`_",
            )
            write_related.close()

gunungpw avatar Apr 01 '22 09:04 gunungpw

Hello, I have problems with pre-commit auto-fix formatting style example for bad.py. How to fix this problem? I have tried a few error messages that have similar problems with this also, especially for error messages about formatting style that auto fix by black or pre-commit

  • C0301 (line-too-long)
  • C0302 (too-many-lines)
  • C0303 (trailing-whitespace)
  • C0304 (missing-final-newline)
  • C0305 (trailing-newlines)
  • C0321 (multiple-statements)
  • C0325 (superfluous-parens)
  • C0326 (bad-whitespace)
  • C0327 (mixed-line-endings)
  • C0328 (unexpected-line-ending-format)
  • C0330 (bad-continuation)
  • W0301 (unnecessary-semicolon)
  • W0311 (bad-indentation)
  • W0312 (mixed-indentation)

efected PR #6202 #6206 Many of this example (not yet try all) will autofix. Any advice for this problem , Thank you :) Hope you all have a nice day

gunungpw avatar Apr 06 '22 03:04 gunungpw

We probably need to turn of some formatters for these messages.

@Pierre-Sassoulas Do we want to disable all formatting on these files or make specific exclusion patterns?

DanielNoord avatar Apr 06 '22 06:04 DanielNoord

I think we can start with specific exclusion and see of it's still reasonable if we document everything.

Pierre-Sassoulas avatar Apr 06 '22 06:04 Pierre-Sassoulas

@DanielNoord I think bug description misses to mention possible usage of pylintrc file in message directory.

matusvalo avatar Apr 15 '22 19:04 matusvalo

This option was introduced at a later point, I updated the initial comment.

DudeNr33 avatar Apr 16 '22 08:04 DudeNr33

To give a little status update. We now have data for 102 out of 392 messages. That's a little over one quarter! Thanks to all who have already participated in this effort!

I have also updated the table in the opening post.

DanielNoord avatar Apr 21 '22 09:04 DanielNoord

I took a look at the analytics from Read the doc. Here are the message sorted by most searched to least searched:

invalid-name 103 missing-module-docstring 43 unspecified-encoding 41 line-too-long 40 logging-fstring-interpolation 40 too-few-public-methods 35 missing-function-docstring 30 import-error 28 fixme 27 too-many-arguments 19 consider-using-f-string 17 inconsistent-return-statements 16 redefined-outer-name 16 trailing-whitespace 14 wrong-import-order 13 duplicate-code 13 too-many-locals 13 consider-using-with 13 useless-suppression 13 unused-argument 12 broad-except 12 raise-missing-from 12 missing-class-docstring 10 too-many-statements 10 global-statement 10 no-member 9

(It probably is more an indication of what messages are annoying vs what messages need to be documented imo) I think it also underline the need to document deleted message as c0326 c0330 and py3k are popular search terms and do not exists in the doc.

Result gotten by adding the following script in MessageDefinitionStore.messages and launching pylint on a file:

        values = """\
c0103 (3 results) 68 searches
c0114 (4 results) 43 searches
c0115 (2 results) 10 searches
c0116 (2 results) 30 searches
c0209 (2 results) 17 searches
c0301 (2 results) 20 searches
c0303 (3 results) 14 searches
c0326 (0 results) 14 searches
c0330 (0 results) 14 searches
c0411 (2 results) 13 searches
e0401 (2 results) 28 searches
e1101 (5 results) 9 searches
fixme (12 results) 9 searches
invalid-name (107 results) 19 searches
invalid-name (109 results) 16 searches
line-too-long (88 results) 9 searches
line-too-long (90 results) 11 searches
logging-format-style (81 results) 9 searches
logging-fstring-interpolation (17 results) 10 searches
logging-fstring-interpolation (19 results) 10 searches
py3k (5 results) 16 searches
r0201 (2 results) 13 searches
r0201 (3 results) 13 searches
r0801 (2 results) 13 searches
r0903 (3 results) 13 searches
r0903 (4 results) 10 searches
r0913 (2 results) 19 searches
r0914 (2 results) 13 searches
r0915 (2 results) 10 searches
r1710 (2 results) 16 searches
r1732 (4 results) 13 searches
snake_case (3 results) 14 searches
too-few-public-methods (71 results) 12 searches
unspecified-encoding (15 results) 12 searches
useless-suppression (26 results) 13 searches
w0511 (3 results) 18 searches
w0603 (3 results) 10 searches
w0613 (3 results) 12 searches
w0621 (2 results) 16 searches
w0703 (2 results) 12 searches
w0707 (2 results) 12 searches
w1203 (2 results) 20 searches
w1514 (2 results) 29 searches\
                """

        from pylint.exceptions import UnknownMessageError

        results = {}
        for value in values.split("\n"):
            splitted_line = [v for v in value.split(" ") if v]
            try:
                msgid, result_number, result_, searches, *_ = splitted_line
                msg_definition = self.get_message_definitions(msgid)[0]
                if msg_definition.symbol in results:
                    results[msg_definition.symbol] += int(searches)
                else:
                    results[msg_definition.symbol] = int(searches)
            except ValueError as e:
                print(f"Can't treat {value} from {splitted_line} {e}")
            except UnknownMessageError as e:
                print(f"Can't find {msgid} in message store for {splitted_line} {e}.")

        for result in sorted(results, key=results.get, reverse=True):
            print(result, results[result])

Edit upgraded result 2023-02-14:

invalid-name 460 missing-module-docstring 196 too-few-public-methods 146 missing-function-docstring 127 broad-exception-caught 126 logging-fstring-interpolation 125 unspecified-encoding 121 too-many-locals 112 consider-using-f-string 109 no-member 104 import-error 101 line-too-long 92 redefined-outer-name 88 too-many-branches 86 missing-class-docstring 80 too-many-arguments 80 inconsistent-return-statements 77 duplicate-code 76 no-else-return 74 too-many-instance-attributes 68 consider-using-with 67 redefined-builtin 66 trailing-whitespace 63 too-many-statements 62 protected-access 58 raise-missing-from 58 wrong-import-order 57 unused-argument 55 dangerous-default-value 50 no-name-in-module 48 pointless-string-statement 48 global-statement 48 fixme 45 unused-variable 44 attribute-defined-outside-init 43 missing-final-newline 42 bare-except 41 consider-iterating-dictionary 37 consider-using-dict-items 36 consider-using-enumerate 35 no-value-for-parameter 35 too-many-function-args 35 unused-import 35 ungrouped-imports 34 too-many-return-statements 33 superfluous-parens 32 anomalous-backslash-in-string 31 wildcard-import 30 cell-var-from-loop 30 logging-not-lazy 29 f-string-without-interpolation 29 super-with-arguments 27 arguments-differ 27 missing-timeout 27 too-many-lines 26 import-outside-toplevel 26 too-many-nested-blocks 26 use-dict-literal 26 unsubscriptable-object 25 consider-using-in 25 chained-comparison 25 unnecessary-pass 25 unnecessary-lambda 24 syntax-error 23 logging-format-interpolation 23 too-many-public-methods 22 consider-using-generator 22 singleton-comparison 21 simplifiable-if-expression 21

Pierre-Sassoulas avatar May 23 '22 04:05 Pierre-Sassoulas

I'm working on unused-argument.


Filed #6834

harupy avatar Jun 05 '22 05:06 harupy

Working on subprocess-run-check.

harupy avatar Jun 05 '22 09:06 harupy

Working on f-string-without-interpolation.

harupy avatar Jun 05 '22 09:06 harupy

Working on format-needs-mapping.

harupy avatar Jun 05 '22 10:06 harupy

Working on missing-format-attribute.

harupy avatar Jun 05 '22 10:06 harupy

Working on overlapping-except.

harupy avatar Jun 05 '22 10:06 harupy

Working on redefined-variable-type.

harupy avatar Jun 05 '22 10:06 harupy

Working on repeated-keyword.

harupy avatar Jun 05 '22 10:06 harupy

Working on keyword-arg-before-vararg.

harupy avatar Jun 05 '22 10:06 harupy