charliecloud icon indicating copy to clipboard operation
charliecloud copied to clipboard

add `pylint` to test suite

Open lucaudill opened this issue 1 year ago • 38 comments

Since I started using VSCode, I've noticed that it's easy to add unused imports to our Python code without noticing. Adding a Python linter (e.g. pylint) to our test suite could help catch those instances and other potentially bad coding practices.

One thing worth noting is that pylint really doesn't like our code, e.g.

$ pylint --indent-string="   " lib/*py | wc -l
1656

Here our code is given a rating of 5.38/10 ☹️

pylint messages have a naming convention based on the following categories (seemingly in descending order of severity): Fatal, Error, Warning, Convention, Refactor, Information. The breakdown of the messages we get is as follows:

  • Fatal: 0
  • Error: 66
  • Warning: 131
  • Convention: 1333
  • Refactor: 111
  • Information: 0

So the vast majority of complaints pylint gives us fall under the "Convention" category. Some common messages that show up for us are:

  • C0325 (style convention, probably can be ignored)
  • C0209 (f-strings, introduced in 3.6 PEP 498, see also #992)
  • C0116 (missing docstring)
  • W0201 (attribute defined outside of __init__, mostly in build.py)
  • W0511 (complains about every "FIXME" comment)

This is just a sample, there are obviously many more. I think step 1 for whoever takes this on is to go through the list of messages and figure out what we want to ignore and what we don't.

lucaudill avatar Mar 12 '24 18:03 lucaudill

Here's a text file with the full output of

$ pylint --indent-string="   " lib/*py

pylint.txt

lucaudill avatar Mar 12 '24 22:03 lucaudill

I propose we do the following.

General comments / notes

  • We should specify a version of Pylint we’re targeting. Debian Bullseye e.g. has 2.14 but upstream latest is 3.2.

  • Pylint may be able to replace 10_sanity.bats:'trailing whitespace' for Python code, but is the exclusion too messy?

  • The Pylint message docs are overall light on rationale and occasionally contain flat-out errors, e.g. C0209 consider-using-f-string claims that f-strings are faster, citing a tweet whose immediate replies present more detailed contrary analyses. (Contrast e.g. ShellCheck docs such as SC2053.)

  • We should do Python code everywhere, not just lib/, and including *.py.in.

  • We probably want a configuration, including py-version specifying 3.6 (the default is the version of Python doing the analysis).

Dispositions (what to do)

I went through the 1,641 messages in the text file above and ended up boiling it down to 75 groups with the same disposition. (Note: I would call these “warnings”, but Pylint calls them “messages”, with “warning” being a specific category of message.) These dispositions fell into the six classes below.

I’d like the initial PR for this to yield no messages. We’ll do that by fixing the easy stuff in that PR and deferring the rest.

Now

Each of these is described in an individual comment below. These should all be fixed in this issue’s PR.

  1. Ignore globally forever. I just disagree with Pylint, so let’s turn off the warning for the entire project. This configuration happens in the same PR that implements Pylint.

  2. Fix now. Real problems that should be fixed and are important/easy enough to do in the same PR that implements Pylint.

  3. Case-by-case ignore or fix now. Each of the individual messages needs to be examined and either ignored (with a Pylint exclusion comment) or fixed in the original Pylint PR.

Later

Each of these has its own issue, linked below.

  1. Enforce the opposite. I actively want the opposite of Pylint, and because Pylint is extensible, it can enforce that. I suspect that for many of these, we can use the existing inverted checker for reference.

  2. Fix later. Real problems where we should do something, but it’s hard enough that a new PR is needed.

  3. Unclear. I don’t know what to do.

I wonder if a good approach is to disable all the messages we get (I have a spreadsheet that can give a list), e.g. with --disable X001,X002,... or configuration, then one-by-one re-enable the messages below in this issue. After that the disabled list should be exactly what we decided to defer.

reidpr avatar Mar 20 '24 17:03 reidpr

Class names (C0103 invalid-name)

  • [ ] done

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/build_cache.py#L202

We just use a different naming convention. Pylint doesn’t have a pre-configured style name but it should be easy to configure via regex.

reidpr avatar Mar 20 '24 20:03 reidpr

Log function names (C0103 invalid-name)

  • [ ] done

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/charliecloud.py#L467-L469

These functions just have special names. Probably just add ignore comments to each def.

reidpr avatar Mar 20 '24 20:03 reidpr

Bad None comparison (C0121 singleton-comparison)

  • [ ] done

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/build_cache.py#L785

Should be is None.

reidpr avatar Mar 20 '24 20:03 reidpr

Bad type check? (C0123 unidiomatic-typecheck)

  • [ ] done

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/registry.py#L88-L89

I think this is OK because this class has a weird notion of equality, but needs a little more though.

reidpr avatar Mar 20 '24 20:03 reidpr

Bad range() (C0200 consider-using-enumerate)

  • [ ] done

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/build.py#L799-L801

These can be easily refactored to not use indexes.

reidpr avatar Mar 20 '24 20:03 reidpr

Non-idiomatic class argument (C0202 bad-classmethod-argument)

  • [ ] done

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/build_cache.py#L287-L288

We use class_ but it’s easy to just change to the idiomatic cls.

reidpr avatar Mar 20 '24 20:03 reidpr

Bad __slots__ (C0205 single-string-used-for-slots)

  • [ ] done

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/build_cache.py#L611

These should all be fixed.

reidpr avatar Mar 20 '24 20:03 reidpr

Unnecessary splitting (C0207 use-maxsplit-arg)

  • [ ] done

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/build.py#L411

This appears to be a performance thing and we don’t use this in places where it matters. I think what we have is more readable, so let’s global ignore.

reidpr avatar Mar 20 '24 20:03 reidpr

Modules too long (C0302 too-many-lines)

This probably right, but I don’t see this changing in the foreseeable future and I don’t think Pylint is the right place to enforce it. If the warnings are all solved, then the most likely way it would re-appear is that a small change pushes a module over the threshold, which is the wrong time to demand a refactor.

Global ignore.

reidpr avatar Mar 20 '24 20:03 reidpr

Multiple statements per line (C0321 multiple-statements)

  • [ ] done

I don’t mind this and we use it a lot:

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/build.py#L50

but:

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/image.py#L633-L635

or:

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/image.py#L839-L842

seem confusion-prone. Let’s just stop using multiple statements per line.

reidpr avatar Mar 20 '24 20:03 reidpr

Bad import position (C0413 wrong-import-position)

  • [ ] done

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/charliecloud.py#L26-L33

We have a good reason for this, but probably better to disable on a case-by-case basis.

reidpr avatar Mar 20 '24 23:03 reidpr

Inappropriate dunder call (C2801 unnecessary-dunder-call)

  • [ ] done

https://github.com/hpc/charliecloud/blob/5ceefa7040c0accb91acf8b2c99e0f7ec6a97429/lib/filesystem.py#L393

False positive; we’re implementing os.PathLike. Suppress case-by-case.

reidpr avatar Mar 20 '24 23:03 reidpr

Bad except order (E701 bad-except-order)

  • [ ] done

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/image.py#L751-L759

This is a real bug that should be fixed.

reidpr avatar Mar 22 '24 16:03 reidpr

Nonexistent instance member access in external classes (E1101 no-member)

  • [ ] done

This happens twice:

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/build_cache.py#L384-L385

Genuinely wrong and will crash the error message.

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/charliecloud.py#L737

This one seems like if it’s not present it shouldn't work at all? Investigate.

reidpr avatar Mar 22 '24 16:03 reidpr

Non-iterable iterated (E1133 not-an-iterable)

  • [ ] done

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/registry.py#L118

The specific error is a false positive, but it’s a property that need @abc.abstractmethod I think.

reidpr avatar Mar 22 '24 16:03 reidpr

Bad % (E1305 too-many-format-args)

  • [ ] done

https://github.com/hpc/charliecloud/blob/5ceefa7040c0accb91acf8b2c99e0f7ec6a97429/lib/filesystem.py#L763

Yep, that's wrong.

reidpr avatar Mar 22 '24 16:03 reidpr

Merge isinstance() (R1701 consider-merging-isintance)

  • [ ] done

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/build.py#L77-L79

Should probably fix.

reidpr avatar Mar 22 '24 16:03 reidpr

FP complaints about return on ch.FATAL (R1710 inconsistent-return-statements)

  • [ ] done

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/charliecloud.py#L648-L653

This is a FP because ch.FATAL doesn’t return (it goes into an exception procedure that kills the program). Can we teach Pylint that it doesn’t return?

reidpr avatar Mar 22 '24 17:03 reidpr

Useless return complaints (R1711 useless-return)

  • [ ] done

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/build_cache.py#L1387-L1389

The return is indeed useless but in this particular case I like it there because related classes’ commit() does return something meaningful, which also might be None. Perhaps looks at these and ignore on a case-by-case basis?

reidpr avatar Mar 22 '24 17:03 reidpr

Built-in exit() re-defined (R1722 consider-using-sys-exit, W0622 redefined-builtin)

  • [ ] done

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/charliecloud.py#L638-L639

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/charliecloud.py#L661-L664

This is the same problem. Rename our exit(), and see if we are calling sys.exit() anywhere we should be using our own exit().

reidpr avatar Mar 22 '24 17:03 reidpr

Use error-checking ch.open_() (R1732 consider-using-with)

  • [ ] done

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/build_cache.py#L1108

with isn’t a good match for Charliecloud because the key operations are wrapped so they can’t fail. That is, I don’t think we should take the advice, but the flagged code should probably be using fs.Path.open() instead and the warning should be left enabled.

reidpr avatar Mar 22 '24 20:03 reidpr

Bad keyword argument defaults (W0102 dangerous-default-value)

  • [ ] done

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/charliecloud.py#L528-L529

I agree. Fix.

reidpr avatar Mar 22 '24 21:03 reidpr

Needless pass (W0107 unnecessary pass)

  • [ ] done

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/build.py#L449-L452

IMO clearer this way. Ignore globally.

reidpr avatar Mar 22 '24 21:03 reidpr

FP on self.__class__ (W0212 protected-access)

  • [ ] done

https://github.com/hpc/charliecloud/blob/5ceefa7040c0accb91acf8b2c99e0f7ec6a97429/lib/filesystem.py#L814

Technically this a false positive — we’re accessing a private class (not instance) member — but going though self.__class__ seems weird. Why not just self?

reidpr avatar Mar 22 '24 21:03 reidpr

Superclass __init__() not called (W0231 super-init-not-called)

  • [ ] done

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/build_cache.py#L1374-L1377

This is on purpose b/c the inheritance is a bit strange. I’d like to leave it enabled and justify exceptions on a case-by-base basis.

reidpr avatar Mar 22 '24 21:03 reidpr

Superclass argument renamed (W0237 arguments-renamed)

  • [ ] done

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/charliecloud.py#L286

Personally I think value is a bad name but this is probably right, less confusing if we use the same name.

reidpr avatar Mar 22 '24 21:03 reidpr

Method that only delegates back to parent (W0246 useless-parent-delegation)

  • [ ] done

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/image.py#L928-L929

Indeed this method seems to be doing nothing. Solution is probably delete it regardless, but I really wonder why.

reidpr avatar Mar 22 '24 21:03 reidpr

Bad indentation (W0311 bad-indentation)

  • [ ] done

https://github.com/hpc/charliecloud/blob/d67937332094a7d8de8e9bd2e112d30d117c50c0/lib/build_cache.py#L827-L829

This whole method is indented too much, probably due to being moved at some point.

reidpr avatar Mar 22 '24 21:03 reidpr