pylint icon indicating copy to clipboard operation
pylint copied to clipboard

[R0801] Similarities false positive on abstract classes

Open michaeldimchuk opened this issue 2 years ago • 4 comments

Bug description

Having two classes with 5 methods that have a pass implementation (assuming the default minimum 4 lines of similarity) results in a false positive, where methods with different names end up triggering the rule. This would be a false positive since the minimum number of matching lines seem to be getting applied to all pass statements in the entire class, not on a per function basis.

In my specific use case I have two abstract classes which both have 5+ abstract methods to be overridden by the implementations. As a result, the similarities rule is getting triggered because there are 5 consecutive method definitions with implementations consisting of pass.

Interestingly enough, replacing all usages of pass with ... fixes the error.

I played around with the code a little bit and found that in this if condition: https://github.com/PyCQA/pylint/blob/main/pylint/checkers/similar.py#L533

In my specific scenario, all lines of code between fst_file_start and fst_file_end or snd_file_start and snd_file_end all consisted of the pass keyword and nothing else, with eff_cmn_nb being 5, which triggered the rule.

The scenario is created with two files:

File one - test.py

class Processor:

    def aaa(self) -> None:
        pass

    def baa(self) -> None:
        pass

    def caa(self) -> None:
        pass

    def daa(self) -> None:
        pass

    def gaa(self) -> None:
        pass

File two - test2.py

class AnotherProcessor:

    def haa(self) -> None:
        pass

    def zaa(self) -> None:
        pass

    def xaa(self) -> None:
        pass

    def vaa(self) -> None:
        pass

    def waa(self) -> None:
        pass

Configuration

[tool.pylint."MESSAGES CONTROL"]
max-line-length = 120
disable = [
    "C0114", # Missing module docstring
    "C0115", # Missing class docstring
    "C0116", # Missing function docstring
]
load-plugins = [
    "pylint.extensions.no_self_use"
]

[tool.pylint.SIMILARITIES]
ignore-imports = "yes"

[tool.poetry]
name = "pylint-report"
version = "1.0.0"
description = ""
authors = ["A B <[email protected]>"]

[tool.poetry.dependencies]
python = "~3.9"

[tool.poetry.dev-dependencies]
pylint = "2.14.5"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

Command used

pylint test.py test2.py

Pylint output

************* Module test2
test2.py:1:0: R0801: Similar lines in 2 files
==test2:[3:16]
==test:[3:16]
        pass

    def baa(self) -> None:
        pass

    def caa(self) -> None:
        pass

    def daa(self) -> None:
        pass

    def gaa(self) -> None:
        pass (duplicate-code)

------------------------------------------------------------------
Your code has been rated at 9.55/10 (previous run: 9.55/10, +0.00)

Expected behavior

The similarities (R0801) rule shouldn't be triggered when 2 abstract classes in different files each with (min-similarity-lines + 1) abstract methods with a "pass" implementation exist. The rule should only be triggered by the body of each function being a duplicate, not the aggregate of 5 functions with pass implementations matching against another abstract class.

Pylint version

pylint 2.14.5
astroid 2.11.7
Python 3.9.10 (main, Jan 15 2022, 11:48:04)
[Clang 13.0.0 (clang-1300.0.29.3)]

OS / Environment

macOS Monterey Version 12.4

Additional dependencies

astroid==2.11.7 dill==0.3.5.1 isort==5.10.1 lazy-object-proxy==1.7.1 mccabe==0.7.0 platformdirs==2.5.2 pylint==2.14.5 tomli==2.0.1 tomlkit==0.11.1 typing-extensions==4.3.0 wrapt==1.14.1

michaeldimchuk avatar Jul 20 '22 21:07 michaeldimchuk

I don't think abstract classes should be exempted from the duplicate code check. If function signatures are ignored I think the check is working as intended. (I'm going to wait for another opinion before closing).

Pierre-Sassoulas avatar Jul 21 '22 13:07 Pierre-Sassoulas

I agree that abstract classes shouldn't be exempted from the duplicate code check, but that's not really my ask here.

The issue here is that while ignore-signatures does in fact ignore function signatures, it doesn't take the structure of the code into account, which doesn't really make sense to me given the name of the configuration option.

In the example I provided above, the similarities check is ran on the result of a class having all its method signatures stripped, which is equivalent to just the class name and 5 pass statements. But that's not really duplicate code, especially since each individual function's contents are not evaluated against the min-similarity-lines. This check is now running against this code:

class Processor:
    pass
    pass
    pass
    pass
    pass

Which structurally is completely different from the original, and will fail whenever any other class has 5 functions with a pass body, even though that's not duplicate code.

At least my interpretation of what ignore-signatures is supposed to do, is that it won't take the function signatures into account of the similarities check, but will instead only look at the contents of the functions or other code. It shouldn't completely remove signatures and assume that the code being looked at is one block of code. Maybe this interpretation is incorrect, but then that's not really documented as explicitly as it could be: https://pylint.pycqa.org/en/latest/user_guide/configuration/all-options.html#ignore-signatures

michaeldimchuk avatar Jul 21 '22 15:07 michaeldimchuk

I definitely missed that 😄

Pierre-Sassoulas avatar Jul 21 '22 15:07 Pierre-Sassoulas

Interestingly, pylint test.py test2.py -j 2 passes. Starting with pylint 2.14.0, I have code that suddenly triggers R0801 pylint errors when running pylint with only a single worker. It used to work fine before.

Edit: Using pylint 2.13.0 also works with the provided example.

Mr-Pepe avatar Jul 28 '22 13:07 Mr-Pepe

Has there been any progress on this? We were using ignore-signatures because otherwise multilined method signatures hit the duplicate limit when you implement an abstract method. But now we can't because using ignore-signatures means two unrelated interfaces are being flagged as duplicates as in this issue. We'd like to avoid having to increase the min-similarity-lines across the whole codebase to more than the longest interface/longest method signature in an interface as it does usefully flag repeated code, so any updates would be very appreciated.

blaise17 avatar Apr 21 '23 10:04 blaise17

No one worked on it yet, but I'll happily review a merge request fixing this issue :) Do you want to contribute a fix @blaise17 ?

Pierre-Sassoulas avatar Apr 21 '23 13:04 Pierre-Sassoulas