vulture icon indicating copy to clipboard operation
vulture copied to clipboard

False positive when enum is used in iteration

Open eltoder opened this issue 2 years ago • 7 comments

Enum classes allow iteration over the class, which returns all enum members. This is often convenient particularly when writing tests. Unfortunately, vulture does not understand this and reports enum members as unused. For example:

from enum import Enum

class E(Enum):
    A = 1
    B = 2

list(E)

produces these errors:

$ python -m vulture test_enum.py
test_enum.py:4: unused variable 'A' (60% confidence)
test_enum.py:5: unused variable 'B' (60% confidence)

This can be whitelisted, but it's a bit tedious when there are many enum members.

Some ideas on how this can be improved:

  1. It may be possible to detect some cases of iteration like list(E), tuple(E), a, b, c = E, for e in E and mark all members as used.
  2. For a work-around, adding a class-level comment or annotation that all members are used can make whitelisting much easier. This can help in other cases as well, for example this issue about dataclasses: #264.
  3. Maybe if the enum class is listed in __all__, all members can be considered used?

eltoder avatar Feb 23 '23 04:02 eltoder

Thanks for the report! All proposed solutions have in common that they need to find out which variables are defined in the scope of the class. Since Vulture currently has no notion of scope, they are tricky to implement. If you see a solution, I'm all ears :-)

jendrikseipp avatar Feb 23 '23 09:02 jendrikseipp

@jendrikseipp That sounds like an even bigger problem :-) Adding scope information should improve the accuracy quite a bit. Python scoping rules are not very complicated.

eltoder avatar Feb 23 '23 15:02 eltoder

There are some libraries that enrich the abstract syntax tree (on which Vulture operates) with scoping information, but I haven't looked deeper into them. I think that could be a good starting point.

jendrikseipp avatar Feb 23 '23 15:02 jendrikseipp

Can I work on this?

anudaweerasinghe avatar Apr 20 '23 23:04 anudaweerasinghe

Sure, happy to see progress here! It might be a good idea to discuss the high level approach though, before you spend time on implementing the details. If you have a high-level plan, I'm happy to discuss it.

jendrikseipp avatar Apr 21 '23 09:04 jendrikseipp

@jendrikseipp looked into this a little more, and wanted to run the current implementation idea by you...

Progress thus far:

  • Identified that Vulture generates an AST of the provided source code in the scan function in core.py
  • Each node in the AST is then visited by the visit function
  • The visit function decides what helper to use based on the type of node
  • For this issue, we specifically care about the visit_Call and visit_ClassDef functions because these are invoked when the list() function is called, and when a class is defined with the Enum subclass.

Based on these findings, here is a high level approach of the solution:

  • Extend the ast with scoping information using the asttokens package.
  • In visit_Call, check if the node.func.id=='list' - are we calling the list function?
  • If so, get the argument(node.args[0].id) and use the scoping information provided by asttokens to check if the argument is a class
  • If so, use the scope info about the class to check if Enum is a superclass.
  • If so, then add all the fields defined in the class to defined_variables

anudaweerasinghe avatar Apr 27 '23 01:04 anudaweerasinghe

Yes, your explanation makes sense and the approach is correct, I'd say. (In the last step, you probably mean "used_variables".)

The only thing I'm unsure about is the AST library. Have you compared asttokens with https://github.com/kavigupta/ast_scope and https://github.com/pylint-dev/astroid ? We need to pick one that does the job, is well-maintained and doesn't have too many dependencies.

jendrikseipp avatar Apr 28 '23 07:04 jendrikseipp