pandas-vet
PD005 erroneously flags arithmetic methods from other packages
Describe the bug
PD005 is triggered when any module has a method named something like .add()
To Reproduce
I noticed this while using the tarfile module, which has a method named .add().
Here is the snippet. Flake8 triggered PD005 on both of the tar.add() calls.
import tarfile

tar = tarfile.open(tar_filename, "w")
tar.add(other_filename)  # <-- triggers PD005
for filename in filenames:
    tar.add(filename)  # <-- triggers PD005
tar.close()
Expected behavior
I expect this to only be triggered when a method like .add() is called on a pandas object.
Additional context
I expect that this applies to the comparison operators too (PD006). Any fix for PD005 should also be made for PD006.
This is also true for PD011 with .values(), e.g. the Flask request object has an attribute values, too.
Indeed. I should create a new Issue detailing this, but the problem is that many of our checks rely on the type of the object being a pandas object. This is a fundamental issue with static linting in Python because the AST doesn't know what type a thing is.
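To make the limitation concrete, here is a minimal sketch using only the standard library ast module. The helper name call_shape is made up for illustration and is not part of pandas-vet. It blanks out variable names and compares what is left, which suggests that a tarfile call and a DataFrame call look the same to a purely syntactic check.

```python
import ast


def call_shape(source: str) -> str:
    """Dump the AST of one expression with every variable name blanked out.

    Hypothetical helper for illustration only; not part of pandas-vet.
    """
    node = ast.parse(source, mode="eval").body
    for sub in ast.walk(node):
        if isinstance(sub, ast.Name):
            sub.id = "_"  # erase the name so only the structure remains
    return ast.dump(node)


# Once names are ignored, nothing in the tree separates the two calls.
same_shape = call_shape("tar.add(filename)") == call_shape("df.add(other)")
print(same_shape)
```

The variable name is the only difference between the two trees, and a name says nothing reliable about the object's type.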
I am open to suggestions on how to get around this, but it will likely be a big job. For now, the undesirable workaround is to turn off checks that are particularly bothersome.
A quick hack (that could be optional, for example) would be to check if pandas is imported in the file. Sure, it won't help in a lot of cases, but in a lot of other cases it will.
I have considered this. If you'd like to put in a PR that adds this, I'd be happy to take a look at it.
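For anyone who wants to try the import check, a rough sketch of what it could look like with the standard ast module follows. The function name imports_pandas is an assumption for illustration, and how it would be wired into the plugin is left open. It only detects whether the file imports pandas at all, so a file that uses both pandas and tarfile would still get false positives.

```python
import ast


def imports_pandas(source: str) -> bool:
    """Return True if the source contains an absolute import of pandas.

    Hypothetical helper; the real plugin may structure this differently.
    """
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # Covers "import pandas", "import pandas as pd", "import pandas.api.types"
            if any(alias.name.split(".")[0] == "pandas" for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            # level > 0 means a relative import; node.module is None for "from . import x"
            if node.level == 0 and node.module and node.module.split(".")[0] == "pandas":
                return True
    return False
```

A check could then return early, without reporting PD005, PD006, or PD011, whenever imports_pandas() is False for the file being linted.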
Regarding the case for PD011 - .values, one check could be for the parentheses. I get a lot of false positives because I'm using both pandas and dictionaries. Looping through the values of the dictionary with .values() triggers PD011.
I'm not really familiar with the AST, so I don't know if this is an easy check or not.
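As far as the AST goes, the parentheses check looks fairly cheap: a call such as d.values() is an ast.Call whose func is the ast.Attribute, while a bare df.values is an ast.Attribute on its own. Below is a minimal sketch under that assumption. The function name bare_values_accesses is hypothetical and this is not how pandas-vet currently implements PD011.

```python
import ast
from typing import List


def bare_values_accesses(source: str) -> List[int]:
    """Return line numbers where `.values` is accessed but not called.

    Hypothetical helper for illustration; not the plugin's actual check.
    """
    tree = ast.parse(source)
    # Collect every attribute node that is the function part of a call, e.g. d.values()
    called = {id(node.func) for node in ast.walk(tree) if isinstance(node, ast.Call)}
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Attribute)
        and node.attr == "values"
        and id(node) not in called
    )


example = "arr = df.values\nfor v in d.values():\n    pass\n"
print(bare_values_accesses(example))
```

This would leave dict-style .values() calls alone. It would not help with the Flask case mentioned earlier, where values is a plain attribute.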
What about type hinting?
Mypy uses the typed-ast package, but its functionality has been included in the standard ast module since Python 3.8.
So, on Python < 3.8 you can use typed-ast, and on Python 3.8+ you can just use ast.
Using type hinting, this shouldn't be a false positive:
import pandas as pd
bl: UnixDateBuilder = UnixDateBuilder()
bl.sub(month=3).change(day=1, hour=0, minute=0, second=0)
You can get the type of something using typing.get_type_hints()
Example from towardsdatascience: https://towardsdatascience.com/python-type-hints-docstrings-7ec7f6d3416b
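A linter only sees source text, so one way to use annotations statically is to read them straight from ast.AnnAssign nodes instead of calling anything at runtime. The sketch below assumes exactly that. The names PANDAS_TYPE_NAMES and arithmetic_calls_to_flag are made up for illustration, the set of pandas type names is an illustrative subset, and scoping is ignored entirely.

```python
import ast

# Assumption: an illustrative subset of annotation names treated as pandas types.
PANDAS_TYPE_NAMES = {"DataFrame", "Series", "Index"}


def _annotation_name(annotation):
    """Return the last identifier of a simple annotation, or None if it is complex."""
    if isinstance(annotation, ast.Name):  # e.g. UnixDateBuilder
        return annotation.id
    if isinstance(annotation, ast.Attribute):  # e.g. pd.DataFrame
        return annotation.attr
    return None


def arithmetic_calls_to_flag(source, methods=("add", "sub", "mul", "div")):
    """Return lines of arithmetic method calls, skipping receivers annotated as non-pandas."""
    tree = ast.parse(source)
    non_pandas = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            name = _annotation_name(node.annotation)
            if name is not None and name not in PANDAS_TYPE_NAMES:
                non_pandas.add(node.target.id)
    flagged = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in methods):
            receiver = node.func.value
            if isinstance(receiver, ast.Name) and receiver.id in non_pandas:
                continue  # annotated as something other than a pandas type
            flagged.append(node.lineno)
    return sorted(flagged)
```

With this, bl.sub(month=3) is skipped because bl is annotated as UnixDateBuilder, while an unannotated df.sub(1) is still flagged. Unannotated non-pandas objects such as the tarfile example would still be false positives.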
Is it possible to collect any pandas DataFrame objects in an internal dict variable, then confirm the suspect object is in this dict before throwing the error code? I'm not familiar with the details of flake8, but I suspect it might require a hook from that end to make this work.
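A rough sketch of the dict idea, done purely on the AST of a single file and without any flake8 hook, might look like the following. It assumes pandas is imported under the alias pd and only recognises a few constructors. PANDAS_CONSTRUCTORS, known_pandas_names, and add_calls_on_pandas are hypothetical names, not anything pandas-vet provides.

```python
import ast

# Assumption: a small illustrative subset of calls that produce pandas objects.
PANDAS_CONSTRUCTORS = {"DataFrame", "Series", "read_csv"}


def known_pandas_names(source, pandas_alias="pd"):
    """Map variable name -> line number where it was assigned from a pandas call."""
    known = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign) or not isinstance(node.value, ast.Call):
            continue
        func = node.value.func
        if (isinstance(func, ast.Attribute)
                and isinstance(func.value, ast.Name)
                and func.value.id == pandas_alias
                and func.attr in PANDAS_CONSTRUCTORS):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    known[target.id] = node.lineno
    return known


def add_calls_on_pandas(source):
    """Return lines of .add() calls whose receiver is a known pandas name."""
    known = known_pandas_names(source)
    return sorted(
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "add"
        and isinstance(node.func.value, ast.Name)
        and node.func.value.id in known
    )
```

The trade-off is that this only flags objects it can prove are pandas, so DataFrames that arrive as function arguments or come from other libraries would be missed.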
(Sent by email in reply to David Davó's type-hinting comment of Mon, Mar 21, 2022.)