PD rules trigger on non-Pandas DataFrames

Open beskep opened this issue 2 years ago • 4 comments

command: ruff check test.py
ruff version: ruff 0.0.282
settings: select = ['ALL']

example:

import polars as pl

pldf = pl.DataFrame()
pldf.pivot()  # PD010 `.pivot_table` is preferred to `.pivot` or `.unstack`; provides same functionality

The Polars DataFrame provides a .pivot() method but, unlike pandas, no .pivot_table().

beskep, Aug 09 '23 00:08

Difficult for us to fully resolve this without a full type inference engine (we could use heuristics, like avoid flagging these rules if polars is imported, but that comes with other problems: you don't have to import Polars in order to access a Polars DataFrame, and just because you import Polars doesn't mean you aren't working with Pandas DataFrames anywhere). Likely won't be fixed in the near-term.

(I'd recommend against using these rules if you're working with Polars.)
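To make the two failure modes above concrete, here is a minimal sketch of the "skip the rules if polars is imported" heuristic, written with Python's standard `ast` module. The function `imports_polars` and both sample sources are hypothetical illustrations, not Ruff's actual implementation.

```python
import ast


def imports_polars(source: str) -> bool:
    """Hypothetical heuristic: does this module import polars anywhere?"""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] == "polars" for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as `from . import x`.
            if node.level == 0 and node.module and node.module.split(".")[0] == "polars":
                return True
    return False


# Failure mode 1: the caller may pass a Polars DataFrame, yet polars is never
# imported here, so the heuristic would leave the pandas rules switched on.
NO_IMPORT = """\
def reshape(df):
    return df.pivot()
"""

# Failure mode 2: polars is imported, but this call operates on a pandas
# DataFrame, so the heuristic would silence a warning that is legitimate.
BOTH_IMPORTED = """\
import pandas as pd
import polars as pl

def reshape(df: pd.DataFrame):
    return df.pivot()
"""

print(imports_polars(NO_IMPORT))      # False
print(imports_polars(BOTH_IMPORTED))  # True
```

In both cases the answer from the import check says nothing reliable about the type of `df`, which is why type inference would be needed for a full fix.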

charliermarsh, Aug 09 '23 02:08

For a simpler heuristic, would it be possible to check the alias used to instantiate the DataFrame? Seeing pl.DataFrame rather than pd.DataFrame is a pretty strong clue that it's not pandas.
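A rough sketch of what such an alias check could look like, again using Python's `ast` module rather than anything from Ruff itself. The function name `pandas_pivot_calls` is made up for illustration, and the analysis is deliberately naive: it only follows direct assignments of the form `name = <alias>.DataFrame(...)` and ignores control flow, reassignment order, and function arguments.

```python
import ast


def pandas_pivot_calls(source: str) -> list[int]:
    """Return line numbers of `.pivot` accesses on names built via pandas.

    Hypothetical heuristic: values of unknown origin are never flagged.
    """
    tree = ast.parse(source)

    # Map each local module alias to its top-level package, e.g. "pd" -> "pandas".
    modules: dict[str, str] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                root = alias.name.split(".")[0]
                modules[alias.asname or root] = root

    # Record which package constructed each variable: `x = pd.DataFrame(...)`.
    origins: dict[str, str] = {}
    for node in ast.walk(tree):
        if not (isinstance(node, ast.Assign) and isinstance(node.value, ast.Call)):
            continue
        func = node.value.func
        if (
            isinstance(func, ast.Attribute)
            and func.attr == "DataFrame"
            and isinstance(func.value, ast.Name)
            and func.value.id in modules
        ):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    origins[target.id] = modules[func.value.id]

    # Flag `.pivot` only when the receiver is known to come from pandas.
    flagged = [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Attribute)
        and node.attr == "pivot"
        and isinstance(node.value, ast.Name)
        and origins.get(node.value.id) == "pandas"
    ]
    return sorted(flagged)


EXAMPLE = """\
import pandas as pd
import polars as pl

pddf = pd.DataFrame()
pldf = pl.DataFrame()
pddf.pivot()
pldf.pivot()
"""

print(pandas_pivot_calls(EXAMPLE))  # [6]
```

The trade-off in this sketch is that anything whose origin is not visible (a function parameter, a value read from another module) goes unflagged, so it swaps false positives for false negatives.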

MarcoGorelli, Aug 23 '23 13:08

Currently the pandas rules are applied to many non-pandas objects. For example, PD011 tries to stop you from using .values anywhere, even with a library where you are supposed to use it. Some kind of check for whether the object even belongs to pandas would therefore be pretty useful.

kleinicke, Apr 17 '24 12:04

The same thing happens with the Python DEAP package, which has class members named values.

bje-, Jul 01 '24 02:07

Ruff is actually really trigger-happy here. Here is another quick example that triggers the rule while just messing around with Python builtins:

# ruff: noqa: F841
# pyright: reportUnusedVariable=false

x = {}
values_dict_func = x.values  # PD011

ItsDrike, Jul 21 '24 13:07

> Difficult for us to fully resolve this without a full type inference engine (we could use heuristics, like avoid flagging these rules if polars is imported, but that comes with other problems: you don't have to import Polars in order to access a Polars DataFrame, and just because you import Polars doesn't mean you aren't working with Pandas DataFrames anywhere). Likely won't be fixed in the near-term.

I think the false-positive rate on this warning is so high that the rule should be abandoned. Could pandas be modified to emit a deprecation warning instead?

bje-, Jul 22 '24 17:07

Why not just turn it off in your project? By definition you've opted into it.

charliermarsh, Jul 22 '24 17:07

A good lint tool shouldn't require littering your source files with pragmas to disable false positives. Isn't one of the purposes of a linter to improve code readability?

(I just used a noqa pragma to disable NPY002. In that case Ruff is correct, but I can't change it.)

bje-, Jul 22 '24 18:07