pandas-vet icon indicating copy to clipboard operation
pandas-vet copied to clipboard

Check for .groupby aggregation patterns

Open deppen8 opened this issue 6 years ago • 4 comments

Not even sure if this can be implemented, but maybe with a clever regular expression, it could. See the flashcard for some more details.

deppen8 avatar Feb 26 '19 18:02 deppen8

Both of the anti-patterns have df.groupby('group_col')['agg_col'] ... , which we should be able to identify with the following check:

if isinstance(node.func.value, ast.Subscript) and 
              node.func.value.value.func.attr == 'group_by'

(AST chains can get crazy)

Do you see any cases where this pattern wouldn't work (i.e., other groupby anti-patterns that don't use slicing? or examples where slicing the groupby object would be acceptable?)

simchuck avatar Mar 07 '19 20:03 simchuck

Turns out the AST syntax is actually more complex when the .agg() method is chained. But the fundamental pattern should still be the same -- we want to check for groupby() methods that also use slicing.

Unless there are cases where that syntax might be appropriate.

simchuck avatar Mar 08 '19 18:03 simchuck

Unless there are cases where that syntax might be appropriate.

This is my fear. I am having second thoughts about this because groupby is EXTREMELY common in pandas code and I am afraid there are probably lots of cases I am not considering.

I am going to suggest that we should hold off on implementing this until we have a better survey of groupby in the wild. I will open an issue where we can collect examples.

deppen8 avatar Mar 08 '19 19:03 deppen8

Okay. I have a preliminary commit that I will probably push and create a pull request, if only as a reference for discussion of the implementation.

Although if following strictly the recommendation in the Minimally Sufficient Pandas article, we can probably capture the specific pattern where a slice is applied to the groupby method. And there if someone wants to use that syntax anyway, they always have the # noqa flag that they can use to ignore. (I don't really know too much about this Flake8 mechanism, but it appears that you can either specify for individual lines, or turn off the check via a command line option.)

On Fri, Mar 8, 2019 at 11:48 AM Jacob Deppen [email protected] wrote:

Unless there are cases where that syntax might be appropriate.

This is my fear. I am having second thoughts about this because groupby is EXTREMELY common in pandas code and I am afraid there are probably lots of cases I am not considering.

I am going to suggest that we should hold off on implementing this until we have a better survey of groupby in the wild. I will open an issue where we can collect examples.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/deppen8/pandas-vet/issues/24#issuecomment-471053648, or mute the thread https://github.com/notifications/unsubscribe-auth/AB2sGt9GNcmwXTqqttZfItsz-k_A6SIcks5vUr50gaJpZM4bSzl2 .

simchuck avatar Mar 08 '19 20:03 simchuck