pandas-vet
pandas-vet copied to clipboard
Check for .groupby aggregation patterns
Not even sure if this can be implemented, but maybe with a clever regular expression, it could. See the flashcard for some more details.
Both of the anti-patterns have df.groupby('group_col')['agg_col'] ...
, which we should be able to identify with the following check:
if isinstance(node.func.value, ast.Subscript) and
node.func.value.value.func.attr == 'group_by'
(AST chains can get crazy)
Do you see any cases where this pattern wouldn't work (i.e., other groupby
anti-patterns that don't use slicing? or examples where slicing the groupby
object would be acceptable?)
Turns out the AST syntax is actually more complex when the .agg()
method is chained. But the fundamental pattern should still be the same -- we want to check for groupby()
methods that also use slicing.
Unless there are cases where that syntax might be appropriate.
Unless there are cases where that syntax might be appropriate.
This is my fear. I am having second thoughts about this because groupby
is EXTREMELY common in pandas
code and I am afraid there are probably lots of cases I am not considering.
I am going to suggest that we should hold off on implementing this until we have a better survey of groupby
in the wild. I will open an issue where we can collect examples.
Okay. I have a preliminary commit that I will probably push and create a pull request, if only as a reference for discussion of the implementation.
Although if following strictly the recommendation in the Minimally
Sufficient Pandas article, we can probably capture the specific pattern
where a slice is applied to the groupby method. And there if someone wants
to use that syntax anyway, they always have the # noqa
flag that they can
use to ignore. (I don't really know too much about this Flake8 mechanism,
but it appears that you can either specify for individual lines, or turn
off the check via a command line option.)
On Fri, Mar 8, 2019 at 11:48 AM Jacob Deppen [email protected] wrote:
Unless there are cases where that syntax might be appropriate.
This is my fear. I am having second thoughts about this because groupby is EXTREMELY common in pandas code and I am afraid there are probably lots of cases I am not considering.
I am going to suggest that we should hold off on implementing this until we have a better survey of groupby in the wild. I will open an issue where we can collect examples.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/deppen8/pandas-vet/issues/24#issuecomment-471053648, or mute the thread https://github.com/notifications/unsubscribe-auth/AB2sGt9GNcmwXTqqttZfItsz-k_A6SIcks5vUr50gaJpZM4bSzl2 .