ibis icon indicating copy to clipboard operation
ibis copied to clipboard

fix(dask,pandas): make groupby null behavior consistent with other backends

Open JonAnCla opened this issue 1 year ago • 8 comments

This is for consistency with other backends (duck, postgres), NULL should be included in output groups

Please point me to where unit tests should be added e.g. maybe somewhere in here?

Description of changes

adds dropna=False to .groupby() call for pandas, otherwise if any rows contain NULL in the groupby columns, they are dropped from the output

Issues closed

  • Resolves #9520

-->

JonAnCla avatar Jul 05 '24 15:07 JonAnCla

ACTION NEEDED

Ibis follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message.

Please update your PR title and description to match the specification.

github-actions[bot] avatar Jul 05 '24 15:07 github-actions[bot]

@hottwaj It looks like you may need to remove the xfail markers on the tests that are now XPASSing.

cpcloud avatar Jul 05 '24 16:07 cpcloud

Which is pretty nice, it means there were already a couple of tests that hit this code path :)

cpcloud avatar Jul 05 '24 16:07 cpcloud

@hottwaj Who is @jc-5s? Ideally the people that submit PRs are the same people who are making commits.

cpcloud avatar Jul 06 '24 13:07 cpcloud

@hottwaj Who is @jc-5s? Ideally the people that submit PRs are the same people who are making commits.

Ah they are both my accounts, one personal and one work related. Didn't realise I had switched between the two sorry :)

JonAnCla avatar Jul 06 '24 14:07 JonAnCla

Thanks for this, great that there are tests in place already :)

It seems like this test doesn't specifically check on a groupby where NULL is one of several values on which to group - should I add a test that better covers that case? https://github.com/ibis-project/ibis/pull/9526/files#diff-db87b69de7d36308b9695fb95ec797a308009badb8d76f8be92a369c1796edf7L1578

Or maybe I've misunderstood as I can't work out the source of arg "df" in the test

JonAnCla avatar Jul 08 '24 12:07 JonAnCla

Hi there, sorry for bugging: is there anything I can do to help progress this, or is this ready to go?

I still think a more targeted unit test might be needed, but maybe I've misunderstood what the existing tests cover...

Thanks!

JonAnCla avatar Jul 11 '24 09:07 JonAnCla

Hey @hottwaj!

Since this is a breaking change to output behavior, we'll merge it into 10.0. In the meantime, if you want to add a more targeted unit test that'd be great!

cpcloud avatar Jul 12 '24 17:07 cpcloud

This PR is no longer viable since we removed the dask and pandas backends!

cpcloud avatar Sep 14 '24 11:09 cpcloud