fix(dask,pandas): make groupby null behavior consistent with other backends
This is for consistency with other backends (duck, postgres), NULL should be included in output groups
Please point me to where unit tests should be added e.g. maybe somewhere in here?
Description of changes
adds dropna=False to .groupby() call for pandas, otherwise if any rows contain NULL in the groupby columns, they are dropped from the output
Issues closed
- Resolves #9520
-->
ACTION NEEDED
Ibis follows the Conventional Commits specification for release automation.
The PR title and description are used as the merge commit message.
Please update your PR title and description to match the specification.
@hottwaj It looks like you may need to remove the xfail markers on the tests that are now XPASSing.
Which is pretty nice, it means there were already a couple of tests that hit this code path :)
@hottwaj Who is @jc-5s? Ideally the people that submit PRs are the same people who are making commits.
@hottwaj Who is @jc-5s? Ideally the people that submit PRs are the same people who are making commits.
Ah they are both my accounts, one personal and one work related. Didn't realise I had switched between the two sorry :)
Thanks for this, great that there are tests in place already :)
It seems like this test doesn't specifically check on a groupby where NULL is one of several values on which to group - should I add a test that better covers that case? https://github.com/ibis-project/ibis/pull/9526/files#diff-db87b69de7d36308b9695fb95ec797a308009badb8d76f8be92a369c1796edf7L1578
Or maybe I've misunderstood as I can't work out the source of arg "df" in the test
Hi there, sorry for bugging: is there anything I can do to help progress this, or is this ready to go?
I still think a more targeted unit test might be needed, but maybe I've misunderstood what the existing tests cover...
Thanks!
Hey @hottwaj!
Since this is a breaking change to output behavior, we'll merge it into 10.0. In the meantime, if you want to add a more targeted unit test that'd be great!
This PR is no longer viable since we removed the dask and pandas backends!