flox
Fast & furious GroupBy operations for dask.array
Bumps [mamba-org/provision-with-micromamba](https://github.com/mamba-org/provision-with-micromamba) from 12 to 13. Release notes (sourced from mamba-org/provision-with-micromamba's releases): v13: fix channels and channel-priority settings; support linux-aarch64 and osx-arm64 runners; support sel(unix). Commits: a319a81 Readme updates (#88)...
### Summary We should be able to improve `method="cohorts"` by first applying the groupby reduction blockwise and then "shuffling". This should substantially reduce the amount of data being moved around....
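A minimal NumPy-only sketch of the idea, not flox's implementation: each block is first reduced to one partial result per group, and only those small partials are then moved and combined. The function name `blockwise_then_combine_sum`, the integer `codes` input, and the fixed `blocksize` are all assumptions made for illustration; the "shuffle" is stood in for by the accumulation step.

```python
import numpy as np


def blockwise_then_combine_sum(values, codes, ngroups, blocksize):
    """Grouped sum computed block by block (hypothetical helper).

    values    : 1-D float array of data
    codes     : 1-D integer array, group index in [0, ngroups) per element
    ngroups   : number of groups
    blocksize : number of elements per block (stands in for a dask chunk)
    """
    values = np.asarray(values, dtype=float)
    codes = np.asarray(codes)
    total = np.zeros(ngroups)

    for start in range(0, len(values), blocksize):
        stop = start + blocksize
        # Step 1: reduce within the block. The intermediate has length
        # `ngroups`, regardless of how many elements the block holds.
        partial = np.bincount(
            codes[start:stop], weights=values[start:stop], minlength=ngroups
        )
        # Step 2: only this small intermediate needs to be moved and
        # combined with the partials from the other blocks.
        total += partial

    return total


values = np.arange(6.0)
codes = np.array([0, 1, 0, 1, 2, 2])
result = blockwise_then_combine_sum(values, codes, ngroups=3, blocksize=2)
```

The point of the sketch is the size of what gets moved: `ngroups` values per block rather than the block's full data.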
Closes https://github.com/pydata/xarray/issues/6902 cc @Illviljan @tasansal
Closes #107
- xref #128
- [ ] work on `_choose_engine`
- [ ] needs https://github.com/ml31415/numpy-groupies/pull/63
1. We should test with numpy-groupies. CuPy provides [bincount](https://docs.cupy.dev/en/stable/reference/generated/cupy.bincount.html).
2. https://github.com/cupy/cupy/issues/7561
3. We'd have to avoid factorizing with Pandas unfortunately and use `np.searchsorted` or `np.digitize`; or use CuDF?
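As a rough illustration of the pandas-free route, the sketch below factorizes labels with `np.searchsorted` against a sorted array of expected groups and then sums with `np.bincount`. It runs on NumPy only; whether the same code works unchanged on CuPy arrays is an assumption that would need testing. The helper names `factorize_sorted` and `groupby_sum` are hypothetical and not part of flox.

```python
import numpy as np


def factorize_sorted(labels, expected_groups):
    """Map each label to its index in the sorted `expected_groups`.

    Labels that are not present in `expected_groups` get code -1.
    """
    labels = np.asarray(labels)
    expected_groups = np.asarray(expected_groups)
    # Position at which each label would be inserted to keep the order.
    idx = np.searchsorted(expected_groups, labels)
    # Clip so that labels larger than every group can be indexed safely.
    idx = np.clip(idx, 0, len(expected_groups) - 1)
    found = expected_groups[idx] == labels
    return np.where(found, idx, -1)


def groupby_sum(values, labels, expected_groups):
    """Grouped sum using integer codes and bincount."""
    values = np.asarray(values, dtype=float)
    codes = factorize_sorted(labels, expected_groups)
    valid = codes >= 0  # drop labels outside the expected groups
    return np.bincount(
        codes[valid], weights=values[valid], minlength=len(expected_groups)
    )


labels = np.array([10, 20, 10, 30, 20, 99])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
codes = factorize_sorted(labels, [10, 20, 30])
sums = groupby_sum(values, labels, [10, 20, 30])
```

This relies on `expected_groups` being sorted and known in advance, which is the trade-off compared with factorizing through pandas.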
Supporting just numpy should be relatively easy. This will also work for `method="blockwise"` by default. We may want to rename `groupby_reduce` to `groupby_agg`? For dask proper, we'll need to use...
There have been some upstream fixes: https://github.com/ml31415/numpy-groupies/issues/39#issuecomment-1183091400