xarray
Could we defer to flox for `GroupBy.first`?
Is your feature request related to a problem?
I was wondering why a `groupby("foo").first()` call was running so slowly. I think we run a Python loop for this rather than calling into flox:
https://github.com/pydata/xarray/blob/b9780e7a32b701736ebcf33d9cb0b380e92c91d5/xarray/core/groupby.py#L1218-L1231
Describe the solution you'd like
Could we call into flox? Numbagg has the routines...
Describe alternatives you've considered
No response
Additional context
No response
Yes. The minor complication is that we should dispatch `nanfirst` and `nanlast` to flox, but not `first` and `last`. The latter simply index with an indexer we already know, so the reduction approach is overkill.
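To illustrate the distinction, here is a minimal NumPy-only sketch. It is not xarray's or flox's actual implementation; the helper names `group_first` and `group_nanfirst` are hypothetical, and it assumes integer group codes in `0..ngroups-1`. Without NaN skipping, the first element per group depends only on the group codes, so it reduces to a single indexing step. With NaN skipping, the positions depend on the data itself.

```python
import numpy as np


def group_first(values, codes, ngroups):
    """First element per group: the indexer depends only on `codes`."""
    # np.unique returns the index of the first occurrence of each code.
    uniq, first_idx = np.unique(codes, return_index=True)
    out = np.full(ngroups, np.nan)
    out[uniq] = values[first_idx]
    return out


def group_nanfirst(values, codes, ngroups):
    """First non-NaN element per group: the data must be inspected."""
    valid = ~np.isnan(values)
    # Drop NaNs first, then find the first occurrence of each remaining code.
    uniq, first_idx = np.unique(codes[valid], return_index=True)
    out = np.full(ngroups, np.nan)
    out[uniq] = values[valid][first_idx]
    return out


values = np.array([np.nan, 1.0, 2.0, 3.0, np.nan, 5.0])
codes = np.array([0, 0, 1, 1, 2, 2])

first = group_first(values, codes, 3)        # groups 0 and 2 start with NaN
nanfirst = group_nanfirst(values, codes, 3)  # NaNs are skipped
```

In `group_first` the indexer can be computed once from the codes and reused for every variable, which is why a full reduction is unnecessary there.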
Closing https://github.com/pydata/xarray/issues/8025 in favor of this one.
Out of curiosity how many groups does your problem have?
Sorry I missed #8025. I thought I had searched; I guess `first` hit lots of unrelated issues and I missed it.
> Out of curiosity how many groups does your problem have?
About 15K...
> About 15K...
Do you end up using dask for this, or just numbagg? Are these groups randomly distributed along the dimension, or are there patterns to how they are distributed (e.g. are they sequential)?
Just curious...
> Do you end up using dask for this, or just numbagg?
I ended up just leaving it running for hours!
> Are these groups randomly distributed along the dimension, or are there patterns to how they are distributed (e.g. are they sequential)?

Yes, they're largely sequential!