Deepak Cherian
> is there anywhere else in xarray where we have made some choice about how to let the user choose between specifying via indexes or labels? `coarsen` vs `groupby`/`groupby_bins`/`resample`. I...
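A minimal sketch of the index-versus-label distinction raised above, assuming a small made-up time series with a gap in it (the data and variable names are illustrative only, not from the discussion). `coarsen` windows by position, so every window holds a fixed number of elements, while `resample` windows by coordinate label, so the gap shows up as an empty bin.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical data: six samples with a gap between 4 Jan and 7 Jan.
time = pd.to_datetime(
    [
        "2000-01-01",
        "2000-01-02",
        "2000-01-03",
        "2000-01-04",
        "2000-01-07",
        "2000-01-08",
    ]
)
da = xr.DataArray(np.arange(6.0), dims="time", coords={"time": time})

# Position-based: each window is exactly 2 consecutive elements,
# regardless of what the time labels say.
by_position = da.coarsen(time=2).mean()

# Label-based: each window is a 2-day bin on the time coordinate,
# so the gap produces an extra (empty) bin.
by_label = da.resample(time="2D").mean()

print(by_position.sizes["time"], by_label.sizes["time"])
```

The two results have different lengths along `time`, which is the kind of choice a user would be making when picking between an integer window and a frequency-like specification.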
Now I think the way to generalize is to eventually support `Resampler` objects. I think overloading the existing `.chunk` is nicer than a new `chunk_by` method, but could be convinced...
Responding to @shoyer's [comment](https://github.com/pydata/xarray/pull/9109#issuecomment-2166056509): > Are frequency strings unambiguous? Rechunking already supports memory sizes for Dask using strings. The table [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects) doesn't seem to overlap with `MB`, `KB` etc. but...
I believe the key difference here (at least theoretically) is the index creation? That MultiIndex can be quite large, and slow to build IME. So for API perhaps we can...
> used xarray extensively during my academic years, but this is my first attempt at contributing to the repo, so thank you for your patience as I delve deeper! Very...
> My thinking was that if the user tries to reduce the whole array, enforcing a single chunk along the reduction axis will try to load the whole array into...
> feel free to suggest something concrete indexing please. that'll exercise the codec pipeline too. a peakmem metric would be good to track also, if possible.
+1, those numbers are hard to argue with. This will probably surface some long-tail edge-case bug/change in behaviour, so we should add a configurable option.
I wonder if a "basic array index", as prototyped by @benbovy [here](https://notebooksharing.space/view/48ad86aed90f7588c9a475be6747528d87f975cb3317e5bd94265ffaa5a2478f#displayOptions=), would be a good default for this case. Basically we'd use `PandasIndex` only if the provided data are...
Yeah that's a good idea. We should check whether dask & numpy support this now.