Deepak Cherian comments

Results: 1084 comments of Deepak Cherian

Support specifying chunk sizes using labels (e.g. frequency string)

> is there anywhere else in xarray where we have made some choice about how to let the user choose between specifying via indexes or labels? `coarsen` vs `groupby`/`groupby_bins`/`resample`. I...

Support specifying chunk sizes using labels (e.g. frequency string)

Now I think the way to generalize is to eventually support `Resampler` objects. I think overloading the existing `.chunk` is nicer than a new `chunk_by` method, but could be convinced...
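
To make the idea of label-based chunking concrete, here is a minimal sketch in plain Python. It is not xarray's implementation: `chunk_sizes_by_month` is a hypothetical helper, and the monthly grouping stands in for whatever frequency a `Resampler`-like object would describe. It only shows how a label-based rule can be turned into the integer chunk sizes that a chunking call ultimately needs.

```python
from datetime import date, timedelta
from itertools import groupby


def chunk_sizes_by_month(dates):
    """Return chunk sizes so that each chunk holds one calendar month.

    Hypothetical helper for illustration only; `dates` must be sorted.
    """
    # Consecutive dates sharing the same (year, month) label form one chunk.
    return tuple(
        sum(1 for _ in group)
        for _, group in groupby(dates, key=lambda d: (d.year, d.month))
    )


# Daily time axis covering January to March 2001 (not a leap year).
time = [date(2001, 1, 1) + timedelta(days=i) for i in range(90)]
sizes = chunk_sizes_by_month(time)
print(sizes)
```

The resulting chunks are uneven because the labels, not a fixed element count, decide where the boundaries fall. That is the main difference from specifying chunk sizes by index.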

Support specifying chunk sizes using labels (e.g. frequency string)

Responding to @shoyer's [comment](https://github.com/pydata/xarray/pull/9109#issuecomment-2166056509): > Are frequency strings unambiguous? Rechunking already supports memory sizes for Dask using strings. The table [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects) doesn't seem to overlap with `MB`, `KB` etc. but...

Add a `.to_polars_df()` method (very similar to `.to_dataframe()`, which implicitly uses pandas)

I believe the key difference here (at least theoretically) is the index creation? That MultiIndex can be quite large, and slow to build IME. So for API perhaps we can...

Add nunique #9548

> used xarray extensively during my academic years, but this is my first attempt at contributing to the repo, so thank you for your patience as I delve deeper! Very...

Add nunique #9548

> My thinking was that if the user tries to reduce the whole array, enforcing a single chunk along the reduction axis will try to load the whole array into...
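
As a rough illustration of why a chunked `nunique` needs more care than a plain sum, the sketch below counts distinct values chunk by chunk in plain Python. It is an assumption-laden toy, not the approach taken in the pull request: `nunique_chunked` is a hypothetical name, and a real implementation would operate on array chunks rather than lists. The point is that per-chunk unique sets must be merged before counting, which avoids loading everything as a single chunk.

```python
def nunique_chunked(chunks):
    """Count distinct values across chunks without concatenating them.

    Hypothetical helper for illustration only.
    """
    seen = set()
    for chunk in chunks:
        # Only the set of distinct values is carried between chunks.
        seen |= set(chunk)
    return len(seen)


chunks = [[1, 2, 2], [2, 3], [3, 3, 4]]
total = nunique_chunked(chunks)

# Summing per-chunk counts overcounts values that appear in several chunks.
naive = sum(len(set(chunk)) for chunk in chunks)
print(total, naive)
```

Memory use in this sketch scales with the number of distinct values rather than with the full array, so it helps most when there are few unique values relative to the array size.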

add benchmarks using pytest-benchmark and codspeed

> feel free to suggest something concrete

indexing please. that'll exercise the codec pipeline too. a peakmem metric would be good to track also, if possible.

Proposal: Make Obstore backend the default for s3/gcs/azure/https

+1, those numbers are hard to argue with. This will probably surface some long-tail edge-case bug/change in behaviour, so we should add a configurable option.

Astropy quantities are converted to bare arrays when creating an Index

I wonder if a "basic array index", as prototyped by @benbovy [here](https://notebooksharing.space/view/48ad86aed90f7588c9a475be6747528d87f975cb3317e5bd94265ffaa5a2478f#displayOptions=), would be a good default for this case. Basically we'd use `PandasIndex` only if the provided data are...

.min() doesn't work on np.datetime64 with a chunked Dataset

Yeah, that's a good idea. We should check whether dask & numpy support this now.