Deepak Cherian

1084 comments by Deepak Cherian

@jrbourbeau this kind of thing is what might cause regridding blowups xref https://github.com/dask/dask/issues/2225 cc @maxrjones

> Absolute speed of xhistogram appears to be 3-4x higher, and that's using numpy_groupies in flox. Possibly flox could be faster if using numba but not sure yet. Nah, in...

This could basically be something like
```
ds.notnull().groupby(x=BinGrouper(...), y=BinGrouper(...), enso_phase=UniqueGrouper(...)).sum()
# TODO: handle `density`
```
We'll need https://github.com/pydata/xarray/pull/9522 + some skipping of `_ensure_1d` in `GroupBy.__init__` to handle the case of...
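To make the "histogram as a grouped sum over bins" idea concrete, here is a minimal NumPy-only sketch. It is not the xarray or flox implementation; the variable names and bin edges are made up for illustration, and it only covers plain counts (no `density`, no categorical grouper).

```python
import numpy as np

# Hypothetical sample data; values lie in [0, 1), so none sits on the last edge.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=500)
y = rng.uniform(0.0, 1.0, size=500)
x_edges = np.linspace(0.0, 1.0, 5)  # 4 bins along x
y_edges = np.linspace(0.0, 1.0, 4)  # 3 bins along y

# "Group by bin": assign each sample an integer bin label per dimension.
ix = np.digitize(x, x_edges) - 1
iy = np.digitize(y, y_edges) - 1

# "Sum": accumulate a count of 1 per sample into its (ix, iy) group.
counts = np.zeros((len(x_edges) - 1, len(y_edges) - 1), dtype=np.int64)
np.add.at(counts, (ix, iy), 1)

# Reference result from NumPy's own 2D histogram.
expected, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
```

For data strictly inside the bin range, `counts` matches `expected`, which is the sense in which a histogram is a groupby-sum over bin labels.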

This is using the `sliding_window_view` trick under the hood, which composes badly with anything that does a memory copy (like `weighted` in your example) https://github.com/dask/dask/blob/d45ea380eb55feac74e8146e8ff7c6261e93b9d7/dask/array/overlap.py#L808 We actually use this approach...
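A minimal NumPy sketch of why the windowed view composes badly with copies. The array size and window length are arbitrary choices for illustration; this uses plain NumPy rather than the dask code linked above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(1_000, dtype=np.float64)
window = 100

# The windowed array is a strided view: no new data is allocated.
windows = sliding_window_view(x, window)
is_view = np.shares_memory(x, windows)

# Any operation that forces a copy materializes every window in full,
# so memory grows by roughly a factor of `window`.
copied = windows.copy()
blowup = copied.nbytes / x.nbytes
```

Here `windows` has shape `(901, 100)` but shares memory with `x`, while `copied` occupies about 90 times the memory of the original array.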

I support the approach, but it'd be good to see the impact on `ds.rolling().mean()` which also uses `construct` but is clever about it to avoid the memory blowup.

Yes, https://github.com/pydata/xarray/issues/3937, but we've struggled to move on that. `construct` is a pretty useful escape hatch for custom workloads, so we should optimize for it behaving sanely.

> new API that would simply be a combination of shuffle and other, existing methods. the equivalent would be a little involved: ``` shuffled = ds.shuffle(grouper) mapped = xr.map_blocks(lambda x:...

Thanks @scottyhq > One other thing that often gets neglected in test suites is operating on remote data. This is lining up with the "pangeo integration tests" that came up...

Looks like Quansight thinks that GH actions is a good place to benchmark scikit-learn: https://labs.quansight.org/blog/2021/08/github-actions-benchmarks/ so maybe we can set that up for our existing benchmarks. Here's the workflow:...

@TomAugspurger are you still in charge of the pydata benchmarking machine? If so, could you add xarray to the list please (https://pandas.pydata.org/speed/)? @Illviljan has made major improvements so it should...