Tom Nicholas issues

Results: 182 issues of Tom Nicholas

Store intermediate data in a directory named after the compute ID

> We should probably make Cubed store its intermediate data in a directory named `{CONTEXT_ID}/{compute_id}`, but that's a bit more work. _Originally posted by @TomNicholas in https://github.com/cubed-dev/cubed-benchmarks/pull/10#discussion_r1513284448_
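As a rough illustration of the layout proposed above, the sketch below builds a `{CONTEXT_ID}/{compute_id}` path under a working directory. The function name `intermediate_dir` and the example values are hypothetical and not part of Cubed; plain string joining is used so that URL schemes such as `s3://` are left intact.

```python
def intermediate_dir(work_dir: str, context_id: str, compute_id: str) -> str:
    """Return '<work_dir>/<context_id>/<compute_id>' (hypothetical helper).

    Plain string joining is used instead of pathlib so that a URL scheme
    such as 's3://' is not collapsed to 's3:/'.
    """
    return "/".join([work_dir.rstrip("/"), context_id, compute_id])


# Example values are placeholders, not real Cubed identifiers.
path = intermediate_dir("s3://my-bucket/tmp/", "cubed-context", "compute-0001")
print(path)
```

With a layout like this, every array written during one computation would share a single prefix, which could make per-computation cleanup or inspection simpler.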

Needed to set environment variables

I got the first example to run :champagne: `python examples/lithops-add-asarray.py "s3://cubed-$USER-temp" cubed-runtime` But to get it to run on AWS (I don't have a Modal account) I did have to...

documentation

Estimate monetary cost of executing plan

Cubed arguably has enough information to give a rough estimate of the monetary cost of executing the plan before starting execution. I'm imagining a new method `.estimate_cost(executor)` that is similar...
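To make the idea concrete, here is a minimal sketch of what such an estimate might compute. It assumes a serverless pricing model billed per GB-second plus an optional per-request fee. `PlanSummary`, its fields and this standalone `estimate_cost` function are all hypothetical and do not exist in Cubed, and the prices are placeholders rather than real provider rates.

```python
from dataclasses import dataclass


@dataclass
class PlanSummary:
    """Hypothetical summary of a plan; not a Cubed class."""
    num_tasks: int            # number of tasks the plan would launch
    mean_task_seconds: float  # assumed average runtime per task (a guess)
    task_memory_gb: float     # memory allocated to each task


def estimate_cost(plan: PlanSummary,
                  price_per_gb_second: float,
                  price_per_request: float = 0.0) -> float:
    """Rough cost: tasks x seconds x GB x price, plus a per-request fee."""
    compute = (plan.num_tasks * plan.mean_task_seconds
               * plan.task_memory_gb * price_per_gb_second)
    requests = plan.num_tasks * price_per_request
    return compute + requests


plan = PlanSummary(num_tasks=1000, mean_task_seconds=2.0, task_memory_gb=2.0)
# Placeholder prices, chosen only to make the arithmetic easy to follow.
print(estimate_cost(plan, price_per_gb_second=1e-5, price_per_request=1e-6))
```

The per-task runtime is the weakest input here, since it is not known before execution; any real estimate would likely need a measured or user-supplied value.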

enhancement

Optimizing the shuffle

Cubed currently always implements the shuffle operation as an all-to-all rechunking using the [algorithm from rechunker](https://rechunker.readthedocs.io/en/latest/algorithm.html). This creates an intermediate persistent Zarr store, and requires all chunks to be written...

help wanted

primitive

optimization

Lithops version mismatch

I tried to re-run the quadratic means example with recent improvements to Cubed but got stuck on a Lithops version mismatch error: `Exception: Lithops version mismatch. Host version 2.9.0`...

Use google tensorstore to read/write to zarr?

I expect you're already aware of this @tomwhite, but I wanted to ask whether or not you thought the [google-tensorstore project](https://ai.googleblog.com/2022/09/tensorstore-for-high-performance.html) might be useful in cubed. @rabernat [suggested](https://discourse.pangeo.io/t/google-tensorstore-3d-data-package/2778) benchmarking...

zarr

Add map_overlap as a new core op

It would be nice to add `map_overlap` alongside `map_blocks`, `blockwise`, `rechunk`, and `apply_gufunc`. It's currently not directly used within xarray (even within `xarray.map_blocks`, which builds an HLG), but maybe it...
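For readers unfamiliar with the operation, the sketch below shows the general idea behind a `map_overlap`-style function on a 1-D NumPy array: each chunk is extended with a halo of `depth` elements from its neighbours, the function is applied, and the halo is trimmed off again. This is an illustrative stand-in, not Cubed's or Dask's implementation; `map_overlap_1d` is a hypothetical name, and the function passed in is assumed to preserve the length of its input.

```python
import numpy as np


def map_overlap_1d(func, x, chunk, depth):
    """Apply a shape-preserving `func` chunk by chunk with a halo of `depth`."""
    n = len(x)
    pieces = []
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        lo = max(start - depth, 0)   # extend left, clipped at the array edge
        hi = min(stop + depth, n)    # extend right, clipped at the array edge
        result = func(x[lo:hi])
        # Trim the halo so only the chunk's own elements remain.
        offset = start - lo
        pieces.append(result[offset:offset + (stop - start)])
    return np.concatenate(pieces)


def smooth(block):
    """Three-point moving average; needs one neighbour on each side."""
    return np.convolve(block, np.ones(3) / 3, mode="same")


x = np.arange(10, dtype=float)
chunked = map_overlap_1d(smooth, x, chunk=4, depth=1)
print(np.allclose(chunked, smooth(x)))
```

With `depth=1` the chunked result should agree with applying `smooth` to the whole array, because every element sees the same neighbours in both cases; with `depth=0` the values at chunk boundaries would differ.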

enhancement

Useful functions not in the Array API Standard

There are a few numpy functions which xarray calls on wrapped arrays but which are not (yet) in the Array API Standard. (See https://github.com/data-apis/array-api/issues/187#issuecomment-1553615779) Cubed could choose to implement these...

enhancement

array api

xarray-integration

Computations requiring irregularly-chunked Zarr stores

All intermediate results in Cubed are written out to persistent storage via Zarr, but currently Zarr can't represent every chunked array, because the Zarr spec does not yet support irregular...

enhancement

zarr

upstream

np.nanmean executes eagerly on cubed arrays but lazily on dask arrays

See image for demonstration. ![Screenshot from 2023-03-14 19-26-13](https://user-images.githubusercontent.com/35968931/225164219-0125df14-e3f9-46ee-85c2-8ec523093ec1.png) `np.nanmean` is called by xarray's `.mean()` method when `skipna=True`, which is the default.