cubed
Bounded-memory serverless distributed N-dimensional array processing
There are various ways this might be implemented:

* Sequentially - compute each output in turn. (This is fine for a proof of concept.)
* Zarr structured arrays
* Multiple...
returns

```python
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /tmp/ipykernel_495009/1422487590.py:1 in │
│ │
│ │
│ /home/tom/Documents/Work/Code/cubed/cubed/core/array.py:112 in compute │
│ │
│ 111...
```
https://data-apis.org/array-api/latest/API_specification/indexing.html#boolean-array-indexing The output has a data-dependent shape, which makes it difficult to implement: each output chunk can be an arbitrary size, yet Zarr doesn't support non-regular chunk sizes. It may...
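To make the chunk-size problem concrete, here is a minimal sketch in plain NumPy (not cubed) that applies a boolean mask chunk by chunk to a 1-D array. The helper `chunked_boolean_index` is hypothetical and only illustrates why the pieces end up with irregular lengths: each one depends on how many `True` values fall in its chunk.

```python
import numpy as np


def chunked_boolean_index(x, mask, chunk_size):
    """Apply `mask` to 1-D `x` one chunk at a time, returning the per-chunk results.

    The length of each piece is the number of True values in that chunk, so it
    is only known once the mask values have been computed.
    """
    pieces = []
    for start in range(0, x.shape[0], chunk_size):
        stop = start + chunk_size
        pieces.append(x[start:stop][mask[start:stop]])
    return pieces


x = np.arange(10)
mask = x % 3 == 0  # True at 0, 3, 6 and 9
pieces = chunked_boolean_index(x, mask, chunk_size=4)
print([len(p) for p in pieces])  # [2, 1, 1]: irregular, data-dependent chunk sizes
```

Concatenating the pieces gives the same result as `x[mask]`, but storing them directly as a Zarr array would need the irregular chunk sizes `[2, 1, 1]`, which Zarr's regular chunk grid cannot represent.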
I would be curious how the cubed approach compares in performance to my Xarray-Beam library, beyond the superficial differences (NumPy vs Xarray data): https://github.com/google/xarray-beam One issue that comes to mind...
Can be implemented using `concat` and `reshape`. See https://data-apis.org/array-api/latest/API_specification/generated/signatures.manipulation_functions.roll.html
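A minimal sketch of the idea using NumPy rather than cubed: along a given axis, `roll` is two slices concatenated in swapped order, and the `axis=None` case flattens with `reshape`, rolls, and reshapes back. `np.concatenate` stands in for the array API's `concat`. This sketch handles only a single integer `shift` and a single axis; the array API signature also allows tuples of shifts and axes.

```python
import numpy as np


def roll(x, shift, axis=None):
    """Roll `x` by an integer `shift` using slicing, concatenation and reshape.

    Only a single integer shift and a single integer (or None) axis are handled.
    """
    if axis is None:
        # Flatten, roll along the only axis, then restore the original shape.
        return roll(x.reshape(-1), shift, axis=0).reshape(x.shape)
    n = x.shape[axis]
    if n == 0:
        return x.copy()
    shift %= n  # normalise negative and oversized shifts into [0, n)
    # Split the axis at n - shift: the trailing `shift` elements move to the front.
    front = [slice(None)] * x.ndim
    back = [slice(None)] * x.ndim
    front[axis] = slice(n - shift, None)
    back[axis] = slice(None, n - shift)
    return np.concatenate([x[tuple(front)], x[tuple(back)]], axis=axis)


x = np.arange(12).reshape(3, 4)
print(roll(x, 1, axis=1))
```

Because each piece is a plain slice, both operands of the concatenation can be expressed with indexing that cubed already supports (positive steps only).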
This can be implemented using indexing, although it needs a step size of -1, which is not currently implemented. See https://data-apis.org/array-api/latest/API_specification/generated/signatures.manipulation_functions.flip.html
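For reference, this is what the indexing expression looks like, sketched with NumPy (where a step of -1 is supported) rather than cubed. Each flipped axis gets `slice(None, None, -1)`, and all other axes get a full slice; the function name and the int/tuple/None handling of `axis` follow the array API's `flip` signature.

```python
import numpy as np


def flip(x, axis=None):
    """Reverse `x` along `axis` (an int, a tuple of ints, or None for all axes)."""
    if axis is None:
        axes = range(x.ndim)
    elif isinstance(axis, int):
        axes = (axis,)
    else:
        axes = axis
    index = [slice(None)] * x.ndim
    for ax in axes:
        index[ax] = slice(None, None, -1)  # a step of -1 reverses this axis
    return x[tuple(index)]


x = np.arange(6).reshape(2, 3)
print(flip(x, axis=1))
```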
FYI this might be useful in cubed's tests too https://github.com/dask/dask/pull/9374
There are some changes in cubed to the executors that derive from rechunker. It would be worth discussing how to share the code for these. [`BeamPipelineExecutor`](https://github.com/tomwhite/cubed/blob/a131c1b8168915355c222ccd16ec4332e01e4803/cubed/runtime/executors/beam.py#L83-L126) and [`LithopsPipelineExecutor`](https://github.com/tomwhite/cubed/blob/a131c1b8168915355c222ccd16ec4332e01e4803/cubed/runtime/executors/lithops.py#L25-L51) are new...
Currently, the code checks whether `nchunks_initialized == nchunks`, which won't work for 0-d arrays. We probably need to add some metadata to indicate that the array has been...
Package versions need to match on the client and remote machine. Usually this is not a problem as remote images are built at the same time as the client is...
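One way to surface a mismatch early is to collect installed versions on both sides and compare them. The sketch below uses only the standard library's `importlib.metadata`; the helper names `local_versions` and `find_mismatches` are hypothetical, and how the remote dictionary is obtained (for example, by running `local_versions()` inside a remote task) depends on the executor and is an assumption here. The version numbers in the usage lines are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def local_versions(packages):
    """Return {package: installed version, or None if not installed} for this environment."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(client, remote):
    """Return {package: (client_version, remote_version)} for every package that differs."""
    return {
        name: (client.get(name), remote.get(name))
        for name in sorted(set(client) | set(remote))
        if client.get(name) != remote.get(name)
    }


# Illustrative, made-up version numbers; in practice `remote` would come from
# running local_versions() inside a remote task.
client = {"cubed": "0.1.0", "numpy": "1.23.1", "zarr": "2.12.0"}
remote = {"cubed": "0.1.0", "numpy": "1.22.4", "zarr": "2.12.0"}
print(find_mismatches(client, remote))  # {'numpy': ('1.23.1', '1.22.4')}
```

Treating a missing package as `None` means a package installed on only one side is reported as a mismatch too.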