cubed
Bounded-memory serverless distributed N-dimensional array processing
We briefly discussed the difficulty of maintaining multiple executors. In https://github.com/tomwhite/cubed/pull/168#issuecomment-1542892933 I suggested expanding the CI to run different test jobs with different executors installed. I also saw [this dask...
We know the peak memory used by every task, so if this value ever exceeds the user-specified `max_mem` then we can issue a warning....
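A minimal sketch of such a check, assuming a hypothetical helper `check_peak_mem` (the name and signature are illustrative, not Cubed's actual API):

```python
import warnings


def check_peak_mem(task_name, peak_mem, max_mem):
    """Warn if a task's measured peak memory exceeded the user's max_mem.

    Hypothetical helper: ``task_name``, ``peak_mem`` and ``max_mem`` are
    illustrative parameters, with both memory values given in bytes.
    """
    if peak_mem > max_mem:
        warnings.warn(
            f"Task {task_name!r} used {peak_mem} bytes of memory, "
            f"which exceeds max_mem of {max_mem} bytes"
        )
```

The warning points the user at the offending task, so they can raise `max_mem` or rechunk before the computation silently runs out of memory.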
For very large computations, where the number of tasks for an array is much greater than the number of workers, it may be desirable to have more control over task...
I tried to set up the problem of calculating the anomaly with respect to the group mean from https://github.com/pangeo-data/distributed-array-examples/issues/4. I used fake data with the same chunking scheme instead of...
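The computation in question can be illustrated with plain NumPy on fake data (this is only a sketch of the groupby-anomaly pattern, not the Cubed/Xarray code from the linked issue; the group labels and shapes are made up):

```python
import numpy as np

# Fake data standing in for the real dataset, as in the issue:
# compute each element's anomaly from its group's mean.
rng = np.random.default_rng(0)
data = rng.standard_normal((6, 4))
groups = np.array([0, 0, 1, 1, 2, 2])  # group label per row

# Mean over each group of rows, then broadcast back and subtract.
group_means = np.stack(
    [data[groups == g].mean(axis=0) for g in np.unique(groups)]
)
anomaly = data - group_means[groups]

# Within each group, anomalies sum to ~0 by construction.
assert np.allclose(anomaly[groups == 0].sum(axis=0), 0)
```

With real data the same pattern is expressed through Xarray's `groupby`, which is what makes the chunking scheme the interesting variable here.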
This issue is to explore running the "Transformed Eulerian Mean diagnostic" example in https://github.com/dcherian/ncar-challenge-suite/blob/main/tem.ipynb using Cubed. It uses Xarray, so needs https://github.com/pydata/xarray/pull/7019
Currently we use Zarr structured arrays, which are likely to be slow (since they are not column-oriented) - although that would be worth checking first. https://github.com/tomwhite/cubed/blob/400dc9adcf21c8b468fce9f24e8d4b8cb9ef2f11/cubed/array_api/statistical_functions.py#L18-L48
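To illustrate why structured arrays appear in the reduction at all, here is a sketch (NumPy only, not Cubed's actual code) of computing a mean via paired per-chunk intermediates held in one structured array:

```python
import numpy as np

# A structured (record) array holds the paired intermediates for a mean:
# per-chunk totals and counts, combined at the end. Field access pulls
# whole "columns", but the storage is row-interleaved, which is the
# performance concern raised above.
chunks = [np.arange(4.0), np.arange(4.0, 10.0)]
partial = np.empty(len(chunks), dtype=[("total", "f8"), ("n", "i8")])
for i, chunk in enumerate(chunks):
    partial[i] = (chunk.sum(), chunk.size)

mean = partial["total"].sum() / partial["n"].sum()
assert mean == np.arange(10.0).mean()
```

An alternative would be two separate Zarr arrays (one per field), trading the single-object convenience for contiguous per-field storage; benchmarking would show whether that matters in practice.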
To subset my data whilst getting around #196 I tried slicing using xarray's lazy indexing machinery before converting to cubed arrays using `.chunk` (a trick which, when used with dask...
With #176 you can run Cubed on Windows, but `peak_measured_mem` isn't implemented there. To implement it we could use `psutil`, but we should make it an optional dependency so that other platforms aren't required...
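One way the optional dependency could look, as a sketch (the function name mirrors the issue, but the fallback behaviour and the use of psutil's Windows-only `peak_wset` field are assumptions, not the actual implementation):

```python
def peak_measured_mem():
    """Best-effort peak memory (bytes) for this process, or None.

    Sketch only: psutil is imported lazily so it stays an optional
    dependency, and ``peak_wset`` is a Windows-only field of
    ``psutil.Process().memory_info()``; on other platforms (or when
    psutil is missing) this returns None rather than raising.
    """
    try:
        import psutil
    except ImportError:
        return None  # psutil not installed: feature unavailable
    info = psutil.Process().memory_info()
    # peak_wset exists only on Windows; elsewhere fall back to None.
    return getattr(info, "peak_wset", None)
```

On POSIX platforms the existing `resource.getrusage` path would still be used, so only Windows callers pay for (or require) psutil.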
I was looking deeper into how to make https://github.com/pydata/xarray/issues/7813 work. It looks like the nodes are named when they are created by `Plan._new`. Questions: - Would it make sense to...