cubed
Bounded-memory serverless distributed N-dimensional array processing
I noticed you're using `concatenate2` from Dask. I've found this to be quite wasteful: its recursive approach is neat, but it repeatedly allocates new memory. Cubed...
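To make the allocation concern concrete, here is a minimal sketch of the alternative: allocate the output once and copy each block into its slot. It only handles a 2-D grid of blocks, and `concatenate_blocks_2d` is a hypothetical helper written for illustration, not a function from Dask or Cubed.

```python
import numpy as np


def concatenate_blocks_2d(blocks):
    """Assemble a 2-D grid (list of lists) of 2-D blocks into one array.

    Hypothetical helper: the output is allocated exactly once, and each
    block is copied into place, so no intermediate arrays are created.
    Assumes blocks in a row share a height and blocks in a column share a width.
    """
    row_heights = [row[0].shape[0] for row in blocks]
    col_widths = [block.shape[1] for block in blocks[0]]
    dtype = np.result_type(*(block.dtype for row in blocks for block in row))

    out = np.empty((sum(row_heights), sum(col_widths)), dtype=dtype)

    r = 0
    for row, height in zip(blocks, row_heights):
        c = 0
        for block, width in zip(row, col_widths):
            out[r:r + height, c:c + width] = block
            c += width
        r += height
    return out


blocks = [
    [np.ones((2, 3)), np.zeros((2, 1))],
    [np.full((1, 3), 2.0), np.full((1, 1), 3.0)],
]
result = concatenate_blocks_2d(blocks)
```

A recursive concatenation builds a new array at every nesting level; the version above touches each element once. Generalizing it to N dimensions would mean computing per-axis offsets instead of the two hard-coded loops.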
On some systems there are scenarios where we know we can perform a rechunk entirely in-memory without writing to disk. For example if I locally run this test which performs...
It would be awesome if the backing array implementation supported automatic differentiation, so that we could access some `grad` method from Cubed. It looks like a bunch of stakeholder libraries have...
This is an umbrella issue for tracking work to make Cubed run better on a single machine. ## Processes executor Improvements to the `processes` executor - [x] #507 -...
I suspect that the performance of the ThreadPoolExecutor would substantially increase if we strategically placed Cython `with nogil` calls. - https://cython.readthedocs.io/en/latest/src/userguide/parallelism.html - https://stackoverflow.com/questions/49047255/cython-nogil-with-threadpoolexecutor-not-giving-speedups - https://stackoverflow.com/questions/56537989/usage-of-threadpoolexecutor-in-conjunction-with-cythons-nogil There are drawbacks to process...
@TomNicholas mentioned that people at SciPy asked how to apply an arbitrary function to arrays in Cubed. An example would help users get started with this. It should cover how...
Tile based operations have been quite a success for creating optimal GPU kernels. The programming model, in my understanding, offers flexibility while taking advantage of cache hierarchies. http://www.eecs.harvard.edu/~htk/publication/2019-mapl-tillet-kung-cox.pdf The [triton...
While I'm not familiar with the Philox pseudo-random number generator (PRNG) in NumPy (it does look well suited to generation in a distributed setting), I think adopting a stateless PRNG...
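As a rough illustration of what per-block, state-free generation could look like with NumPy's Philox, the sketch below derives each block's stream from a global seed plus the block's grid index, so no generator state is shared between tasks. The helper name `random_block` and the choice of `SeedSequence` with a `spawn_key` are assumptions made for this example; this is not how Cubed is known to implement random arrays.

```python
import numpy as np


def random_block(seed, block_id, shape):
    """Return uniform random values for one block.

    Hypothetical helper: the stream depends only on (seed, block_id),
    so any worker can compute any block, in any order, and get the same values.
    """
    # spawn_key makes the seed sequence unique per block without shared state
    seed_seq = np.random.SeedSequence(entropy=seed, spawn_key=tuple(block_id))
    rng = np.random.Generator(np.random.Philox(seed_seq))
    return rng.random(shape)


# The same (seed, block_id) pair always reproduces the same block
a = random_block(42, (0, 1), (2, 3))
b = random_block(42, (0, 1), (2, 3))
# A different block index yields an independent stream
c = random_block(42, (1, 0), (2, 3))
```

Because nothing is carried from one call to the next, a failed task can be retried and will regenerate identical data for its block.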
Could Spark be added as a supported executor? Maybe RDD.map or RDD.mapPartitions would be the correct way to map a function similar to [`map_unordered`](https://github.com/cubed-dev/cubed/blob/main/cubed/runtime/executors/lithops.py#L190) in the Lithops executor. https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.RDD.mapPartitions.html#pyspark.RDD.mapPartitions To...
When running [some benchmarks](https://github.com/cubed-dev/cubed/issues/492#issuecomment-2238908343) recently I noticed that turning off Zarr compression resulted in faster IO performance when writing random data to Zarr files on a local SSD. This [post](https://medium.com/@lubonjaariel/to-compress-or-not-to-compress-a-zarr-question-812160b3777d)...