plonk
plonk copied to clipboard
Use Dask for lazy loading and delayed operations
Accessing a particle array on a snapshot is currently lazy, in the sense that the data is only loaded from disc into memory when requested, e.g. with snap['position']
. However, it remains there in the dictionary snap._arrays
.
An alternative is to load the array as a Dask array and use the resulting object as if it were a NumPy array loaded into memory. Then complicated expressions can be written before anything is loaded into memory. The computation is executed with the .compute()
method. Dask also allows for easy parallelization on both local, i.e. a multi-core laptop, and remote hardware, i.e. a supercomputing cluster.
A suggestion is to adjust the __getitem__()
method to return a Dask array, and to not store the array in snap._arrays
.