
Use Dask for lazy loading and delayed operations

Open dmentipl opened this issue 4 years ago • 0 comments

Accessing a particle array on a snapshot is currently lazy, in the sense that the data is only loaded from disk into memory when requested, e.g. with snap['position']. However, once loaded, the array stays in memory, cached in the dictionary snap._arrays.

An alternative is to load the array as a Dask array and use the resulting object as if it were a NumPy array loaded into memory. Complicated expressions can then be written before anything is loaded into memory. The computation is only executed when the .compute() method is called. Dask also allows for easy parallelization on both local hardware, e.g. a multi-core laptop, and remote hardware, e.g. a supercomputing cluster.
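To illustrate the idea, here is a minimal sketch of building an expression on a Dask array and only evaluating it at the end. The `positions` array is made-up data standing in for a particle array, and it already lives in memory here purely to keep the example self-contained; nothing below is plonk code.

```python
import numpy as np
import dask.array as da

# Made-up stand-in for a particle position array of shape (N, 3).
positions = np.arange(30, dtype=float).reshape(10, 3)

# Wrap it as a Dask array split into two chunks of 5 particles each.
x = da.from_array(positions, chunks=(5, 3))

# Build the expression. These lines only construct a task graph;
# no arithmetic is performed yet.
radius = da.sqrt((x ** 2).sum(axis=1))
mean_radius = radius.mean()

# The computation runs only when .compute() is called.
result = mean_radius.compute()
```

Until `.compute()` is called, `mean_radius` is still a Dask array describing the work to do, and the chunks can be processed in parallel by whichever scheduler is in use.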

A suggestion is to adjust the __getitem__() method to return a Dask array, and to not store the array in snap._arrays.
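A rough sketch of what that could look like, assuming a simplified snapshot class. `LazySnap`, `_read`, and the dictionary used as a fake file are all hypothetical names for illustration and are not plonk's actual implementation; a real version would read the shape and dtype from the file metadata rather than from an in-memory array.

```python
import numpy as np
import dask
import dask.array as da


class LazySnap:
    """Hypothetical snapshot whose arrays are returned as Dask arrays."""

    def __init__(self, fake_file):
        # A dict of NumPy arrays standing in for an open snapshot file.
        self._file = fake_file
        # Counts how often data is actually read (for demonstration only).
        self.reads = 0

    def _read(self, name):
        # Stand-in for the real read from disk.
        self.reads += 1
        return np.asarray(self._file[name])

    def __getitem__(self, name):
        # Only metadata (shape, dtype) is needed to build the lazy array.
        stored = self._file[name]
        lazy = dask.delayed(self._read)(name)
        # Nothing is cached on the snapshot: no snap._arrays entry.
        return da.from_delayed(lazy, shape=stored.shape, dtype=stored.dtype)


snap = LazySnap({"position": np.ones((4, 3))})
position = snap["position"]        # no read has happened yet
reads_before = snap.reads
total = (position * 2).sum().compute()  # the read happens here
reads_after = snap.reads
```

One trade-off of this design is that, because nothing is cached, each separate call to `.compute()` reads the data again unless the result is kept by the caller.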

dmentipl, Jan 13 '20 02:01