
Support lazy `TsdFrame`/`TsdTensor`

alejoe91 opened this issue on Mar 13, 2024 · 2 comments

Currently, this call will load all LFP data into RAM, which is prohibitive for very large datasets:

import pynapple as nap

data = nap.NWBFile(nwb)  # nwb is the path to the NWB file
data["ElectricalSeriesProbeA-LFP"]  # this read loads the full LFP array into RAM

Would it be possible to make a lazy TsdFrame (and probably TsdTensor) representation?

This would also speed up compute_perievent_continuous and other LFP-related processing and minimize their memory footprint.

alejoe91 · Mar 13, 2024

Adding comments from #185 here

I was going through TsdTensor and I'm wondering if it supports lazy compute.

Example use case: With calcium imaging data, the outer product of the spatial and temporal components can result in a huge array, hundreds of gigabytes or terabytes in size. It is almost never necessary to compute the entire array, so a lazy-compute data structure works well. The implementation in mesmerize-core computes the outer product only when the array is sliced: https://github.com/nel-lab/mesmerize-core/blob/master/mesmerize_core/arrays/_cnmf.py#L131-L141
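
For illustration, a minimal sketch of that idea (not the mesmerize-core code itself; all names and shapes are invented): the full movie is never materialized, and indexing computes only the requested frames from the factored components.

import numpy as np

class LazyOuterProduct:
    # Lazily reconstruct frames from spatial and temporal components.
    def __init__(self, spatial, temporal):
        # spatial: (n_pixels, n_components), temporal: (n_components, n_frames)
        self.spatial = spatial
        self.temporal = temporal
        self.shape = (temporal.shape[1], spatial.shape[0])

    def __getitem__(self, idx):
        # Compute only the selected frames: result is (n_selected, n_pixels)
        return (self.spatial @ self.temporal[:, idx]).T

spatial = np.random.rand(512 * 512, 20)   # 20 spatial components
temporal = np.random.rand(20, 100_000)    # 100,000 frames
movie = LazyOuterProduct(spatial, temporal)
frames = movie[0:10]                      # shape (10, 262144); the full array is never built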

As discussed:

So far, feeding a numpy memmap as the d argument of TsdTensor seems to work. Any object that implements the numpy array API could probably be passed as d. Will test with our LazyArray implementation, as well as other array types (zarr, dask, etc.).
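
For reference, a minimal sketch of that memmap test (file name, dtype, shape, and sampling rate are all assumptions):

import numpy as np
import pynapple as nap

# Memory-mapped array on disk; samples are only read when sliced.
arr = np.memmap("lfp.dat", dtype=np.float32, mode="r", shape=(1_000_000, 4, 2))
t = np.arange(arr.shape[0]) / 1000.0   # assumed 1 kHz timestamps

# If d is used as-is, the data stays memory-mapped instead of being copied into RAM.
tensor = nap.TsdTensor(t=t, d=arr)
print(tensor[0:10].shape)              # only these samples are read from disk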

gviejo · Apr 1, 2024

Thanks @gviejo

This seems related, but the in-memory object is created by pynapple when accessing the NWB field. So I think this might need a patch to pass the h5py.Dataset or the zarr.Array directly to the TsdFrame.
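
A hedged sketch of what such a patch could enable on the user side (the file path, the acquisition field name, and whether TsdFrame accepts an h5py.Dataset as-is are all assumptions; it also assumes the series stores a start time and rate rather than explicit timestamps):

import numpy as np
import pynwb
import pynapple as nap

# Open the NWB file without loading the electrical series into memory.
io = pynwb.NWBHDF5IO("session.nwb", mode="r")
nwbfile = io.read()
es = nwbfile.acquisition["ElectricalSeriesProbeA-LFP"]

# es.data is an h5py.Dataset; if TsdFrame could wrap it directly, only the
# slices that are actually accessed would be read from disk.
t = es.starting_time + np.arange(es.data.shape[0]) / es.rate
lfp = nap.TsdFrame(t=t, d=es.data)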

alejoe91 · Apr 4, 2024

Fixed by https://github.com/pynapple-org/pynapple/pull/264

alejoe91 · May 22, 2024