pynapple icon indicating copy to clipboard operation
pynapple copied to clipboard

Make tsd, tsd-tensor, and tsd-frame lazy

Open alejoe91 opened this issue 10 months ago • 7 comments

Fixes #245

@gviejo @BalzaniEdoardo I just removed the [:] when loading arrays from the NWB file and added a lazy flag to the BaseTsd to allow to skip the convert_to_numpy. Note that this works only for array-like, but both h5py.Dataset and zarr.Array support this. It seems to work well on my side! Let me know what you think!

alejoe91 avatar Apr 09 '24 11:04 alejoe91

This seems to break the jitcontinuous_perievent...would yiou be fine in having a parallelized version that doesn't rely on numba, but on the concurrent.futures.ProcessPoolExecutor? This will enable to compute continuous perievents without loading the entire LFP traces in memory

alejoe91 avatar Apr 09 '24 15:04 alejoe91

Pull Request Test Coverage Report for Build 9179520861

Details

  • 18 of 23 (78.26%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.02%) to 83.142%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pynapple/core/_core_functions.py 1 2 50.0%
pynapple/core/time_series.py 12 13 92.31%
pynapple/io/interface_nwb.py 5 8 62.5%
<!-- Total: 18 23
Totals Coverage Status
Change from base Build 9117465489: -0.02%
Covered Lines: 2436
Relevant Lines: 2851

💛 - Coveralls

coveralls avatar Apr 10 '24 18:04 coveralls

Seems like a coverall issue? Tests are passing locally on my side

alejoe91 avatar Apr 12 '24 08:04 alejoe91

Ok really weird error with coveralls. I have another PR where coveralls runs fine.

gviejo avatar Apr 12 '24 16:04 gviejo

I noticed that all the PRs so far has been from you and @BalzaniEdoardo , so maybe some of the actions are relying on some secrets that my fork doesn't have access to

alejoe91 avatar Apr 12 '24 16:04 alejoe91

Hi Alessio,

I checked out your PR. It's very cool that the lazy-loading is working. On my side test passes, I'll look more into what is going on with the coveralls. It would be nice to add tests to test_nwb.py that checks the data in the Tsd is lazy-loaded and that you can run pynapple calls once you slice into the data, something like this,


def test_is_lazy_loaded(path):
    tsd_lazy = nap.load_file(path)
    assert isinstance(tsd_lazy.d, h5py.Dataset)
    assert isinstance(tsd_lazy.t, np.ndarray)

def test_restrict_lazy_loading(tsd_lazy, tsd_regular, epoch):
      assert all(tsd_lazy[:1000].restrict(epoch).t == tsd_regular[:1000].restrict(epoch).t)
      assert np.all(tsd_lazy[:1000].restrict(epoch).d == tsd_regular[:1000].restrict(epoch).d)
      assert np.all(tsd_lazy[:1000].restrict(epoch).time_support.values == tsd_regular[:1000].restrict(epoch).time_support.values)

It would be very nice to have a version of this in which restrict for selecting based on time would work without the slicing for reading the data first. This would allow you to slice based on time instead of having to calculate the indices.

tsd_regular = tsd_lazy.restrict(epochs)

That requires way more work, but it is not technically hard. All you need is splitting the jitrestrict function in two, one that receives time_array, starts and ends and returns the indices for slicing, and a regular function that applies the slicing only.

# in _jitted_functions.py
@jit(no_python=True)
def jit_compute_indices(time_array, starts, ends)
    .... # calculate the ix indices for slicing as in the current jitrestrict
    return ix

def jitresrict(time_array, data_array, starts, ends):
    ix = jit_compute_indices(time_array, starts, ends):
    return time_array[ix], data_array[ix]

I wonder what @gviejo thinks about this idea. We should compare the performance but my guess is that this new jitrestrict will be still efficient because the last slicing just creates a view when the data_array is loaded.

BalzaniEdoardo avatar Apr 13 '24 23:04 BalzaniEdoardo

I can see a path for this but it will take some time since it will requires to update all the jitted functions. Right now, we are focusing on the integration with pynajax so there will be some changes in the core. We can ping you on this PR or in the issue #245 when we can tackle this.

gviejo avatar Apr 15 '24 18:04 gviejo

@alejoe91 thanks for the PR Ale! I added a bunch of tests and loaded the "time" array (only the data is lazy loaded).

This little change allows you to treat your lazy loaded array as if it was fully loaded; If you want or load a chunk of your data only, you should call restrict or get before any computation. The call to restrict will create a vector of indices using the loaded time array, and use it to slice the hdf5.Dataset which will load only what you need. We will be merging into dev so far, test it out and tell us how it goes!

BalzaniEdoardo avatar May 20 '24 19:05 BalzaniEdoardo

Codecov Report

Attention: Patch coverage is 73.68421% with 10 lines in your changes are missing coverage. Please review.

Files Coverage Δ
pynapple/core/_core_functions.py 85.91% <50.00%> (ø)
pynapple/core/time_series.py 91.90% <84.61%> (ø)
pynapple/io/interface_nwb.py 75.40% <69.56%> (ø)

codecov[bot] avatar May 21 '24 18:05 codecov[bot]

Thank you @alejoe91 . This should appear soon in 0.6.6.

gviejo avatar May 21 '24 18:05 gviejo