pynapple
Make tsd, tsd-tensor, and tsd-frame lazy
Fixes #245
@gviejo @BalzaniEdoardo I just removed the `[:]` when loading arrays from the NWB file and added a lazy flag to the `BaseTsd` to allow skipping `convert_to_numpy`. Note that this only works for array-like objects, but both `h5py.Dataset` and `zarr.Array` support this.
It seems to work well on my side! Let me know what you think!
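To illustrate the "array-like" point: both `h5py.Dataset` and `zarr.Array` expose `shape`/`dtype` and return plain numpy arrays when sliced, which is what the lazy path relies on. A quick self-contained check (synthetic in-memory data, not from this PR):
```python
import h5py
import numpy as np
import zarr

# Tiny in-memory examples so this snippet runs standalone (not actual NWB data).
f = h5py.File("example.h5", "w", driver="core", backing_store=False)
ds = f.create_dataset("data", data=np.random.randn(10_000, 4))
za = zarr.array(np.random.randn(10_000, 4))

for arr in (ds, za):
    # Both behave like arrays without materializing anything up front...
    print(type(arr).__name__, arr.shape, arr.dtype)
    # ...and slicing returns a numpy array containing only the requested chunk.
    assert isinstance(arr[:100], np.ndarray)
```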
This seems to break `jitcontinuous_perievent`... Would you be fine with having a parallelized version that doesn't rely on numba but on `concurrent.futures.ProcessPoolExecutor`? This would make it possible to compute continuous perievents without loading the entire LFP traces in memory.
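For what it's worth, one way such a version could look (a rough sketch only, not this PR's implementation; the file path, dataset key, and sampling rate are placeholder assumptions): each worker opens the file independently and reads just a window around its batch of trigger times.
```python
from concurrent.futures import ProcessPoolExecutor
import h5py
import numpy as np

NWB_PATH = "session.nwb"                    # placeholder path
DSET = "acquisition/ElectricalSeries/data"  # placeholder dataset key
FS = 1250.0                                 # assumed LFP sampling rate (Hz)

def _perievent_batch(trigger_times, window=(-0.5, 1.0)):
    """Read one LFP snippet per trigger without loading the full trace."""
    n_before = int(-window[0] * FS)
    n_after = int(window[1] * FS)
    out = []
    with h5py.File(NWB_PATH, "r") as f:
        data = f[DSET]
        for t in trigger_times:
            i = int(t * FS)
            # Triggers are assumed to be far enough from the recording edges.
            out.append(data[i - n_before : i + n_after])  # reads only this slice
    return np.stack(out)

def parallel_perievent(trigger_times, n_jobs=4):
    # Each process opens the HDF5 file on its own, so nothing large has to be
    # pickled or kept in the parent's memory.
    batches = np.array_split(np.asarray(trigger_times), n_jobs)
    with ProcessPoolExecutor(max_workers=n_jobs) as ex:
        results = list(ex.map(_perievent_batch, batches))
    return np.concatenate(results, axis=0)
```
(On macOS/Windows, where processes are spawned, call `parallel_perievent` from under an `if __name__ == "__main__":` guard.)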
Pull Request Test Coverage Report for Build 9179520861
Details
- 18 of 23 (78.26%) changed or added relevant lines in 3 files are covered.
- No unchanged relevant lines lost coverage.
- Overall coverage decreased (-0.02%) to 83.142%
Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
---|---|---|---|
pynapple/core/_core_functions.py | 1 | 2 | 50.0% |
pynapple/core/time_series.py | 12 | 13 | 92.31% |
pynapple/io/interface_nwb.py | 5 | 8 | 62.5% |
Total | 18 | 23 | 78.26% |

Totals | |
---|---|
Change from base Build 9117465489: | -0.02% |
Covered Lines: | 2436 |
Relevant Lines: | 2851 |
💛 - Coveralls
Seems like a Coveralls issue? Tests are passing locally on my side.
Ok, really weird error with Coveralls. I have another PR where Coveralls runs fine.
I noticed that all the PRs so far have been from you and @BalzaniEdoardo, so maybe some of the actions are relying on secrets that my fork doesn't have access to.
Hi Alessio,
I checked out your PR. It's very cool that the lazy loading is working. On my side the tests pass; I'll look more into what is going on with Coveralls.
It would be nice to add tests to `test_nwb.py` that check that the data in the `Tsd` is lazy-loaded and that you can run pynapple calls once you slice into the data, something like this:
```python
import h5py
import numpy as np
import pynapple as nap

def test_is_lazy_loaded(path):
    tsd_lazy = nap.load_file(path)
    assert isinstance(tsd_lazy.d, h5py.Dataset)  # data still lives on disk
    assert isinstance(tsd_lazy.t, np.ndarray)    # time axis is in memory

def test_restrict_lazy_loading(tsd_lazy, tsd_regular, epoch):
    # Slicing then restricting the lazy Tsd should match the fully loaded one.
    assert np.all(tsd_lazy[:1000].restrict(epoch).t == tsd_regular[:1000].restrict(epoch).t)
    assert np.all(tsd_lazy[:1000].restrict(epoch).d == tsd_regular[:1000].restrict(epoch).d)
    assert np.all(tsd_lazy[:1000].restrict(epoch).time_support.values == tsd_regular[:1000].restrict(epoch).time_support.values)
```
It would be very nice to have a version of this in which `restrict` (i.e., selecting based on time) works without having to slice into the data first. This would let you select based on time instead of having to calculate the indices yourself:
```python
tsd_regular = tsd_lazy.restrict(epochs)
```
That requires more work, but it is not technically hard. All you need is to split the `jitrestrict` function in two: a jitted function that receives `time_array`, `starts`, and `ends` and returns the indices for slicing, and a regular function that applies the slicing only.
```python
# in _jitted_functions.py
from numba import jit

@jit(nopython=True)
def jit_compute_indices(time_array, starts, ends):
    ...  # compute the ix indices for slicing, as in the current jitrestrict
    return ix

def jitrestrict(time_array, data_array, starts, ends):
    ix = jit_compute_indices(time_array, starts, ends)
    return time_array[ix], data_array[ix]
```
I wonder what @gviejo thinks about this idea. We should compare the performance, but my guess is that this new `jitrestrict` will still be efficient because the last slicing just creates a view once the `data_array` is loaded.
I can see a path for this, but it will take some time since it requires updating all the jitted functions. Right now we are focusing on the integration with pynajax, so there will be some changes in the core. We can ping you on this PR or in issue #245 when we can tackle this.
@alejoe91 thanks for the PR Ale! I added a bunch of tests and loaded the "time" array (only the data is lazy-loaded).
This little change allows you to treat your lazy-loaded array as if it were fully loaded. If you want to load only a chunk of your data, you should call `restrict` or `get` before any computation. The call to `restrict` creates a vector of indices using the loaded time array and uses it to slice the `h5py.Dataset`, which will load only what you need.
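For example, a minimal usage sketch (the file name, object key, and epoch values below are placeholders, not part of this PR):
```python
import pynapple as nap

data = nap.load_file("session.nwb")    # placeholder file name
lfp = data["ElectricalSeries"]         # placeholder key; the data stays on disk

ep = nap.IntervalSet(start=0, end=60)  # just the chunk you need, in seconds
lfp_chunk = lfp.restrict(ep)           # indices come from the in-memory time
                                       # array; only this slice is read from disk

lfp_chunk2 = lfp.get(0, 60)            # time-based selection, same idea
```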
We will be merging this into `dev`; for now, test it out and tell us how it goes!
Codecov Report
Attention: Patch coverage is 73.68421% with 10 lines in your changes missing coverage. Please review.
Files | Coverage Δ |
---|---|
pynapple/core/_core_functions.py | 85.91% <50.00%> (ø) |
pynapple/core/time_series.py | 91.90% <84.61%> (ø) |
pynapple/io/interface_nwb.py | 75.40% <69.56%> (ø) |
Thank you @alejoe91. This should appear soon in 0.6.6.