spikeinterface
spikeinterface copied to clipboard
Proposal for internal handling of `time_vector`
Following some discussions with @samuelgarcia and @JoeZiminski , I wanted to open an issue to work out the details of handling time_vectors in a way that won't crash everyone's computers. I'm opening a separate issue from https://github.com/SpikeInterface/spikeinterface/issues/2627, since this will focus on implementation details and will not change the logic.
Summary of what exists:
- User-facing API: users interact with times through
extractor.sample_index_to_time
andextractor.time_to_sample_index
- This means that we have freedom to change the implementation details without users knowing
- There are two logical cases:
-
time_seconds = t_start + sample_index / sampling_frequency
, wheret_start=0
by default -
time_seconds = time_vector[sample_index]
-
Summary of problems:
- The first case causes no issues.
- The second causes two issues:
- For an hour long recording, a time vector takes ~100M samples * (8 bytes / sample) ~= 1GB of memory. This becomes a problem for parallel processing.
- ~~Going from times in seconds to sample indices with a time_vector currently means a
np.searchsorted
for each sample, which could get expensive for two reasons: time_vectors are long so individual searches can be expensive, and doing lots of them is even worse.~~ edit: this is not really an issue, see discussion below.
I propose that we can still efficiently implement time vectors with two changes:
- Store them in a SharedMemory when doing multiprocessing
- ~~Don't do a searchsorted for every sample input into
extractor.sample_index_to_time
~~ edit: seems fine.- For instance, to speed up individual searches, we could initialize the search at (time_seconds - t_start) * sampling_frequency and look left or right depending on what value we find there.
- To speed up multiple searches, if the array of times we're searching for is also sorted than we can use a merge or something if it seems helpful.
I think this could be addressed by a PR that doesn't affect any other code, and would allow time_vectors in the motion correction interpolation.