Add sparse_waveform_mean and sd column to Units table

Open rly opened this issue 9 months ago • 0 comments

Address https://github.com/NeurodataWithoutBorders/pynwb/issues/1693

This issue tracks the revamp of https://github.com/NeurodataWithoutBorders/nwb-schema/pull/576 which was merged and then pulled (https://github.com/NeurodataWithoutBorders/nwb-schema/pull/595) after issues were raised when implementing this in pynwb.

The NWB TAB discussed this issue and proposed solutions on Tuesday.

The original proposal in https://github.com/NeurodataWithoutBorders/pynwb/issues/1693 is to change "waveform_mean" and "waveform_sd" to be singly ragged arrays that allow a different number of electrodes per unit.

Option 1: Support both the old full array form and the new ragged array form under the same column name "waveform mean". API could detect if "waveform_mean_index" column exists, and if so, parse the slices assuming the new format. Pro: Both forms represent the same data, just structured differently, so it makes sense to use the same column name. Con: Ryan thinks this is likely to be confusing to users and challenging to implement. Read/slicing pattern is different, documentation is different, and it will be hard for pynwb to detect during file creation which form should be used.

Option 2: Support both forms but make the ragged array form doubly ragged – over both electrodes and samples. Then the read/slicing pattern is the same. Making the table doubly ragged might be less efficient though. And it is quite rare to have a different number of samples, though it has happened, e.g., where one session includes units acquired from two different acquisition systems with different sampling rates, and there are a different number of samples for spikes from different systems (to keep the length of the window in seconds the same). However, this use case may be better served by storing units under two different Units tables.

Option 3: Introduce a new column named "sparse_waveform_mean" (or see other options below) that is singly ragged under a different name. There may still be a performance benefit to having the old full array form. If so, it may be better to keep both forms and maybe rename "waveform_mean" to "full_waveform_mean". Alternatively, we could discourage and deprecate "waveform_mean" in favor of "sparse_waveform_mean" which can handle the full case. Pro: Clean implementation Con: Having two columns that represent the same data and are accessed differently will probably be confusing to users.

Possible new column names:

sparse_waveform_mean (consistent with spikeinterface, preferred by Alessio)
varying_waveform_mean
selected_waveform_mean
electrodes_waveform_means

We are leaning toward Option 3 and gradual deprecation of the existing "waveform_mean" column but I would like to run some quick performance tests first.

Mar 20 '25 05:03 rly