sortingcomponents.peak_selection.select_peaks doesn't catch when n_peaks is too large
Current Behavior
When `select_peaks(peaks, n_peaks=n_peaks, ...)` receives an argument `n_peaks >= peaks.shape[0]`, it silently returns `peaks` unchanged.
Expected Behavior
When `select_peaks(peaks, n_peaks=n_peaks, ...)` receives an argument `n_peaks >= peaks.shape[0]`, it should raise an error stating "Number of selected peaks greater than or equal to number of input peaks" (or something along those lines).
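For concreteness, here is a minimal sketch of the guard this report asks for, assuming `peaks` is the structured NumPy array returned by `detect_peaks` (the wrapper name and placeholder body are illustrative only, not the actual spikeinterface implementation):

```python
def select_peaks_guarded(peaks, n_peaks, **kwargs):
    """Hypothetical wrapper showing the expected guard clause."""
    if n_peaks >= peaks.shape[0]:
        raise ValueError(
            "Number of selected peaks greater than or equal to number of input peaks"
        )
    # ... the real selection logic would run here ...
    return peaks[:n_peaks]  # placeholder, not the real sampling strategy
```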
Example Code to Reproduce Error
```python
from pathlib import Path

import spikeinterface.core as sc
import spikeinterface.extractors as se
from spikeinterface.sortingcomponents.peak_detection import detect_peaks
from spikeinterface.sortingcomponents.peak_selection import select_peaks

# Load the data
basepath = Path("/Volumes/T7/CatalystNeuro")
filepath = basepath / "TutorialDatasets/sub-npI1_ses-20190413_behavior+ecephys.nwb"
recording = se.read_nwb(filepath)

# Cast to int: sampling_frequency is a float, frame indices must be integers
rec10s = recording.frame_slice(start_frame=0, end_frame=int(recording.sampling_frequency * 10))
print(rec10s)

# Detect peaks
noise_levels = sc.get_noise_levels(rec10s, return_scaled=False)
job_kwargs = dict(n_jobs=1, progress_bar=True)
peaks = detect_peaks(rec10s, method='by_channel', noise_levels=noise_levels, **job_kwargs)

# Ask for one more peak than was detected: select_peaks returns all peaks
# instead of raising an error
num_detected = peaks.shape[0]
print("# of peaks detected:", num_detected)
selected_peaks = select_peaks(peaks, n_peaks=num_detected + 1, method='uniform',
                              noise_levels=noise_levels, by_channel=False)
num_selected = selected_peaks.shape[0]
print("# of peaks selected:", num_selected)
```
Thanks @pauladkisson
Actually, I think the current behavior is fair: `select_peaks` is used to limit computation time, so in my mind `n_peaks` is a maximum number of peaks.
Otherwise, one might need to change the parameter (sometimes deeply nested in the sorter params) for recordings that are shorter than usual or have fewer peaks. I think it's good to avoid that.
Maybe changing the naming slightly or specifying the behavior in the docs would be better.
@alejoe91 I think that kind of use case is best handled with a try/except block around the `select_peaks` function if one anticipates the number of peaks in `peaks` to be less than `n_peaks`. The main use case of `select_peaks` is to actually select a subset of peaks, so if `n_peaks` is improperly specified, I think it's appropriate to raise an error (or at the very least a warning).
Yeah, I see your point. I would actually vote for a warning, since the goal of this function is to limit the number of peaks to a certain amount, and if you have fewer than that you're probably happy with it :)
What do @h-mayorquin, @samuelgarcia, and @yger think?
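A hedged sketch of this warning-based alternative (the function name and fallthrough behavior are illustrative, not actual spikeinterface code):

```python
import warnings

def select_peaks_capped(peaks, n_peaks, **kwargs):
    """Hypothetical: treat n_peaks as a cap and warn instead of raising."""
    if n_peaks >= peaks.shape[0]:
        warnings.warn(
            f"n_peaks={n_peaks} is >= the {peaks.shape[0]} detected peaks; "
            "returning all peaks"
        )
        return peaks
    # ... the real selection logic would run here ...
    return peaks[:n_peaks]  # placeholder, not the real sampling strategy
```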
> @alejoe91 I think that kind of use case is best handled with a try/except block around the `select_peaks` function if one anticipates the number of peaks in `peaks` to be less than `n_peaks`. The main use case of `select_peaks` is to actually select a subset of peaks, so if `n_peaks` is improperly specified, I think it's appropriate to raise an error (or at the very least a warning).
Am I understanding correctly that you want to use an assertion to inform users that they have unintentionally chosen a value of `n_peaks` larger than the number of peaks they have? What are some possible bad consequences of us not informing the user of this?
Fixing this would block a possible use of the function where I run `detect_peaks` but just want to assure myself that downstream computations do not run on more than `n_peaks=1000`. That is, I want to set an upper bound on how many peaks all the other methods have to handle, without knowing in advance how many the first method will produce (especially since `select_peaks` is sometimes non-deterministic). If `detect_peaks` produces fewer than 1000 peaks, then I am fine with the pipeline moving on.
So I think I agree with @alejoe91 that we should probably change the semantics. `select_peaks` should have an argument like `max_n_peaks`, which would be the upper bound of what the sampling returns.
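In code, the use case described above looks roughly like this, reusing `rec10s` and `noise_levels` from the reproduction script (the 1000-peak cap is just the example figure from the comment):

```python
# Cap downstream work at 1000 peaks without knowing in advance
# how many peaks detect_peaks will find
peaks = detect_peaks(rec10s, method='by_channel', noise_levels=noise_levels)
capped = select_peaks(peaks, n_peaks=1000, method='uniform',
                      noise_levels=noise_levels, by_channel=False)
assert capped.shape[0] <= 1000  # holds whether detection found 100 or 100,000 peaks
```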
Summarizing the solution we discussed yesterday:
- `select_peaks` should take two optional arguments to constrain the number of peaks returned/sampled (a sketch of this interface follows these lists):
  - `max_n_peaks`: ensures that `input_peaks` has no more than `max_n_peaks`; if it has more, subsample down to `max_n_peaks`
  - `min_n_peaks`: ensures that `input_peaks` has no fewer than `min_n_peaks`; if it has fewer, throw an error
- Potentially rename `select_peaks` to something more informative, based on its function of constraining the number of peaks to some specified range
Something I just thought of while writing this:
- Different peak selection methods may want to handle the `min_n_peaks` problem differently, for example:
  - Sampling `input_peaks` with replacement (bootstrapping)
  - Performing some data augmentation to generate more peaks (adding low levels of noise, amplitude jitter, etc.)
- So, we probably want to `assert input_peaks.size >= min_n_peaks` in each method separately
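A minimal sketch of the summarized interface, assuming `input_peaks` is a NumPy array; the `constrain_peaks` name, the `bootstrap` flag, and the bodies below are illustrative, not merged spikeinterface code. The bootstrap branch shows one of the alternative `min_n_peaks` strategies mentioned above:

```python
import numpy as np

def constrain_peaks(input_peaks, max_n_peaks=None, min_n_peaks=None,
                    bootstrap=False, seed=None):
    """Illustrative sketch of the proposed max/min constraint semantics."""
    rng = np.random.default_rng(seed)
    n = input_peaks.shape[0]
    if min_n_peaks is not None and n < min_n_peaks:
        if bootstrap:
            # Alternative strategy: sample with replacement up to min_n_peaks
            extra = rng.choice(n, size=min_n_peaks - n, replace=True)
            return np.concatenate([input_peaks, input_peaks[extra]])
        raise ValueError(f"Only {n} peaks available, but min_n_peaks={min_n_peaks}")
    if max_n_peaks is not None and n > max_n_peaks:
        # Uniform subsample down to the cap, preserving temporal order
        keep = rng.choice(n, size=max_n_peaks, replace=False)
        return input_peaks[np.sort(keep)]
    return input_peaks
```

Keeping the `min_n_peaks` check inside each method, as suggested above, would let the bootstrapping and augmentation strategies live next to the sampling code they modify.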
The opinion of this humble outsider, if I may offer it: if you're changing this argument structure, I would additionally vote for a more standardized naming scheme that also matches other usage across the repo.
| Current suggestion | Standardized suggestion |
| --- | --- |
| `max_n_peaks` | `max_num_peaks` |
| `min_n_peaks` | `min_num_peaks` |

which is then more similar to calls such as `get_num_channels` and `get_num_units`.
@h-mayorquin, this is also over a year old at this point. Is this still on the todo list or should we close for inactivity?
I don't know what the current status of `detect_peaks` is after all this year's refactoring. I might have to work on peak detection later this summer, so I can take a look and come back to this. If not, and no one wants to work on it, then we can close it.
Let's close. There's been a ton of activity on sortingcomponents, and this might not be relevant anymore.