NeuroKit
NeuroKit copied to clipboard
Adapt signal_fixpeaks to deal with larger gaps in data
Describe the solution you'd like I noticed that the "neurokit" method of signal_fixpeaks does not work well with larger gaps in the data, sometimes leading to negative indices or getting stuck in a loop: https://github.com/danibene/fixpeaks_with_large_gaps/blob/main/example_signal_with_large_gaps.ipynb
How could we do it? I think there are a few possible ways to go about dealing with larger gaps (not mutually exclusive):
- Adapting the number of NaNs depending on the interval size rather than always inserting 2 NaNs for large intervals
- Interpolating the peaks directly rather than the intervals
- Providing a warning if there are negative indices & breaking the loop after a certain number of iterations
If you're interested in any of the above, I'm happy to try my hand & open a PR!
Fixing peaks is a tricky issue, but there's definitely room for improvement! n°3 should be a starting point, then I don't know about what's best between 1 and 2... but please do open a PR so we can check.
Would be also nice to add an example / study explaining and benchmarking these methods
Fixing peaks is a tricky issue
Indeed!
I've opened the PR for now with the 3 changes I mentioned above, it's a work in progress but I thought I should get your input before overcomplicating things: https://github.com/neuropsychology/NeuroKit/pull/647
The new version is able to fix ECG peaks for data where the old version got stuck in a loop.
I still see a couple of issues though:
- there is no extrapolation in case there is a NaN at the final index (so when interpolating the peaks directly, the index stays constant rather than increasing) → modify the signal_interpolate function to extrapolate? → could remove NaNs at the end?
- sometimes the indices are not negative, but still out of order → provide warning for peak indices that are not increasing?
- if there are many missing peaks, it might be better to exclude them instead of linearly interpolating for certain analyses, e.g. short term HRV → have option to keep NaNs rather than interpolate above a certain threshold?
Hi @DominiqueMakowski & @JanCBrammer , I've noticed a couple of problematic parts of the code from the previous PR
- If the intervals are standardized, the calculation of the number of NaNs to insert based on the mean interval does not make sense: https://github.com/neuropsychology/NeuroKit/blob/4965f06ac5b89f2d026c21eff46282a44aa48dd6/neurokit2/signal/signal_fixpeaks.py#L645-L652 https://github.com/neuropsychology/NeuroKit/blob/4965f06ac5b89f2d026c21eff46282a44aa48dd6/neurokit2/signal/signal_fixpeaks.py#L673-L674 E.g.:
import neurokit2 as nk
peaks = [250, 1250, 1350, 2250, 3250]
interval = nk.signal_period(peaks, sampling_rate=1000, desired_length=None)
interval
array([0.75, 1. , 0.1 , 0.9 , 1. ])
nk.standardize(interval)
array([ 0. , 0.66226618, -1.72189206, 0.39735971, 0.66226618])
- The calculation of the number of NaNs to insert is always with the floor rather than rounding: https://github.com/neuropsychology/NeuroKit/blob/4965f06ac5b89f2d026c21eff46282a44aa48dd6/neurokit2/signal/signal_fixpeaks.py#L680-L681
This is probably not ideal if interval[loc] / mean_interval
is closer to the next integer:
int(1.99)
1
round(1.99)
2
Should I open a new PR? Sorry I didn't notice these earlier 🙈
Should I open a new PR? Sorry I didn't notice these earlier 🙈
No worries, better late than never :) Yes please do open a PR