NeuroKit icon indicating copy to clipboard operation
NeuroKit copied to clipboard

Adapt signal_fixpeaks to deal with larger gaps in data

Open danibene opened this issue 2 years ago • 4 comments

Describe the solution you'd like I noticed that the "neurokit" method of signal_fixpeaks does not work well with larger gaps in the data, sometimes leading to negative indices or getting stuck in a loop: https://github.com/danibene/fixpeaks_with_large_gaps/blob/main/example_signal_with_large_gaps.ipynb

How could we do it? I think there are a few possible ways to go about dealing with larger gaps (not mutually exclusive):

  1. Adapting the number of NaNs depending on the interval size rather than always inserting 2 NaNs for large intervals
  2. Interpolating the peaks directly rather than the intervals
  3. Providing a warning if there are negative indices & breaking the loop after a certain number of iterations

If you're interested in any of the above, I'm happy to try my hand & open a PR!

danibene avatar Jun 03 '22 09:06 danibene

Fixing peaks is a tricky issue, but there's definitely room for improvement! n°3 should be a starting point, then I don't know about what's best between 1 and 2... but please do open a PR so we can check.

Would be also nice to add an example / study explaining and benchmarking these methods

DominiqueMakowski avatar Jun 03 '22 09:06 DominiqueMakowski

Fixing peaks is a tricky issue

Indeed!

I've opened the PR for now with the 3 changes I mentioned above, it's a work in progress but I thought I should get your input before overcomplicating things: https://github.com/neuropsychology/NeuroKit/pull/647

The new version is able to fix ECG peaks for data where the old version got stuck in a loop.

I still see a couple of issues though:

  • there is no extrapolation in case there is a NaN at the final index (so when interpolating the peaks directly, the index stays constant rather than increasing) → modify the signal_interpolate function to extrapolate? → could remove NaNs at the end?
  • sometimes the indices are not negative, but still out of order → provide warning for peak indices that are not increasing?
  • if there are many missing peaks, it might be better to exclude them instead of linearly interpolating for certain analyses, e.g. short term HRV → have option to keep NaNs rather than interpolate above a certain threshold?

danibene avatar Jun 06 '22 07:06 danibene

Hi @DominiqueMakowski & @JanCBrammer , I've noticed a couple of problematic parts of the code from the previous PR

  1. If the intervals are standardized, the calculation of the number of NaNs to insert based on the mean interval does not make sense: https://github.com/neuropsychology/NeuroKit/blob/4965f06ac5b89f2d026c21eff46282a44aa48dd6/neurokit2/signal/signal_fixpeaks.py#L645-L652 https://github.com/neuropsychology/NeuroKit/blob/4965f06ac5b89f2d026c21eff46282a44aa48dd6/neurokit2/signal/signal_fixpeaks.py#L673-L674 E.g.:
import neurokit2 as nk
peaks = [250,  1250,  1350,  2250,  3250]
interval = nk.signal_period(peaks, sampling_rate=1000, desired_length=None)
interval
array([0.75, 1.  , 0.1 , 0.9 , 1.  ])

nk.standardize(interval)
array([ 0.        ,  0.66226618, -1.72189206,  0.39735971,  0.66226618])
  1. The calculation of the number of NaNs to insert is always with the floor rather than rounding: https://github.com/neuropsychology/NeuroKit/blob/4965f06ac5b89f2d026c21eff46282a44aa48dd6/neurokit2/signal/signal_fixpeaks.py#L680-L681

This is probably not ideal if interval[loc] / mean_interval is closer to the next integer:

int(1.99)
1

round(1.99)
2

Should I open a new PR? Sorry I didn't notice these earlier 🙈

danibene avatar Jul 19 '22 22:07 danibene

Should I open a new PR? Sorry I didn't notice these earlier 🙈

No worries, better late than never :) Yes please do open a PR

DominiqueMakowski avatar Jul 20 '22 00:07 DominiqueMakowski