NeuroKit icon indicating copy to clipboard operation
NeuroKit copied to clipboard

Interpolation in hrv_frequency()

Open danibene opened this issue 1 year ago • 14 comments

Question and context

Hello!

I would like to try interpolating R-R intervals at different sampling frequencies to see how that affects the frequency-domain HRV features, but I am confused about how the R-R intervals are interpolated in hrv_frequency(): https://github.com/neuropsychology/NeuroKit/blob/277f614ee3984647d71ba31aadffea97766aeb0f/neurokit2/hrv/hrv_frequency.py#L162-L165

Looking at _hrv_get_rri(), I am unsure whether sampling_rate is the appropriate variable to change here:

https://github.com/neuropsychology/NeuroKit/blob/277f614ee3984647d71ba31aadffea97766aeb0f/neurokit2/hrv/hrv_utils.py#L8-L29

My questions:

  1. Where is the sampling frequency of the R-R interval interpolation set? It seems to me like desired_length will be the same regardless of the value of sampling_rate.
  2. Why is the minimum sampling frequency set to 10 Hz? I was wondering because I've seen lower values in the literature for interpolation of R-R intervals (e.g. 4 Hz) - if I understand correctly that sampling_rate is referring to the frequency at which the R-R intervals should be interpolated.

Thank you :)

danibene avatar Jul 07 '22 01:07 danibene

Hey Dani,

  1. I am not entirely sure (tagging @Tam-Pham). The thing is that we need to know the sampling rate regardless of the interpolation purposes to make sense of the values. For instance, RRis that [2000, 1500, 1750] and [100, 75, 87.5] might be the same (the first one is differences at 1000Hz, the second one at 50Hz), so we need to put them on the same scale (the y-axis). Then, the rate of interpolation is something different. I'm not sure it helps but that's what it makes me think
  2. I'm not sure either. The minimum sampling rate might have to do with the power spectrum that is computed, since too coarse signals might lead to errors there. I saw 4 Hz for sampled heart rates (some "smart"watch & wearables only provide that), but that's definitely too low and too coarse for PSD estimation, better to resample to higher freqs.

Let us know your thoughts

DominiqueMakowski avatar Jul 07 '22 02:07 DominiqueMakowski

In my opinion, it would make more sense to set a default sampling frequency for the interbeat interval interpolation independent of the sampling frequency of the raw heartbeat signal, for more reproducibility independent of the recording equipment.

So I'd probably change the function to add a separate argument for the rate used for interpolation and only use the original sampling frequency for computing the peak times from the sample indices?

Also, I checked what Kubios does & they use 4 Hz (in addition to doing other things that Neurokit doesn't as far as I know, like detrending, so results wouldn't be directly comparable anyway): see page 31 of their user guide

danibene avatar Jul 07 '22 02:07 danibene

We could add detrending too though, unless there are reasons to not have it (or to have it as an argument)

So I'd probably change the function to add a separate argument for the rate used for interpolation and only use the original sampling frequency for computing the peak times from the sample indices?

@JanCBrammer @Tam-Pham what do you think?

DominiqueMakowski avatar Jul 07 '22 03:07 DominiqueMakowski

We could add detrending too though

That's true!

Also, in case helpful here is a notebook showing the power for different interpolation rates: https://github.com/danibene/hrv_freq_interpl/blob/main/neurokit_hrv_freq_interpl.ipynb

danibene avatar Jul 07 '22 03:07 danibene

image

Interpolated looks better though, even though it can lead to small (probable) artefacts (first arrow), overall, it distorts less the smooth dynamics of hear rate, you can see that the non-interpolated signal (second arrow) creates distorted peaks due to the low SR. The PSD looks also better at 1000Hz as it doesn't have this super strong low-frequency component (that is magnified by the low SR)

DominiqueMakowski avatar Jul 07 '22 03:07 DominiqueMakowski

Honestly, I'm not sure how to evaluate what is better here - I've seen a lot of variability in the literature (including not interpolating/resampling at all and using the original timestamps in the lomb-scargle periodogram). But I think even if we don't know the ideal method, it would still make sense to set a default that is independent of the raw data sampling rate. Interested in others' thoughts though!

danibene avatar Jul 07 '22 13:07 danibene

Hi all, apologize for my very late reply!

Before we discuss, to avoid the confusion, I strongly recommend changing the sampling_rate here to interpolation_rate; so that we know the interpolrate_rate is the rate at which rri is going to be interpolated and sampling_rate is the original rate at which the signal was sampled. https://github.com/neuropsychology/NeuroKit/blob/277f614ee3984647d71ba31aadffea97766aeb0f/neurokit2/hrv/hrv_utils.py#L18

Thank you @danibene for looking through the code. I don't think I was the original writer for this function. But looking at the code now, I do agree that the desired_length is pretty much unaffected by the sampling_rate (intended to be used as interpolation rate). In fact, from the way the code is written, by using the peak samples, the interpolation rate is default to the original sampling rate of the signal.

In order to effectively interpolate the rri using a specific interpolation rate, I do agree with this suggestion:

change the function to add a separate argument for the rate used for interpolation

x_new for the interpolation can probably be

x_new = np.arrange(1, desired_length, 1/interpolation_rate)

Tam-Pham avatar Jul 20 '22 14:07 Tam-Pham

Hi @Tam-Pham , thanks for the feedback! I opened a PR with your suggested changes: https://github.com/neuropsychology/NeuroKit/pull/680

I defined x_new a bit differently than you suggested:

x_new = np.arrange(1, desired_length, 1/interpolation_rate) vs. x_new = np.arange(start=peaks[1], stop=peaks[-1] + 1 / interpolation_rate, step=1 / interpolation_rate)

The reason was so that the first and last indices of x_new would correspond to the first and last indices of the original x-values of the interbeat intervals, e.g.:


sampling_rate=200
interpolation_rate=10
peaks = np.arange(stop=1000, step=sampling_rate)
peaks[1:] # Skip first peak since it has no corresponding element in heart_period
array([200, 400, 600, 800])

desired_length = int(np.rint(peaks[-1]))
x_new = np.arange(1, desired_length, 1 / interpolation_rate)
x_new
array([  1. ,   1.1,   1.2, ..., 799.7, 799.8, 799.9])

x_new = np.arange(start=peaks[1], stop=peaks[-1] + 1 / interpolation_rate, step=1 / interpolation_rate)
x_new
array([200. , 200.1, 200.2, ..., 799.8, 799.9, 800. ])

Let me know if you have any questions about this!

Now my questions for you & @DominiqueMakowski :

  1. Should we keep the default interpolation_rate at 1000 Hz despite it being high compared to most of the literature?
  2. Should we enforce a minimum interpolation_rate, and if so, should it be as high as 10 Hz? (Open to any of your suggestions, but I would argue if we enforce it at all, the minimum should be 1 Hz because of the Nyquist rate...for high-frequency HRV between 0.15 and 0.4 Hz we don't need more than 1 Hz)
  3. Should we also add an option to not interpolate for the (the lomb-scargle periodogram)? (And if so, should this be a separate PR?)

danibene avatar Jul 20 '22 18:07 danibene

Hi @danibene

I defined x_new a bit differently than you suggested:

oh no it's entirely my bad (please excuse my midnight brain). Given that we are interpolating the rri, first value of the x_new should definitely be the 2nd peak and the last be the last peak.

We could add detrending too though

Currently, there is a simple constant detrending (by mean) in signal_psd https://github.com/neuropsychology/NeuroKit/blob/277f614ee3984647d71ba31aadffea97766aeb0f/neurokit2/signal/signal_psd.py#L128

I think Kubios implemented a more complicated detrend method than that.

Should we keep the default interpolation_rate at 1000 Hz despite it being high compared to most of the literature?

I understand that 4 Hz is the "gold standard" that is most common in the literature. TBH, given the range of frequencies that we are looking at for HRV indices, I would say 4 Hz is a reasonable choice - for the sake of standardization with other packages too.

Should we enforce a minimum interpolation_rate, and if so, should it be as high as 10 Hz? (Open to any of your suggestions, but I would argue if we enforce it at all, the minimum should be 1 Hz because of the Nyquist rate...for high-frequency HRV between 0.15 and 0.4 Hz we don't need more than 1 Hz)

I can't remember/ don't know why the minimum rate was set at 10Hz. I vaguely remember it was for the sake of Nyquist so I would say we should definitely enforce it. Again, given the frequencies that we are interested in, I agree that 1 Hz is sufficient.

Should we also add an option to not interpolate for the the lomb-scargle periodogram? (And if so, should this be a separate PR?)

Off the top of my head, I can't think of an elegant way to do this, given the way the package is structured right now. So definitely a separate PR. Thank you in advance for contributing to the package, @danibene. It needs the constant love and improvement.

P/S I saw the error in the tests at #680. It looks like a time out error or something that causes the test to terminate. It's a strange error. I will run the code locally to see what's going on. Do give me a few days.

Tam-Pham avatar Jul 21 '22 14:07 Tam-Pham

@Tam-Pham

(please excuse my midnight brain)

Relatable :D

Re detrending:

I think Kubios implemented a more complicated detrend method than that.

The Kubios user guide (pg. 12) references the same paper as signal_detrend(): https://github.com/neuropsychology/NeuroKit/blob/277f614ee3984647d71ba31aadffea97766aeb0f/neurokit2/signal/signal_detrend.py#L111-L113

In case we'd eventually like to have a processing pipeline aiming to replicate all the Kubios defaults, I think detrending is not just limited to frequency-domain analysis:

  • From the user guide (pg. 17): "All HRV parameters are calculated from the detrended RR interval data (if detrending is applied), but mean RR, mean HR as well as min and max HR values are exceptions (marked with the ∗ symbol)."
  • From my experience (a bit of a tangent, sorry): A while ago I was trying to figure out why RMSSD was not perfectly correlated with SD1 when I had really short segments, and I got different results with the outputs from Neurokit vs. Kubios. If I remember correctly, disabling detrending in the Kubios settings gave me more similar results to Neurokit.

Re interpolation:

I understand that 4 Hz is the "gold standard" that is most common in the literature. TBH, given the range of frequencies that we are looking at for HRV indices, I would say 4 Hz is a reasonable choice - for the sake of standardization with other packages too.

Will change that then, thanks for the feedback.

I can't remember/ don't know why the minimum rate was set at 10Hz. I vaguely remember it was for the sake of Nyquist so I would say we should definitely enforce it. Again, given the frequencies that we are interested in, I agree that 1 Hz is sufficient.

If you don't mind, I think I would prefer to have a warning rather than for the rate to be silently changed. I'll just add a warning for < 1 Hz for now, but feel free to comment in the PR if you'd prefer otherwise.

P/S I saw the error in the tests at https://github.com/neuropsychology/NeuroKit/pull/680. It looks like a time out error or something that causes the test to terminate. It's a strange error. I will run the code locally to see what's going on. Do give me a few days.

No rush from my side, thanks so much :)

danibene avatar Jul 22 '22 02:07 danibene

PS:

Should we also add an option to not interpolate for the the lomb-scargle periodogram? (And if so, should this be a separate PR?)

Off the top of my head, I can't think of an elegant way to do this, given the way the package is structured right now. So definitely a separate PR.

I will hold off on a new PR for now since the timepoints corresponding to the non-interpolated interbeat intervals would need to be passed to or computed within _signal_psd_lomb: https://github.com/neuropsychology/NeuroKit/blob/a349eb40fdb666b974468e5d31890284fd9bc759/neurokit2/signal/signal_psd.py#L305

And I am unsure if that should be done in a minimal way specific to _signal_psd_lomb or as a larger change to the structure; opened a separate issue for including the timestamps of the interbeat intervals as an input: https://github.com/neuropsychology/NeuroKit/issues/684

danibene avatar Jul 22 '22 03:07 danibene

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot] avatar Sep 20 '22 19:09 stale[bot]

Same with this?

DominiqueMakowski avatar Sep 20 '22 23:09 DominiqueMakowski

@DominiqueMakowski I still would like to make it possible to not interpolate the RRI when using the lomb-scargle periodogram method, but will address after https://github.com/neuropsychology/NeuroKit/issues/684

danibene avatar Sep 21 '22 01:09 danibene