NeuroKit icon indicating copy to clipboard operation
NeuroKit copied to clipboard

hrv function: ValueError: Expect x to not have duplicates

Open martinfrasch opened this issue 2 years ago • 22 comments

When calling nk.hrv on my RRI data, I get

ValueError: Expect x to not have duplicates

The RRI data appears to satisfy the criteria of the HRV function. It looks like so 443, 443, 443, 446, 438, 437,

The complete error log is as follows:


ValueError Traceback (most recent call last) in () ----> 1 hrv_welch=nk.hrv_frequency(data,sampling_rate=1000,show=True,psd_method="welch")

~/anaconda3/lib/python3.6/site-packages/neurokit2/hrv/hrv_frequency.py in hrv_frequency(peaks, sampling_rate, ulf, vlf, lf, hf, vhf, psd_method, show, silent, normalize, order_criteria, **kwargs) 125 126 # Compute R-R intervals (also referred to as NN) in milliseconds (interpolated at 1000 Hz by default) --> 127 rri, sampling_rate = _hrv_get_rri(peaks, sampling_rate=sampling_rate, interpolate=True, **kwargs) 128 129 frequency_band = [ulf, vlf, lf, hf, vhf]

~/anaconda3/lib/python3.6/site-packages/neurokit2/hrv/hrv_utils.py in _hrv_get_rri(peaks, sampling_rate, interpolate, **kwargs) 26 rri, 27 x_new=np.arange(desired_length), ---> 28 **kwargs 29 ) 30 return rri, sampling_rate

~/anaconda3/lib/python3.6/site-packages/neurokit2/signal/signal_interpolate.py in signal_interpolate(x_values, y_values, x_new, method) 75 else: 76 interpolation_function = scipy.interpolate.interp1d( ---> 77 x_values, y_values, kind=method, bounds_error=False, fill_value=([y_values[0]], [y_values[-1]]) 78 ) 79

~/anaconda3/lib/python3.6/site-packages/scipy/interpolate/interpolate.py in init(failed resolving arguments) 527 528 self._spline = make_interp_spline(xx, yy, k=order, --> 529 check_finite=False) 530 if rewrite_nan: 531 self._call = self.class._call_nan_spline

~/anaconda3/lib/python3.6/site-packages/scipy/interpolate/_bsplines.py in make_interp_spline(x, y, k, t, bc_type, axis, check_finite) 787 raise ValueError("Expect x to be a 1-D sorted array_like.") 788 if np.any(x[1:] == x[:-1]): --> 789 raise ValueError("Expect x to not have duplicates") 790 if k < 0: 791 raise ValueError("Expect non-negative k.")

ValueError: Expect x to not have duplicates

System Specifications

  • OS: Linux ( 64bit)

  • Python: 3.6.8

  • NeuroKit2: 0.1.3

  • NumPy: 1.18.4

  • Pandas: 1.0.3

  • SciPy: 1.5.4

  • sklearn: 0.22.2.post1

  • matplotlib: 3.1.1

martinfrasch avatar Aug 13 '21 06:08 martinfrasch

Hi 👋 Thanks for reaching out and opening your first issue here! We'll try to come back to you as soon as possible. ❤️ kenobi

welcome[bot] avatar Aug 13 '21 06:08 welcome[bot]

I narrowed it down to being due to hrv_frequency function. Still not clear why. The other HRV metrics all compute fine. Appreciate any sugggestions on how to make PSD estimates work as well. Thank you!

martinfrasch avatar Aug 13 '21 07:08 martinfrasch

Hi @martinfrasch the error occurs because all the hrv functions contain steps to convert the default input of peaks into RRI (you can see in your error traceback there is an internal function called _hrv_get_rri()). Even if no error occurs with the other domains like hrv_time, your results will still be erroneous because of this step. If you don't have your peaks data, there are two ways around this:

  1. clone this repo and modify the hrv functions yourself by commenting out the initial few steps that involving sanitizing the input, for example like so in the first few lines of hrv_time so that the function works directly on your rri data
    # Sanitize input
    # peaks = _hrv_sanitize_input(peaks)
    # if isinstance(peaks, tuple):  # Detect actual sampling rate
    #    peaks, sampling_rate = peaks[0], peaks[1]

    # Compute R-R intervals (also referred to as NN) in milliseconds
    # rri = _hrv_get_rri(peaks, sampling_rate=sampling_rate, interpolate=False)
    diff_rri = np.diff(rri)

    out = {}  # Initialize empty container for results


or 2) you can convert your RRI data into peaks, also talked about in this issue here!

zen-juen avatar Aug 16 '21 08:08 zen-juen

Thanks so much for the quick response!!

My data is RRI in milliseconds uniformly resampled to 4 Hz (from raw 1000 Hz RRIs).

While the joint call didn't work, I am able to compute all metrics separately using signal, time and nonlinear functions.

So I suppose it's all good save my worry about your point that even if the calculations work, they will be incorrect. Why is that?

On Mon, Aug 16, 2021, 1:21 AM Zen @.***> wrote:

Hi @martinfrasch https://github.com/martinfrasch the error occurs because all the hrv functions contain steps to convert the default input of peaks into RRI (you can see in your error traceback there is an internal function called _hrv_get_rri()). Even if no error occurs with the other domains like hrv_time, your results will still be erroneous because of this step. If you don't have your peaks data, there are two ways around this:

  1. clone this repo and modify the hrv functions yourself by commenting out the initial few steps that involving sanitizing the input, for example like so in the first few lines of hrv_time so that the function works directly on your rri data
# Sanitize input
# peaks = _hrv_sanitize_input(peaks)
# if isinstance(peaks, tuple):  # Detect actual sampling rate
#    peaks, sampling_rate = peaks[0], peaks[1]

# Compute R-R intervals (also referred to as NN) in milliseconds
# rri = _hrv_get_rri(peaks, sampling_rate=sampling_rate, interpolate=False)
diff_rri = np.diff(rri)

out = {}  # Initialize empty container for results

or 2) you can convert your RRI data into peaks, also talked about in this issue https://github.com/neuropsychology/NeuroKit/issues/433#issuecomment-811067293 here!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/neuropsychology/NeuroKit/issues/491#issuecomment-899319659, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALEAAVYJ3YQFYEUZJBJBSZLT5DC7XANCNFSM5CC62O4A . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .

martinfrasch avatar Aug 16 '21 18:08 martinfrasch

That's great! I'm assuming you commented out the first few lines of each function then?

Since each function was designed to take peaks samples, the step of converting them into RRI is done here and so if you had purely input RRI into the hrv functions, the functions will wrongly treat your RRI data as if they were the peak samples and pass it through the above step!

zen-juen avatar Aug 17 '21 01:08 zen-juen

Thanks a lot!

May I clarify: by peak samples you mean raw beat to beat intervals in milliseconds as they come from R peak detection (prior to any equidistant resampling)?

The "success" I referred to was from using equidistantly resampling RRIs without any # commenting on those lines of code.

The way I understood the API was that the functions wanted my resampled RRIs and the sampling rate.

Based on that I am grateful you highlighted this source of error in my approach before I committed to a larger batch run!

With best regards, Martin

On Mon, Aug 16, 2021, 6:19 PM Zen @.***> wrote:

That's great! I'm assuming you commented out the first few lines of each function then?

Since each function was designed to take peaks samples, the step of converting them into RRI is done here https://github.com/neuropsychology/NeuroKit/blob/a4eb198b75cd343a35a02af41f6bc32808442585/neurokit2/hrv/hrv_utils.py#L10 and so if you had purely input RRI into the hrv functions, the functions will wrongly treat your RRI data as if they were the peak samples and pass it through the above step!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/neuropsychology/NeuroKit/issues/491#issuecomment-899922039, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALEAAV23HSZ7OX4TJIBEBKLT5G2KHANCNFSM5CC62O4A . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .

martinfrasch avatar Aug 17 '21 02:08 martinfrasch

Oh no I was referring to the samples at which the R-peaks occur, like so 😄

data = nk.data("bio_resting_5min_100hz") # import data
peaks, info = nk.ecg_peaks(data["ECG"], sampling_rate=100) # find peaks

info
{'ECG_R_Peaks': array([   49,   107,   167,   227,   286,   347,   408,   471,   539,
          613,   685,   750,   814,   879,   947,  1012,  1075,  1140,
         1208,  1277,  1341,  1404,  1470,  1539,  1611,  1677,  1742,
         1808,  1876,  1945,  2010,  2073,  2137,  2204,  2274,  2342,
         2408,  2478,  2555,  2633,  2702,  2767,  2834,  2903,  2972,
         3038,  3102,  3169,  3243,  3322,  3399,  3467,  3537,  3609,
...}

The hrv functions can then take either the info or peaks input. Sorry for the confusion! Right now, neurokit doesn't support direct RRI input for ecg/hrv related functions but that being said, it might be a good idea to have some utility functions for quick rri-to-peak conversion @DominiqueMakowski 🤔 at the moment, it'll probably be most convenient to comment out those lines of code and run the specific hrv domain functions on your RRI data! feel free to give a shout if you encounter any other issues when you commit to a larger batch run 😄

zen-juen avatar Aug 17 '21 02:08 zen-juen

Thanks so much! That answers my question.

It would help me to know how exactly you proceed with equidistant sampling from the time peaks input. In Matlab, I control that important step explicitly, but the convenience of letting NK do it all is enticing too. Knowing the internal logic would be super helpful.

On Mon, Aug 16, 2021, 7:19 PM Zen @.***> wrote:

Oh no I was referring to the samples at which the R-peaks occur, like so 😄

data = nk.data("bio_resting_5min_100hz") # import data

peaks, info = nk.ecg_peaks(data["ECG"], sampling_rate=100) # find peaks

info

{'ECG_R_Peaks': array([ 49, 107, 167, 227, 286, 347, 408, 471, 539,

      613,   685,   750,   814,   879,   947,  1012,  1075,  1140,

     1208,  1277,  1341,  1404,  1470,  1539,  1611,  1677,  1742,

     1808,  1876,  1945,  2010,  2073,  2137,  2204,  2274,  2342,

     2408,  2478,  2555,  2633,  2702,  2767,  2834,  2903,  2972,

     3038,  3102,  3169,  3243,  3322,  3399,  3467,  3537,  3609,

...}

The hrv functions can then take either the info or peaks input. Sorry for the confusion!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/neuropsychology/NeuroKit/issues/491#issuecomment-899942691, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALEAAV4P5UPYFTY6PMSKXV3T5HBMZANCNFSM5CC62O4A . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .

martinfrasch avatar Aug 17 '21 04:08 martinfrasch

I think signal_resample() in neurokit could do the trick, perhaps you can check out our documentation!

https://github.com/neuropsychology/NeuroKit/blob/a4eb198b75cd343a35a02af41f6bc32808442585/neurokit2/signal/signal_resample.py#L8-L16

zen-juen avatar Aug 17 '21 04:08 zen-juen

Thanks so much! I was away for a bit, hence delayed responses here. Your help is hugely appreciated!!!

FYI, the link https://github.com/neuropsychology/NeuroKit/scripts/resampling.ipynb does not work.

May I propose that you add an error message if someone is feeding RRIs instead of peak intervals into the nk? I think this will help prevent erroneous computation of HRV metrics (because, as I shared above, I can actually compute them with the RRIs data using the Neurokit's individual HRV domain functions with no errors whatsoever).

martinfrasch avatar Sep 02 '21 20:09 martinfrasch

No worries @martinfrasch!

FYI, the link https://github.com/neuropsychology/NeuroKit/scripts/resampling.ipynb does not work.

What scripts are you looking for exactly?

May I propose that you add an error message if someone is feeding RRIs instead of peak intervals into the nk? I think this will help prevent erroneous computation of HRV metrics (because, as I shared above, I can actually compute them with the RRIs data using the Neurokit's individual HRV domain functions with no errors whatsoever).

Thanks for the suggestion! Imo there is no clear-cut/obvious way of distinguishing rri from peaks if they both have the same input structure, and our docstrings do also explicitly specify that functions require peak input. Another option perhaps may be to have an rri=False argument in the function, so that users have the option to pass rri data instead of peaks if have only the former - but I'm not convinced that this is the most elegant solution 😄

zen-juen avatar Sep 03 '21 03:09 zen-juen

Thank you!

Regarding the notebook: I am just hoping to see how exactly you go about converting peak data into RRIs (steps like resampling, interpolation, artifact correction (if any)). Related, some of our bivariate RRIs data have different lengths [quality of data varies] and we are looking at relationships between these time series, so we need to handle different lengths individually to avoid downstream errors. Basically, for bivariate analyses, I cut it down to the shortest of the two for each individual bivariate pair.

Regarding the suggestion to flag an error if RRIs are fed instead of peak data, my thought was that peaks time series are always increasing in value, but RRIs are not [they may accidentally do so from time to time but not consistently], so if the input does not detect such increase in each case, say between [n+1] - [n], an error would be raised.

Having the option to feed RRIs would be neat, from my perspective. Many of my datasets have that structure.

martinfrasch avatar Sep 03 '21 03:09 martinfrasch

clear-cut/obvious way of distinguishing rri from peaks

we could check whether the vector is ordered (300, 500, 700) of not (700, 300, 500), if not, then it's probably not peaks

DominiqueMakowski avatar Sep 03 '21 05:09 DominiqueMakowski

I just saw it's exactly what @martinfrasch said 😅 should have read all before writing

DominiqueMakowski avatar Sep 03 '21 05:09 DominiqueMakowski

Hi again! I was testing using peak data as input as you recommended above. My peak data looks something like this: array([ 509, 1123, 1738, ..., 0, 0, 0], dtype=int32) Note that the numpy array does contain some trailing zeros, an artifact of how the raw data was saved when extracted from ECG channels (we use one of the several channels and some produced now peak data at the end). Reading this as peaks = nk.ecg_peaks(peaks, sampling_rate=1000) leads to an empty array like so

(      ECG_R_Peaks
 0               0
 1               0
 2               0
 3               0
 4               0
 ...           ...
 4403            0
 4404            0
 4405            0
 4406            0
 4407            0
 
 [4408 rows x 1 columns],
 {'ECG_R_Peaks': array([], dtype=int64), 'sampling_rate': 1000})

Do you see what I may be overlooking that explains this error? Is it because the function expects a structured dictionary rather than a plain numpy array input?

I removed trailing zeros using numpy.trim_zeros and converted the array to dataframe feeding that into nk.hrv. This worked! The HRV metrics seem to make sense physiologically too.

  • Thank you!

martinfrasch avatar Sep 03 '21 23:09 martinfrasch

Hi @martinfrasch I may be misunderstanding your goal here but just to clarify, hrv functions take peaks data as the default input, whereas the purpose of ecg_peaks() is to locate the R-peaks in the ECG signal using the input of a cleaned ECG signal - so these two functions are qualitatively different 😄 Here is how an example pipeline looks like:

signal = nk.data("bio_resting_5min_100hz")["ECG"] # get ecg signal
cleaned = nk.ecg_clean(signal, sampling_rate=100)  # clean
peaks, info = nk.ecg_peaks(cleaned, sampling_rate=100, correct_artifacts=True) # generate peaks
hrv_indices = nk.hrv(peaks, sampling_rate=100, show=True) # compute hrv

So if you already have peaks data, you won't need to pass them through ecg_peaks(). Are you trying to fix the peaks data in this case? If you are, you can use signal_fixpeaks() instead. Do refer to our documentation too!

Regarding the suggestion to flag an error if RRIs are fed instead of peak data, my thought was that peaks time series are always increasing in value, but RRIs are not [they may accidentally do so from time to time but not consistently], so if the input does not detect such increase in each case, say between [n+1] - [n], an error would be raised.

Regarding this, would we want to a) throw a warning and then compute them anyway assuming they are rri data, OR b) throw an error? @DominiqueMakowski 🤔 I agree with this, though I was also thinking that false negatives may occur in the (potentially rare) situation where the RRI input fed is continuously increasing (if say, there are not a lot of data points).

zen-juen avatar Sep 04 '21 04:09 zen-juen

that false negatives may occur in the (potentially rare) situation where the RRI input fed is continuously increasing

Yeah that's a possibility, but then I'd say it's okay, we still catch most of the cases where RRIs are mistakenly passed as peaks. And if the RRis that are all consecutive, well then too bad, it's like what if people pass mistakenly provide an RSP signal instead of ECG, we cannot catch all the user-errors either ^^

OR b) throw an error?

I'd say we throw an error "The peaks indices passed to nk.hrv were detected as non-consecutive. You might have passed RR intervals instead of peaks, in which case you need to convert them using ..." (suggesting to use the rri -> peaks idx converted function)

DominiqueMakowski avatar Sep 04 '21 05:09 DominiqueMakowski

@zen-juen Thank you very much. This all helps clarify the workflow. I am still having some issues with getting the peak data input to work correctly.

  1. I played with fixpeaks(): it does not finish at 1000 Hz, so I had to halt it after ~60min [~40 min of peaks data sampled at 1000 Hz).
  2. For computing HRV metrics, do you internally downsample to say 4 Hz or should I actively re-sample peaks first prior to feeding the data to nk.hrv()?
  3. I understand that there is no internal resampling in hrv(), so I looked into signal_resample: from what I understood the expected input is not peaks but a time series such as RRI, so it would not work with the above HRV preprocessing workflow. Is this correct? Is there a way to fix that within the "peaks" workflow?

In summary, I can now go straight from raw peaks into HRV but this misses the preprocessing with artifact correction which is not a good pipeline for HRV analysis. It would be great to get the steps of "downsampling" and "fixpeaks" to work (ideally in this order for computational efficiency).

martinfrasch avatar Sep 14 '21 02:09 martinfrasch

Hi @martinfrasch, I'm a little confused here regarding your pipeline:

My data is RRI in milliseconds uniformly resampled to 4 Hz (from raw 1000 Hz RRIs).

May I ask how did you get your current peaks data from RRI again? I thought you had obtained the peaks from the 4Hz-resampled RRI (which would mean that the peaks are also already re-sampled) but do correct me if I'm mistaken

zen-juen avatar Sep 15 '21 03:09 zen-juen

In general, I would that if you have raw peak data it doesn't make much sense to try to correct them anyway with fancy methods. Most of the errors come from peak detection, but you don't have access to this step, so the only thing you can do is eventually look for outliers or implausible values of RRi, but then again from what you said it seems like you have a signal that is long enough to ensure some amount of robustness when it comes to HRV.

DominiqueMakowski avatar Sep 15 '21 04:09 DominiqueMakowski

May I ask how did you get your current peaks data from RRI again?

My "raw" data are peaks (this comes from raw ECG). With you advice, I now have the pipeline working well. I added customized segmentation into 5 minutes as my recordings are about 40 min each. You have an epoche feature, but I was not sure how to use it for n epochs within a recording.

Originally, as I was not fully aware of all NK functionality, I converted the peak data into RRIs and resampled them equidistantly at 4 Hz, all in Matlab. I love that I no longer need to rely on Matlab for this step!

I am using your peaks to rri function to get the RRIs now which allows me to use the other nk functions which take RRIs instead of peaks. I can't thank you both enough, @zen-juen and @DominiqueMakowski for your patient and quick answers to get me up and running!

I am adding optimal time delay to the HRV metrics output as well as some additional complexity metrics that NK computes but did not so far add to the _hrv output. Namely:

  • fuzzy entropy metrics: complexity_fuzzymse() and complexity_fuzzyrcmse().
  • cApEn
  • delay_optimal() and complexity_delay [the latter I cannot get to work using optimal time delay as input as per example in NK manual]

I also suggest that you consider the JIDT package and its powerful univariate and bivariate (or even conditional, trivariate!) information-theoretical metrics which all can be integrated with Python via JVM calls.

martinfrasch avatar Sep 15 '21 19:09 martinfrasch

This issue has been automatically marked as inactive because it has not had recent activity. It will eventually be closed if no further activity occurs.

stale[bot] avatar Mar 15 '22 01:03 stale[bot]

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot] avatar Sep 20 '22 19:09 stale[bot]

I think the bulk of this issue has been addressed but if not feel free to re-open!

DominiqueMakowski avatar Sep 21 '22 00:09 DominiqueMakowski

Hi! I followed everything but the issue is not being resolved! I am trying to get hrv_frequency data from heart rate, which is obtained by Biopac (ACQ file). Here is a little code snippet: `df = neurokit2.read_acqknowledge('acqfile_name.acq') df = list(df) zeroth = df[0] pd.set_option('display.max_columns', None)

heart_rate = zeroth.iloc[:,9] ibi = [] for hr in heart_rate: if hr!=0: rr = 6000/hr else: rr = 0 ibi.append(rr) diff_rri = np.diff(ibi) peaks = np.cumsum(np.append[0],diff_rri) new_peak = neurokit2.intervals_to_peaks(peaks)

hrv = neurokit2.hrv_frequency(new_peak,sampling_rate=1000) print(hrv)`

The error is ValueError: Expect x to not have duplicates Will anyone be able to help on this?

RiaBanerjee24 avatar May 16 '23 06:05 RiaBanerjee24