
Different results of different SampEn implementations

Open dmarcos97 opened this issue 2 years ago • 4 comments

My own implementation:

import math
import numpy as np
from scipy.spatial.distance import pdist

def sample_entropy(signal, m, r, dist_type='chebyshev', result=None, scale=None):

    # Check errors
    if m > len(signal):
        raise ValueError('Embedding dimension must be smaller than the signal length (m<N).')
    if len(signal) != signal.size:
        raise ValueError('The signal parameter must be a [Nx1] vector.')
    if not isinstance(dist_type, str):
        raise ValueError('Distance type must be a string.')
    if dist_type not in ['braycurtis', 'canberra', 'chebyshev', 'cityblock',
                         'correlation', 'cosine', 'dice', 'euclidean', 'hamming',
                         'jaccard', 'jensenshannon', 'kulsinski', 'mahalanobis',
                         'matching', 'minkowski', 'rogerstanimoto', 'russellrao',
                         'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule']:
        raise ValueError('Distance type unknown.')

    # Useful parameters
    N = len(signal)
    sigma = np.std(signal)
    templates_m = []
    templates_m_plus_one = []
    signal = np.squeeze(signal)

    # Build the m-dimensional template vectors
    for i in range(N - m + 1):
        templates_m.append(signal[i:i + m])

    # B: number of template pairs within tolerance r * sigma at dimension m
    B = np.sum(pdist(templates_m, metric=dist_type) <= sigma * r)
    if B == 0:
        value = math.inf
    else:
        # A: number of template pairs within tolerance at dimension m + 1
        m += 1
        for i in range(N - m + 1):
            templates_m_plus_one.append(signal[i:i + m])
        A = np.sum(pdist(templates_m_plus_one, metric=dist_type) <= sigma * r)

        if A == 0:
            value = math.inf
        else:
            A = A / len(templates_m_plus_one)
            B = B / len(templates_m)

            value = -np.log(A / B)

    # If A = 0 or B = 0, SampEn would return an infinite value.
    # However, the lowest non-zero conditional probability that SampEn
    # should report is A/B = 2/[(N-m-1)*(N-m)].
    if math.isinf(value):
        # Note: SampEn has the following limits:
        #   - Lower bound: 0
        #   - Upper bound: log(N-m) + log(N-m-1) - log(2)
        value = -np.log(2 / ((N - m - 1) * (N - m)))

    if result is not None:
        result[scale - 1] = value

    return value

signal = np.random.rand(200)  # rand(200,1) in Matlab

Parameters: m = 1; r = 0.2
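
For reference, the call producing the outputs below would then be:

value = sample_entropy(signal, m=1, r=0.2)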


Outputs:

My implementation: 2.1812
Implementation adapted: 2.1969
NeuroKit2 entropy_sample function: 2.5316
Your implementation: 2.2431
Different implementation from GitHub: 1.0488

dmarcos97 · Oct 04 '21 07:10

Hi @dmarcos97,

Thanks for this. It is quite worrying that all of the implementations return different values. In AntroPy, the sample entropy is calculated using a custom Numba-accelerated function when x.size < 3000:

https://github.com/raphaelvallat/antropy/blob/275216aaeb226b7707bb235c3671ad335628e64a/antropy/entropy.py#L653-L658

Therefore, could you please also post the results for a longer time series, e.g. np.random.rand(5000)? Also, which random seed did you use, for reproducibility?
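
For example, something along these lines would make the comparison fully reproducible (a sketch assuming AntroPy's sample_entropy signature, where order is the embedding dimension m):

import numpy as np
import antropy as ant

# Fix the seed so that every implementation sees the exact same signal
np.random.seed(42)  # any fixed seed works, as long as it is reported
x = np.random.rand(5000)  # x.size >= 3000, so the Numba code path is skipped

print(ant.sample_entropy(x, order=1, metric="chebyshev"))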

raphaelvallat · Oct 04 '21 20:10

Chiming in here as I'm inexorably drawn to all attempts to dive into and compare the solutions out there :)

@dmarcos97 the NK implementation is pretty modular (see main function), so do not hesitate to debug it to see at which level the discrepancy arises.

And when comparing, make sure that the r parameter is the same, as their defaults might differ.
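
If it helps, a small helper like the sketch below (mirroring the pdist-based code above, not NK's actual internals; match_counts is just an illustrative name) lets you compare the raw match counts A and B across implementations before any normalisation or log is applied:

import numpy as np
from scipy.spatial.distance import pdist

def match_counts(signal, m, r):
    # Raw template-pair counts at dimension m (B) and m + 1 (A),
    # Chebyshev distance, tolerance r * std(signal)
    sigma = np.std(signal)
    counts = []
    for mm in (m, m + 1):
        templates = [signal[i:i + mm] for i in range(len(signal) - mm + 1)]
        counts.append(np.sum(pdist(templates, metric='chebyshev') <= r * sigma))
    B, A = counts
    return A, B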

DominiqueMakowski · Oct 04 '21 23:10

Hi @raphaelvallat and @DominiqueMakowski ,

Firstly, I have to apologise: last time I didn't use the same seed to generate the random signals. Now, with a 5000-sample random signal (np.random.seed(26)) and parameters m = 2 and r = 0.2, the results obtained are nearly identical. Also, I noticed that some of the other tested implementations that returned significantly different results (compared to ours) take the tolerance parameter directly as r * std(signal) instead of computing it from the r parameter, so that's the bug, I guess.
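
In other words, the two conventions look like this (a minimal sketch; the variable names are just for illustration):

import numpy as np

x = np.random.rand(5000)
r = 0.2

# Convention A (used in my code above): r is relative, and the
# absolute tolerance is derived from the signal
tol_convention_a = r * np.std(x)

# Convention B (some other implementations): the parameter passed in
# is already the absolute tolerance, so the caller must pre-scale it
tol_convention_b = r * np.std(x)  # what you should pass to such functions

# Passing the raw r = 0.2 to a convention-B implementation silently
# changes the effective tolerance and hence the resulting SampEn value.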

Thanks to both of you, I hope I didn't waste your time. :)

dmarcos97 · Oct 05 '21 17:10

You did not waste our time! For future reference, would you be able to add the outputs of all these functions for 1000- and 5000-sample time series here? Thanks!
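
A quick loop like this would do it (a sketch assuming the custom sample_entropy defined at the top of this issue is in scope, alongside AntroPy):

import numpy as np
import antropy as ant

for n in (1000, 5000):
    np.random.seed(26)  # reseed so each length is reproducible on its own
    x = np.random.rand(n)
    print(n, ant.sample_entropy(x, order=2), sample_entropy(x, m=2, r=0.2))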

raphaelvallat · Oct 06 '21 19:10