antropy
Different results of different SampEn implementations
My own implementation:
```python
import math

import numpy as np
from scipy.spatial.distance import pdist


def sample_entropy(signal, m, r, dist_type='chebyshev', result=None, scale=None):
    # Check errors
    if m > len(signal):
        raise ValueError('Embedding dimension must be smaller than the signal length (m<N).')
    if len(signal) != signal.size:
        raise ValueError('The signal parameter must be a [Nx1] vector.')
    if not isinstance(dist_type, str):
        raise ValueError('Distance type must be a string.')
    if dist_type not in ['braycurtis', 'canberra', 'chebyshev', 'cityblock',
                         'correlation', 'cosine', 'dice', 'euclidean', 'hamming',
                         'jaccard', 'jensenshannon', 'kulsinski', 'mahalanobis',
                         'matching', 'minkowski', 'rogerstanimoto', 'russellrao',
                         'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule']:
        raise ValueError('Distance type unknown.')

    # Useful parameters
    N = len(signal)
    sigma = np.std(signal)
    signal = np.squeeze(signal)

    # Count template pairs of length m within tolerance...
    templates_m = [signal[i:i + m] for i in range(N - m + 1)]
    B = np.sum(pdist(templates_m, metric=dist_type) <= sigma * r)
    if B == 0:
        value = math.inf
    else:
        # ...and of length m + 1 (without mutating m, so the correction
        # below always uses the original embedding dimension)
        templates_m_plus_one = [signal[i:i + m + 1] for i in range(N - m)]
        A = np.sum(pdist(templates_m_plus_one, metric=dist_type) <= sigma * r)
        if A == 0:
            value = math.inf
        else:
            A = A / len(templates_m_plus_one)
            B = B / len(templates_m)
            value = -np.log(A / B)

    # If A = 0 or B = 0, SampEn would return an infinite value. However,
    # the lowest non-zero conditional probability that SampEn should
    # report is A/B = 2 / [(N-m-1)*(N-m)].
    # Note: SampEn therefore has the following limits:
    #   - Lower bound: 0
    #   - Upper bound: log(N-m) + log(N-m-1) - log(2)
    if math.isinf(value):
        value = -np.log(2 / ((N - m - 1) * (N - m)))

    if result is not None:
        result[scale - 1] = value
    return value
```
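As a quick sanity check on the vectorized version, it can be compared against a naive double-loop implementation written straight from the definition. The sketch below (helper names are mine) uses the Richman & Moorman convention of N - m templates for both lengths; note that implementations also differ in whether they build N - m or N - m + 1 templates of length m, which can itself cause small discrepancies:

```python
import numpy as np
from scipy.spatial.distance import pdist


def sampen_pdist(x, m, r):
    # Vectorized: count template pairs within tolerance via pdist.
    n = len(x)
    tol = r * np.std(x)
    templates_m = [x[i:i + m] for i in range(n - m)]
    templates_m1 = [x[i:i + m + 1] for i in range(n - m)]
    B = np.sum(pdist(templates_m, metric='chebyshev') <= tol)
    A = np.sum(pdist(templates_m1, metric='chebyshev') <= tol)
    return -np.log(A / B)


def sampen_loops(x, m, r):
    # Naive O(N^2) double loop over template pairs.
    n = len(x)
    tol = r * np.std(x)
    A = B = 0
    for i in range(n - m):
        for j in range(i + 1, n - m):
            if max(abs(x[i + k] - x[j + k]) for k in range(m)) <= tol:
                B += 1
                # The length-m part already matched, so the extended
                # match only depends on the last element.
                if abs(x[i + m] - x[j + m]) <= tol:
                    A += 1
    return -np.log(A / B)


x = np.random.RandomState(0).rand(200)
# Both variants count the same pairs and should agree to numerical precision.
```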
```python
signal = np.random.rand(200)  # rand(200,1) in Matlab
```
parameters: m = 1; r = 0.2
Outputs:
- My implementation: 2.1812
- Implementation adapted: 2.1969
- NeuroKit2 `entropy_sample` function: 2.5316
- Your implementation: 2.2431
- Different implementation from GitHub: 1.0488
Hi @dmarcos97,
Thanks for this. It is quite worrying that all of the implementations return different values. In AntroPy, the sample entropy is calculated using a custom Numba-accelerated function when `x.size < 3000`:
https://github.com/raphaelvallat/antropy/blob/275216aaeb226b7707bb235c3671ad335628e64a/antropy/entropy.py#L653-L658
Therefore, could you please also post the results with a longer time series, e.g. `np.random.rand(5000)`? Also, what random seed did you use, for reproducibility?
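For a like-for-like comparison, the test signal can be generated once with a fixed seed and then fed to every implementation. A minimal sketch (the seed value is just an example):

```python
import numpy as np

rng = np.random.RandomState(26)  # fixed seed: every implementation sees identical data
signal = rng.rand(5000)
# pass the same `signal`, m, and r to each implementation under test
```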
Chiming in here as I'm inexorably drawn to all attempts to dive into and compare the solutions out there :)
@dmarcos97 the NK implementation is pretty modular (see main function), so do not hesitate to debug it to see at which level the discrepancy arises.
And when comparing, make sure that the r parameter is the same, as the defaults might differ.
Hi @raphaelvallat and @DominiqueMakowski ,
Firstly, I have to apologise: last time I didn't use the same seed to generate the random signals. Now, with a 5000-sample random signal (np.random.seed(26)) and parameters m = 2 and r = 0.2, the results obtained are almost identical. I also noticed that some of the other tested implementations that returned a significantly different result (compared to ours) take the tolerance parameter as r * std(signal) instead of computing it from the r parameter, so that's the bug, I guess.
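That tolerance-convention mismatch is easy to reproduce. A minimal sketch (compact Chebyshev-only SampEn with the absolute tolerance passed in explicitly; function name is mine) showing how passing r directly versus r * std(signal) changes the result:

```python
import numpy as np
from scipy.spatial.distance import pdist


def sampen(x, m, tol):
    # Compact Chebyshev SampEn; `tol` is the absolute tolerance.
    n = len(x)
    templates_m = [x[i:i + m] for i in range(n - m)]
    templates_m1 = [x[i:i + m + 1] for i in range(n - m)]
    B = np.sum(pdist(templates_m, metric='chebyshev') <= tol)
    A = np.sum(pdist(templates_m1, metric='chebyshev') <= tol)
    return -np.log(A / B)


x = np.random.RandomState(26).rand(1000)
r = 0.2
scaled = sampen(x, m=2, tol=r * np.std(x))  # tolerance relative to the signal SD
unscaled = sampen(x, m=2, tol=r)            # r used directly as the tolerance
# The two conventions give clearly different values for the same signal.
```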
Thanks to both of you; I hope I didn't waste your time. :)
You did not waste our time! For future reference, would you be able to add the output of all these functions for 1000- and 5000-sample time series here? Thanks!