entropy
Removing a vector for approximate entropy
Hi @raphaelvallat I have been finding your packages and your guides extremely helpful!
I'm currently working on NeuroKit with @DominiqueMakowski and we are looking at implementing functions for different entropy measures. I have a small question regarding your implementation of ApEn below:
https://github.com/raphaelvallat/entropy/blob/77ba71e5039ae2cc4e7876cd1803db20251b36ad/entropy/entropy.py#L254
```python
if approximate:
    emb_data1 = _emb_data1
else:
    emb_data1 = _emb_data1[:-1]
```
It seems that the last vector in the embedded time series is removed here when approximate is False (sample entropy). However, I couldn't find the rationale for this particular removal. I would really appreciate it if you could point me in the right direction.
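To make the shapes concrete, here is a quick sketch of what that branch does to the embedded matrix (the `embed` helper is just my own time-delay embedding for illustration, not your code; N = 10, m = 2, tau = 1 are arbitrary):

```python
import numpy as np

def embed(x, m, tau=1):
    """Time-delay embedding: row i is [x[i], x[i+tau], ..., x[i+(m-1)*tau]]."""
    n_vectors = len(x) - (m - 1) * tau
    return np.array([x[i:i + (m - 1) * tau + 1:tau] for i in range(n_vectors)])

x = np.arange(10)           # N = 10 samples
emb = embed(x, m=2, tau=1)
print(emb.shape)            # (9, 2): N - (m - 1) * tau vectors in total
print(emb[:-1].shape)       # (8, 2): last vector dropped, as in the SampEn branch
```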
Many thanks! Tam
Hi @Tam-Pham!
Thanks for the feedback and for opening this issue!
The implementations of the approximate and sample entropy are simply 1D adaptations of code from the MNE-features package, and since I did not write the original code I'm not sure I understand why the last embedded vector is removed here (also, I worked on these functions almost two years ago and I have a very bad memory... 😬).
I have compared the output of several implementations of the sample entropy in the Jupyter notebook attached. As you can see, the two methods implemented in entropy give results similar to the nolds package, but a different output from the example code on Wikipedia as well as from a Matlab implementation. Even though the differences between implementations are quite small, it is still troublesome and it would be great to understand what causes them. I'll definitely look into that in the coming weeks, but please let me know if you make any progress on your side.
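For reference, here is a minimal brute-force sample entropy in the Richman & Moorman style (the defaults m = 2, r = 0.2·SD, and the template-count convention are my assumptions — the convention is exactly the kind of detail on which implementations diverge):

```python
import numpy as np

def sampen(x, m=2, r_frac=0.2):
    """Brute-force sample entropy: -log(A / B) with Chebyshev distance,
    self-matches excluded."""
    x = np.asarray(x, dtype=float)
    r = r_frac * np.std(x, ddof=0)   # tolerance from the population SD (an assumption)
    N = len(x)

    def count(mm):
        # N - m template vectors of length mm; using the same number of
        # templates for mm = m and mm = m + 1 is one of the conventions
        # on which implementations differ
        templates = np.array([x[i:i + mm] for i in range(N - m)])
        total = 0
        for i in range(len(templates) - 1):
            # Chebyshev (max-norm) distance to all later templates
            d = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            total += np.sum(d <= r)
        return total

    return -np.log(count(m + 1) / count(m))

rng = np.random.default_rng(0)
print(sampen(rng.standard_normal(200)))
```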
One other minor issue that can lead to very small differences is whether we use the population or the sample standard deviation to define the tolerance, i.e. `r = 0.2 * np.std(x, ddof=0)` or `r = 0.2 * np.std(x, ddof=1)`, respectively. I have found both implementations, but I'm not sure which one is more valid.
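To quantify how much the two choices differ (N = 100 here is arbitrary), the two tolerances always differ by a factor of sqrt(N / (N − 1)):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.standard_normal(100)

r_pop = 0.2 * np.std(x, ddof=0)    # population SD (divides by N)
r_samp = 0.2 * np.std(x, ddof=1)   # sample SD (divides by N - 1)

# The ratio is sqrt(N / (N - 1)), so the gap shrinks as N grows
print(r_pop, r_samp, r_samp / r_pop)
```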
Take care, Raphael
Thanks @raphaelvallat for the very detailed answer and the comparison script.
Recently, I have been looking at this paper: Shi (2017), and it seems to suggest that:

> for a signal with N samples, embedding dimension m and time delay τ, the number of vectors formed is limited to N − mτ vectors.

And since the full embedding matrix that we obtain has shape (N − (m − 1)τ)-by-m, it would make more sense to remove τ embedded vectors when approximate entropy is calculated 🤔
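A quick sanity check of the vector counts under that reading (N, m and τ chosen arbitrarily):

```python
import numpy as np

N, m, tau = 100, 2, 2
x = np.arange(N)

# Full embedding matrix: one vector per starting index i = 0 .. N - (m - 1) * tau - 1
emb = np.array([x[i:i + (m - 1) * tau + 1:tau] for i in range(N - (m - 1) * tau)])
print(emb.shape[0])          # N - (m - 1) * tau = 98 vectors in the full matrix

# Shi (2017) bound as I read it: only N - m * tau = 96 usable vectors,
# i.e. drop the last tau vectors from the full matrix
print(emb[:-tau].shape[0])   # 96
```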
I'm still investigating whether my interpretation above is correct. Do let me know if you have a different interpretation of this paper 😄
By the way, we might look into implementing a function to optimize r for each signal, based on this paper: Lu (2008). As you say, since r can have such a significant effect on the results, I think it deserves some "complex optimization" of its own 😄