tsfeatures
tsfeatures copied to clipboard
`sampenc` implementation
Hi,
I am trying to understand the logic behind sampenc
.
I figured it is a modification of douglas lake's code?
But there seems to be 2 issues:
- the line
p <- A[2]/B[1]
I think should bep<-A[M]/B[M-1]
issue 2 is, I don't quite get lake's code (is it correct?). I tried to follow his argument on https://www.physionet.org/content/sampen/1.0.0/ especially
If a particular run ends up being of length 4, for example, then that means that 1 is added to the count for template matches of length 4. In addition, there are 2 template matches of length 3, 3 of length 2, and 4 of length 1 that need to be added to the corresponding counts.
But I couldn't verify this behaviour using some simple simulated series.
So I
- re-implement Richman and Moorman original algorithm literally
- use
pracma::sample_entropy
- re-implement Lake's matlab literally
- modify 'tsfeatures::sampenc` according to point 1 above
and did a comparison
my_sam_en=function(ts,M,r){ ## richmand and moormand literally
N=length(ts)
subts=function(.ts,l,exclude_last=T){
N=length(.ts)
if(exclude_last)return(lapply(1:(N-l),function(i).ts[i:(i+l-1)]))
lapply(1:(N-l+1),function(i).ts[i:(i+l-1)])
}
## all length M sequences, up to begining from N-M
x_M=subts(ts,M,exclude_last = T)
x_Mp1=subts(ts,M+1,exclude_last=F)
count_r=function(xx){
# browser()
d=Vectorize(function(a,b)max(abs(xx[[a]]-xx[[b]])),c('a','b'))
N=length(xx)
O=outer(1:N,1:N,d)
sum(O<(r-1e-10))-nrow(O)
}
C_M_r=count_r(x_M)
C_Mp1_r=count_r(x_Mp1)
-log(C_Mp1_r/C_M_r)
}
modified_sampenc=function (y, M = 6, r = 0.3) ## tsfeatures::sampenc
{
N <- length(y)
lastrun <- numeric(N)
run <- numeric(N)
A <- numeric(M)
B <- numeric(M)
for (i in 1:(N - 1)) {
y1 <- y[i]
for (jj in 1:(N - i)) {
j <- i + jj
if (isTRUE(abs(y[j] - y1) < r)) {
run[jj] <- lastrun[jj] + 1
M1 <- min(M, run[jj])
A[1:M1] <- A[1:M1] + 1
if (j < N)
B[1:M1] <- B[1:M1] + 1
}
else {
run[jj] <- 0
}
}
for (j in 1:(N - i)) {
lastrun[j] <- run[j]
}
}
p <- A[M]/B[M-1]
e <- -log(p)
return(e)
}
lake=function(y,M,r){
n=length(y);
lastrun=rep(0,n);
run=rep(0,n);
A=rep(0,M);
B=rep(0,M);
p=rep(0,M);
e=rep(0,M);
for(i in 1:(n-1)){
nj=n-i;
y1=y[i];
for(jj in 1:nj){
j=jj+i;
if(abs(y[j]-y1)<r){
run[jj]=lastrun[jj]+1;
M1=min(M,run[jj]);
for(m in 1:M1){
A[m]=A[m]+1;
if(j<n)B[m]=B[m]+1;
}
}else{
run[jj]=0;
}
}
for(j in 1:nj){
lastrun[j]=run[j];
}
}
N=n*(n-1)/2;
B=c(N,B[1:(M-1)]);
p=A/B;
e=-log(p);
e[length(e)] ## only return the last value
}
y=sin(1:100)
M=2;r=0.3
c(pracma::sample_entropy(sin(1:100),edim=M,r=r),
my_sam_en(y,M,r),
modified_sampenc(y,M,r),
lake(y,M,r),
tsfeatures::sampenc(y,M,r))
# [1] 0.1006652 0.1006652 0.8065081 0.8065081 0.8065081
M=3;r=0.2
c(pracma::sample_entropy(sin(1:100),edim=M,r=r),
my_sam_en(y,M,r),
modified_sampenc(y,M,r),
lake(y,M,r),
tsfeatures::sampenc(y,M,r))
# [1] 0.0000000 0.0000000 0.1459539 0.1459539 0.9808293
So pracma:;sample_entropy
(and my translation of the original paper) more or less agree while strongly disagreeing with lake's code (and thus tsfeatures::sampenc
).
But again, I couldn't follow lake's algorithm at all.
@robjhyndman - is this repo/package still being maintained?
Are the features computed here equivalent to those in FEASTS - or are the two slowly diverging to make way for multi-series computations?
tsfeatures
is still being maintained, at least fixing bugs, etc., and we will continue to ensure it stays on CRAN. The features in feasts
may diverge as that is under more active development.