AnomalyDetection
AnomalyDetection copied to clipboard
this algorithm may be faster?
Twitter anomaly detection contain two parts: stl decomposition and generalized esd anomaly detection. first, stl decomposition, if we use Coarse granularity to decompose the series, the time will decrease,this just a trick, but it can help reduce the time. second, generalized esd can easily be implemented in o(nlogn), not o(n^2) in this repository. just like this:
medianIndex <- n/2
left <- 1
right <- n
nowLength <- n
for (i in 1L:max_outliers) {
if(one_tail){
p <- 1 - alpha/(n-i+1)
} else {
p <- 1 - alpha/(2*(n-i+1))
}
t <- qt(p,(n-i-1L))
lambda_critical <- t*(n-i) / sqrt((n-i-1+t**2)*(n-i+1))if (left >= right) break
if (nowLength < 1) break
# remove largest
# remove the max diff left or right
if (abs(data[[2L]][aresOrder[left]] - dataMedian) > abs(data[[2L]][aresOrder[right]] - dataMedian)) {
temp_max_idx <- aresOrder[left]
left <- left + 1
medianIndex <- medianIndex + 1
}
else {
temp_max_idx <- aresOrder[right]
right <- right - 1
medianIndex <- medianIndex - 1
}
# get the R
R <- abs((data[[2L]][temp_max_idx] - dataMedian) / dataStd)
# recalculate the dataMean and dataStd
# use math sd
#dataStd <- sqrt((nowLength * (dataStd**2 + dataMean**2) - data[[2L]][temp_max_idx]**2 - (nowLength * dataMean - data[[2L]][temp_max_idx])**2/(nowLength - 1)) / (nowLength - 1))
# use statics sd
dataStd <- sqrt(((nowLength - 1) * (dataStd**2 + dataMean**2) - data[[2L]][temp_max_idx]**2 - ((nowLength - 1) * dataMean - data[[2L]][temp_max_idx])**2/(nowLength - 2)) / (nowLength - 2))
dataMean <- (dataMean * nowLength - data[[2L]][temp_max_idx]) / (nowLength - 1)
dataMedian <- data[[2L]][aresOrder[medianIndex]]
nowLength <- nowLength - 1
#record the inx
R_idx[i] <- data[[1L]][temp_max_idx]
R_score[i] <- R
if (R < lambda_critical || is.nan(dataStd)) {
break
}
#points(temp_max_idx, data[[2L]][temp_max_idx], col = "red")
num_anoms <- i
}
for more detail please read anomalydetection also i have a java implememtion twitter-anomalyDetection-java
i have some experiments, the size of 10000 series data as input, the anomaly detection part raised 60 times compared with older.
This was added?