trendbreaker
asmodee() is not (always) deterministic
While looking at a special case, I found that outlier detection is not deterministic. In the code below, asmodee() is applied 100 times to the same time series, but only a fraction of the trials find outliers. (In fact, none should; see https://github.com/reconhub/trendbreaker/issues/36.) This fraction itself varied substantially between runs.
This is likely due to the way model selection is performed.
Code:
library(trendbreaker)

# Candidate models that asmodee() chooses between
models <- list(
  poisson_constant = glm_model(count ~ 1, family = "poisson"),
  regression = lm_model(count ~ date),
  negbin_time = glm_nb_model(count ~ date)
)

# Time series of 42 counts indexed by an integer date
ts <- data.frame(
  date = 1:42,
  count = c(2, 2, 2, 2, 1, 1, 2, 2, 2, 0, 1, 0, 0, 0, 1, 0, 1, 1, 2, 1, 1, 0, 1, 2, 1, 2, 2, 2, 0, 0,
            2, 3, 2, 1, 0, 1, 1, 0, 2, 3, 0, 7)
)

# Run asmodee() repeatedly on the same data and record which trials flag outliers
i_outlier <- c()
n_trials <- 100
for (j in seq_len(n_trials)) {
  asmodee_res <- asmodee(
    ts,
    models = models,
    alpha = 0,
    max_k = 12,
    method = evaluate_aic
  )
  if (any(asmodee_res$results$outlier)) {
    i_outlier <- c(i_outlier, j)
  }
}

print("Proportion of trials with outliers:")
print(paste0(round(100 * length(i_outlier) / n_trials), "%"))
Output:
[1] "Proportion of trials with outliers:"
[1] "32%"
(The percentage will vary from run to run.)
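One way to narrow down where the variation enters is to check that the model fits themselves are reproducible. The sketch below is a hypothetical check written in base R plus MASS, outside trendbreaker. It assumes that glm_model(), lm_model() and glm_nb_model() wrap glm(..., family = poisson), lm() and MASS::glm.nb() respectively, and it fits on the full series only (asmodee() may also refit on shorter windows, which this does not reproduce). The helper fit_aics() is introduced here for illustration only.

```r
library(MASS)  # provides glm.nb(); MASS ships with standard R installations

# Same time series as above
ts <- data.frame(
  date = 1:42,
  count = c(2, 2, 2, 2, 1, 1, 2, 2, 2, 0, 1, 0, 0, 0, 1, 0, 1, 1, 2, 1, 1, 0, 1, 2, 1, 2, 2, 2, 0, 0,
            2, 3, 2, 1, 0, 1, 1, 0, 2, 3, 0, 7)
)

# Hypothetical helper: fit the three candidate models directly and return their AICs.
# Assumes these base-R fits mirror what the trendbreaker model wrappers do.
fit_aics <- function(data) {
  c(
    poisson_constant = AIC(glm(count ~ 1, family = poisson, data = data)),
    regression = AIC(lm(count ~ date, data = data)),
    # glm.nb() can warn about theta convergence on near-Poisson data; silence that here
    negbin_time = AIC(suppressWarnings(glm.nb(count ~ date, data = data)))
  )
}

# Refit 100 times: replicate() gives a 3 x 100 matrix, t() makes it 100 x 3
aics <- t(replicate(100, fit_aics(ts)))

# TRUE if every model's AIC is exactly the same in all 100 refits
all_identical <- all(apply(aics, 2, function(col) length(unique(col)) == 1))
print(all_identical)

# Model with the lowest AIC in the first refit
print(names(which.min(aics[1, ])))
```

If all_identical is TRUE, the AIC values and hence the AIC ranking of the three models are stable for this series, which would suggest the run-to-run variation arises elsewhere in asmodee(). This is a hypothesis to test, not a confirmed cause.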