asmodee() is not (always) deterministic

Open stephaneghozzi opened this issue 4 years ago • 0 comments

While looking at a special case, I found that outlier detection is not deterministic. In the code below, asmodee() is applied 100 times to the same time series, but only a fraction of the trials find outliers. (Actually none should, see https://github.com/reconhub/trendbreaker/issues/36.) This fraction itself varies substantially between runs.

This is likely due to the way model selection is performed.

Code:

library(trendbreaker)

models <- list(
  poisson_constant = glm_model(count ~ 1, family='poisson'),
  regression = lm_model(count ~ date),
  negbin_time = glm_nb_model(count ~ date)
)

ts <- data.frame(
  date=1:42,
  count=c(2, 2, 2, 2, 1, 1, 2, 2, 2, 0, 1, 0, 0, 0, 1, 0, 1, 1, 2, 1, 1, 0, 1, 2, 1, 2, 2, 2, 0, 0,
    2, 3, 2, 1, 0, 1, 1, 0, 2, 3, 0, 7)
)

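# Indices of the trials in which at least one outlier was flagged
i_outlier <- c()
n_trials <- 100
for (j in seq_len(n_trials)) {
  # Same data and same settings in every trial
  asmodee_res <- asmodee(
    ts,
    models = models,
    alpha = 0,
    max_k = 12,
    method = evaluate_aic
  )
  if (any(asmodee_res$results$outlier)) {
    i_outlier <- c(i_outlier, j)
  }
}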
print('Proportion of trials with outliers:')
print(paste0(round(100*length(i_outlier)/n_trials),'%'))

Output:

[1] "Proportion of trials with outliers:"
[1] "32%"

(The percentage will vary from run to run.)
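
One way to narrow this down would be to check whether the variation disappears when the random number generator is reset before every call. The sketch below assumes the variation comes from R's RNG, which this report does not establish. `is_repeatable()` and `noisy()` are hypothetical helpers written for illustration and are not part of trendbreaker.

```r
# Hypothetical helper (not part of trendbreaker): call `f` n times, optionally
# resetting the RNG seed before each call, and report whether every result is
# identical to the first one.
is_repeatable <- function(f, n = 10, seed = NULL) {
  results <- lapply(seq_len(n), function(i) {
    if (!is.null(seed)) set.seed(seed)
    f()
  })
  all(vapply(results[-1], identical, logical(1), results[[1]]))
}

# Toy stand-in for a procedure that draws random numbers
noisy <- function() sample(1000, 5)

unseeded <- is_repeatable(noisy)            # draws differ between calls
seeded   <- is_repeatable(noisy, seed = 1)  # same seed, same draws

# With the objects defined above, the same check could be applied to asmodee():
# is_repeatable(function() asmodee(ts, models = models, alpha = 0, max_k = 12,
#   method = evaluate_aic)$results$outlier, seed = 1)
```

If the seeded check returned TRUE for asmodee() while the unseeded one returned FALSE, that would point to a random step somewhere in the procedure. If both returned FALSE, the cause would have to be something other than the RNG.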

stephaneghozzi, Aug 15 '20 15:08