fable icon indicating copy to clipboard operation
fable copied to clipboard

Re-consider IC output for refitted models

Open TimothyHyndman opened this issue 5 years ago • 6 comments

Using the latest github version of fable here (but issue is also present on CRAN version).

library(tsibble)
library(fable)
#> Loading required package: fabletools
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

lung_deaths <- as_tsibble(mdeaths)

# Fit ARIMA on first part of timeseries.
ts1 <- lung_deaths %>% filter(index <= as.Date("1977-01-01"))
#> Warning in mask$eval_all_filter(dots, env_filter): Incompatible methods
#> ("<=.vctrs_vctr", "<=.Date") for "<="
fit <- ts1 %>%
  model(ARIMA(value ~ 1 + pdq(1,0,0) + PDQ(0,0,0)))

# Refit ARIMA with one more observation
ts2 <- lung_deaths %>% filter(index > as.Date("1977-01-01"))
#> Warning in mask$eval_all_filter(dots, env_filter): Incompatible methods
#> (">.vctrs_vctr", ">.Date") for ">"
fit %>%
  refit(ts2 %>% head(1)) %>%
  report()
#> Series: value 
#> Model: ARIMA(1,0,0) w/ mean 
#> 
#> Coefficients:
#>          ar1  constant
#>       0.7851  364.8801
#> s.e.  0.1030   43.9553
#> 
#> sigma^2 estimated as 1579:  log likelihood=-5.58
#> AIC=13.16   AICc=9.16   BIC=11.16

# Refit ARIMA with two more observations
fit %>%
  refit(ts2 %>% head(2)) %>%
  report()
#> Warning: It looks like you're trying to fully specify your ARIMA model but have not said if a constant should be included.
#> You can include a constant using `ARIMA(y~1)` to the formula or exclude it by adding `ARIMA(y~0)`.
#> Error: Problem with `mutate()` input `ARIMA(value ~ 1 + pdq(1, 0, 0) + PDQ(0, 0, 0))`.
#> x Could not find an appropriate ARIMA model.
#> This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
#> For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots
#> ℹ Input `ARIMA(value ~ 1 + pdq(1, 0, 0) + PDQ(0, 0, 0))` is `(function (object, ...) ...`.

Created on 2020-05-18 by the reprex package (v0.3.0)

TimothyHyndman avatar May 18 '20 01:05 TimothyHyndman

@robjhyndman This is coming from here (which comes from forecast::auto.arima()) https://github.com/tidyverts/fable/blob/64b36c6df3285e33c912991a1cc885604bbb3158/R/arima.R#L153

Any reason why npar is the number of estimated coefficients +1?

mitchelloharawild avatar May 18 '20 02:05 mitchelloharawild

Also @TimothyHyndman, when you say "Refit ARIMA with two more observations", you're refitting the model with only two observations (not two more observations). You'll need to refit the model with the historical data and the next two observations (or use stream.ARIMA once #251 is done).

mitchelloharawild avatar May 18 '20 02:05 mitchelloharawild

npar = # coefficients + 1 to account for the residual variance. This is how the AIC/AICc/BIC defines # parameters in a model.

robjhyndman avatar May 18 '20 02:05 robjhyndman

Got it, thanks. https://github.com/tidyverts/fable/blob/64b36c6df3285e33c912991a1cc885604bbb3158/R/arima.R#L159

Then in this case (n=2, d=0, D=0), nstar = 2 giving division by (nstar - npar - 1) = 0. Is this correct?

mitchelloharawild avatar May 18 '20 02:05 mitchelloharawild

It is problematic computing AICc on a refit because the parameters were not estimated on that data set. If the data was used for estimation, then the equation is correct. But you would expect nstar to always be bigger than npar+1 or you would be over-fitting.

I'm not sure what we should return as AICc value on a refit -- possibly the original AICc on the original data, or perhaps NA or NULL.

robjhyndman avatar May 18 '20 02:05 robjhyndman

Also @TimothyHyndman, when you say "Refit ARIMA with two more observations", you're refitting the model with only two observations (not two more observations). You'll need to refit the model with the historical data and the next two observations (or use stream.ARIMA once #251 is done).

Ah, I was thinking that refit used the data contained in fit in addition to the data passed in with new_data = . Thanks for the correction.

TimothyHyndman avatar May 18 '20 03:05 TimothyHyndman