fable

Feature Request: Automatic K optimization for Fourier Terms

Open AshwinPuri13 opened this issue 5 years ago • 4 comments

If I wish to fit a regression with Fourier terms then to find the optimal K I need to do something like this:

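```r
library(fable)
library(dplyr)
library(tidyr)

# Fit one regression-with-ARIMA-errors model per candidate K
# (PDQ(0,0,0) switches off the seasonal ARIMA part)
mbl = tsibbledata::ansett %>%
  tsibble::fill_gaps() %>%   # make implicit gaps explicit (filled with NA)
  model(arima1 = ARIMA(Passengers ~ fourier(K = 1) + PDQ(0,0,0)),
        arima2 = ARIMA(Passengers ~ fourier(K = 2) + PDQ(0,0,0)),
        arima3 = ARIMA(Passengers ~ fourier(K = 3) + PDQ(0,0,0)))

# One row of fit statistics (including AICc) per series and model
metrics = mbl %>%
  glance()

# Keep the lowest-AICc model for each series and rebuild a mable from it
mbl_best = metrics %>%
  select(Airports, Class, .model, AICc) %>%
  group_by(Airports, Class) %>%
  slice(which.min(AICc)) %>%
  left_join(mbl %>%
              # long form: one row per series x model
              gather('.model', 'model', -Airports, -Class),
            by = c('.model', 'Airports', 'Class')) %>%
  as_mable(key = c('Airports', 'Class'), models = 'model')
```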

It would be more convenient for K to be automatically determined through something like this:

```r
model(arima = ARIMA(Passengers ~ fourier(K = 1:3) + PDQ(0,0,0)))
```

On that note, looking at the source code for ARIMA, it appears that when fitting a regression with ARIMA errors, the number of differences is determined after the regression is fitted. If so, the arima1, arima2 and arima3 models above could end up with different orders of differencing, and their AICc values would then not be comparable. If this is indeed the case, would determining K through cross-validation be better?

Thanks!

AshwinPuri13 • Oct 18 '19 16:10
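Regarding the differencing question above, one way to keep an AICc comparison fair is to fix the order of differencing for every candidate K with the `pdq()` special, so all candidates are fitted to the same data. This is a minimal sketch, not an official recommendation: it assumes `d = 0` suits these series, which you would need to check (e.g. with a unit-root test) before relying on it. The object names `ansett_filled`, `mbl_d0` and `aicc_d0` are illustrative.

```r
library(fable)
library(dplyr)

ansett_filled <- tsibbledata::ansett %>%
  tsibble::fill_gaps()

# Assumption: d = 0 is appropriate for these series.
# pdq(d = 0) fixes only the differencing; p and q are still searched
# automatically, but every candidate K now shares the same d.
mbl_d0 <- ansett_filled %>%
  model(
    arima1 = ARIMA(Passengers ~ fourier(K = 1) + pdq(d = 0) + PDQ(0, 0, 0)),
    arima2 = ARIMA(Passengers ~ fourier(K = 2) + pdq(d = 0) + PDQ(0, 0, 0)),
    arima3 = ARIMA(Passengers ~ fourier(K = 3) + pdq(d = 0) + PDQ(0, 0, 0))
  )

# AICc per series and candidate K, computed on the same (undifferenced) data
aicc_d0 <- mbl_d0 %>%
  glance() %>%
  select(Airports, Class, .model, AICc)
```

This only makes the AICc comparison consistent; whether a fixed `d` or time-series cross-validation is the better route depends on the data.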

Automating the choice of K could be a feature we look at in a future release. It is very unlikely to affect the order of differencing, so I think using AICc for selection is safe enough.

robjhyndman • Oct 19 '19 01:10

This is something which will need to be added on a model-by-model basis, as each model has its own method of model selection.

mitchelloharawild • Oct 19 '19 01:10

Could we iteratively select the best K based on whatever criterion the base model uses? My idea is to fit Fourier series with different K linearly to the response and select the one with the best value of the criterion used by the base model. Is there a case where we wouldn't want to fit it linearly? I'll admit that re-estimating K after fitting the rest of the model would be better, but this might at least give some direction to a user who doesn't know which K to select.

JaySumners • Dec 08 '20 20:12
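The two-stage idea in the comment above (choose K from a linear fit, then use it in the full model) could be sketched as follows. This assumes a plain linear regression via `TSLM()` on the Fourier terms is an acceptable first stage and that AICc is the criterion; the model names `K1` to `K3` and the object names are hypothetical. `model()` is called with its default `.safely = TRUE`, so any series that fails to fit becomes a NULL model with a warning instead of stopping the run.

```r
library(fable)
library(dplyr)

ansett_filled <- tsibbledata::ansett %>%
  tsibble::fill_gaps()

ks <- 1:3

# Stage 1 (assumption): linear regression on Fourier terms only.
# `!!k` injects the current value of k into the formula.
lm_defs <- lapply(ks, function(k) TSLM(Passengers ~ fourier(K = !!k)))
names(lm_defs) <- paste0("K", ks)

lm_fits <- ansett_filled %>%
  model(!!!lm_defs)

# Lowest-AICc K per series (one row per series, ties broken by position)
best_k <- lm_fits %>%
  glance() %>%
  group_by(Airports, Class) %>%
  slice_min(AICc, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(Airports, Class, .model, AICc)
```

The `.model` value per series then suggests which K to pass to `fourier()` in the full `ARIMA()` model; as noted in the comment, that K may deserve a re-check once the ARIMA errors are added.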

As an interim solution, I am trying to fit multiple models in a loop so that I do not have to repeat the formula so many times.

Yet I am struggling (I do not have much of a background in tidy R).

Could you help?

juan-g-p • Oct 21 '23 19:10
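For the loop question, one way to avoid repeating the formula is to build the model definitions with `lapply()` and splice the named list into `model()` with `!!!`. This is a sketch rather than an official fable helper: it relies on `ARIMA()` capturing its formula with tidy evaluation (so `!!k` injects the loop value) and on `model()` accepting spliced definitions. The names `defs`, `mbl` and `best` are illustrative.

```r
library(fable)
library(dplyr)

ansett_filled <- tsibbledata::ansett %>%
  tsibble::fill_gaps()

ks <- 1:3

# One ARIMA definition per K; `!!k` puts the literal value of k
# into each formula, so the definitions do not share a variable.
defs <- lapply(ks, function(k) {
  ARIMA(Passengers ~ fourier(K = !!k) + PDQ(0, 0, 0))
})
names(defs) <- paste0("arima", ks)

# Equivalent to writing arima1 = ..., arima2 = ..., arima3 = ... by hand
mbl <- ansett_filled %>%
  model(!!!defs)

# Lowest-AICc model per series (one row per series, ties broken by position)
best <- mbl %>%
  glance() %>%
  group_by(Airports, Class) %>%
  slice_min(AICc, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(Airports, Class, .model, AICc)
```

Changing `ks` is then the only edit needed to try a different range of K.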