Add feature to combine global models with nested models

Open LeoTimmermans opened this issue 3 years ago • 5 comments

When running many time series, we want to experiment. Using both global models and nested models is a natural thing to do when there are not too many time series. It would be nice to have a simple way to compare performance and pick the best (top n) models by id, regardless of whether each one is a nested model or a global model.

It would be good to mix and match.

LeoTimmermans avatar Nov 03 '21 19:11 LeoTimmermans

I'm attempting to do this same thing. Has it been implemented?

kransom14 avatar Aug 16 '22 22:08 kransom14

Not yet. I know a bunch of students want this. But it's going to be a while before I can tackle it. Any help would be appreciated.

CC @AlbertoAlmuinha

mdancho84 avatar Aug 17 '22 12:08 mdancho84

Hey,

Could you give an example of what you want to achieve? This way it will be easier for me to try to implement it.

AlbertoAlmuinha avatar Aug 17 '22 14:08 AlbertoAlmuinha

Sure, here's what I have so far. I fit an ARIMA using nesting and then a DeepAR as a global model. Then I calculate the error metrics for each method by id and join the two accuracy tables to compare performance. Ideally, we could combine the two approaches so it's all together and only requires one training and testing split. Perhaps the arima_reg() parsnip function could take an id argument, similar to deep_ar()?

library(tidymodels)
library(modeltime) # arima_reg(), nested forecasting, calibration, accuracy
library(modeltime.gluonts)
library(tidyverse)
library(timetk)

walmart_sales_weekly

# weekly sales data from 7 departments
data <- walmart_sales_weekly %>% 
  select(id, Date, Weekly_Sales) %>%
  set_names(c("ID", "date", "value"))

data

# nested arima, bad model just for use as example
splits_nest <- data %>% 
  extend_timeseries(.id_var = ID, .length_future = 25, .date_var = date) %>% 
  nest_timeseries(.id_var = ID, .length_future = 25) %>% 
  split_nested_timeseries(.length_test = 25)

rec_arima <- recipe(value ~ date, extract_nested_train_split(splits_nest))

wflw_arima <- workflow() %>%
  add_model(
    arima_reg(non_seasonal_ar = 2,
              non_seasonal_differences = 1,
              non_seasonal_ma = 1) %>% 
      set_engine("arima") 
  ) %>% 
  add_recipe(rec_arima)

nested_modeltime_tbl <- modeltime_nested_fit(nested_data = splits_nest, wflw_arima)

# accuracy by id
nested_modeltime_tbl %>% 
  extract_nested_test_accuracy() %>% # nesting doesn't require calibration? It's all included?
  group_by(ID)


# global model with deep ar
# requires different data set up
# create training and testing splits
splits <- time_series_split(
  data = data,
  assess = 25, 
  cumulative = TRUE)

splits %>% 
  tk_time_series_cv_plan() %>% 
  plot_time_series_cv_plan(date, value)

splits

# deep AR model
fit_deepar_gluonts <- deep_ar(
  id = "ID",
  freq = "w", # 1 week frequency
  prediction_length = 25, # 25 weeks
  lookback_length = 50,  # 50 weeks
  epochs = 10) %>% 
  set_engine("gluonts_deepar") %>% 
  fit(value ~ ID + date, data = training(splits))

# calibrate by id
calib_tbl <- modeltime_table( # modeltime table stores list of fitted models
  fit_deepar_gluonts
) %>% 
  modeltime_calibrate(testing(splits), id = "ID") 

calib_tbl

# accuracy by id
calib_tbl %>% 
  modeltime_accuracy(acc_by_id = TRUE)

# combine results from both approaches for comparison by ID
# bind_rows() matches columns by name, so the differing column order is fine
all_test_results <- bind_rows(
  nested_modeltime_tbl %>% 
    extract_nested_test_accuracy(),
  calib_tbl %>% 
    modeltime_accuracy(acc_by_id = TRUE)
)

kransom14 avatar Aug 17 '22 17:08 kransom14

I agree with @kransom14. An alternative could be being able to combine the modeltime tables and calibrate, calculate accuracy and select the best model (by id) from there.
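
To make the "select the best model (by id)" step concrete, here is a rough sketch using plain dplyr on two already-computed accuracy tables. It is not an existing modeltime feature. The two toy tables, their invented rmse values, and the `approach` label column are all assumptions for illustration; in practice the tables would come from extract_nested_test_accuracy() and modeltime_accuracy(acc_by_id = TRUE).

```r
library(dplyr)
library(tibble)

# Toy stand-ins for the two accuracy tables.
# The IDs and rmse values are invented purely for illustration.
nested_acc <- tibble(
  ID          = c("1_1", "1_3"),
  .model_desc = "ARIMA",
  rmse        = c(100, 250)
)

global_acc <- tibble(
  ID          = c("1_1", "1_3"),
  .model_desc = "DEEPAR",
  rmse        = c(120, 200)
)

# Stack both tables, tagging each row with the (hypothetical) approach label,
# then keep the single lowest-rmse model for every ID.
best_by_id <- bind_rows(nested = nested_acc, global = global_acc, .id = "approach") %>%
  group_by(ID) %>%
  slice_min(rmse, n = 1, with_ties = FALSE) %>%
  ungroup()

best_by_id
```

Changing `n = 1` to a larger value would keep the top n models per id instead of only the best one.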

LeoTimmermans avatar Aug 18 '22 18:08 LeoTimmermans