modeltime
Add feature to combine global models with nested models
When running many time series, we want to experiment. Using both global models and nested models is a natural thing to do when there are not too many time series. It would be nice to have a simple way to compare performance and pick the best (top n) models by id, regardless of whether each one is a nested model or a global model.
It would be good to mix and match.
I'm attempting to do this same thing. Has it been implemented?
Not yet. I know a bunch of students want this, but it's going to be a while before I can tackle it. Any help would be appreciated.
CC @AlbertoAlmuinha
Hey,
Could you give an example of what you want to achieve? This way it will be easier for me to try to implement it.
Sure, here's what I have so far. I fit an ARIMA model using nesting, and then I fit a DeepAR model as a global model. Then I calculate the error metrics for each method by id and join the two accuracy tables to compare performance. Ideally, we could combine the two approaches so it's all together and only requires one training and testing split. Perhaps the arima_reg() parsnip function could take an id argument similar to the one deep_ar() has?
library(tidymodels)
library(modeltime)          # loaded explicitly for the nested forecasting functions
library(modeltime.gluonts)
library(tidyverse)
library(timetk)

walmart_sales_weekly

# weekly sales data from 7 departments
data <- walmart_sales_weekly %>%
  select(id, Date, Weekly_Sales) %>%
  set_names(c("ID", "date", "value"))

data
# nested arima, bad model just for use as example
splits_nest <- data %>%
  extend_timeseries(.id_var = ID, .length_future = 25, .date_var = date) %>%
  nest_timeseries(.id_var = ID, .length_future = 25) %>%
  split_nested_timeseries(.length_test = 25)

rec_arima <- recipe(value ~ date, extract_nested_train_split(splits_nest))

wflw_arima <- workflow() %>%
  add_model(
    arima_reg(non_seasonal_ar = 2,
              non_seasonal_differences = 1,
              non_seasonal_ma = 1) %>%
      set_engine("arima")
  ) %>%
  add_recipe(rec_arima)

nested_modeltime_tbl <- modeltime_nested_fit(nested_data = splits_nest, wflw_arima)

# accuracy by id
nested_modeltime_tbl %>%
  extract_nested_test_accuracy() %>% # nesting doesn't require calibration? It's all included?
  group_by(ID)
# global model with deep ar
# requires different data set up
# create training and testing splits
splits <- time_series_split(
  data = data,
  assess = 25,
  cumulative = TRUE)

splits %>%
  tk_time_series_cv_plan() %>%
  plot_time_series_cv_plan(date, value)

splits
# deep AR model
fit_deepar_gluonts <- deep_ar(
  id = "ID",
  freq = "w",              # 1 week frequency
  prediction_length = 25,  # 25 weeks
  lookback_length = 50,    # 50 weeks
  epochs = 10) %>%
  set_engine("gluonts_deepar") %>%
  fit(value ~ ID + date, data = training(splits))

# calibrate by id
calib_tbl <- modeltime_table( # modeltime table stores list of fitted models
  fit_deepar_gluonts
) %>%
  modeltime_calibrate(testing(splits), id = "ID")

calib_tbl

# accuracy by id
calib_tbl %>%
  modeltime_accuracy(acc_by_id = TRUE)
# combine results by id from both approaches for comparison by ID
# bind_rows() matches columns by name, so the differing column order is not a problem
all_test_results <- bind_rows(
  nested_modeltime_tbl %>%
    extract_nested_test_accuracy(),
  calib_tbl %>%
    modeltime_accuracy(acc_by_id = TRUE)
)
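One thing to watch when stacking the two accuracy tables: each table numbers its models independently, so both the nested ARIMA and the global DeepAR can show up with the same .model_id. A minimal sketch of one way to keep them apart, assuming only dplyr, is to label each table with its origin while binding. The two small tables below are toy stand-ins with invented numbers that only mimic the column layout; the `approach` column name is my own choice, not something modeltime provides.

```r
library(dplyr)

# Toy stand-ins for the two accuracy tables (invented values, illustration only)
nested_acc <- tibble(
  ID          = c("1_1", "1_3"),
  .model_id   = 1L,
  .model_desc = "ARIMA",
  .type       = "Test",
  rmse        = c(10, 30)
)

global_acc <- tibble(
  .model_id   = 1L,
  .model_desc = "DEEPAR",
  .type       = "Test",
  ID          = c("1_1", "1_3"),
  rmse        = c(20, 15)
)

# The argument names become the values of the new `approach` column,
# so rows with a colliding .model_id remain distinguishable after stacking
all_acc <- bind_rows(nested = nested_acc, global = global_acc, .id = "approach")

all_acc
```

With the real tables, the same call would take the output of extract_nested_test_accuracy() and modeltime_accuracy(acc_by_id = TRUE) in place of the toy tibbles.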
I agree with @kransom14. An alternative could be being able to combine the modeltime tables and calibrate, calculate accuracy and select the best model (by id) from there.
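Until something like that exists in the package, the "select the best model by id" step can be done by hand on any accuracy table that has one row per id and model. The sketch below uses only dplyr; `top_models_by_id()` is a hypothetical helper name, not a modeltime function, and the toy table uses invented numbers purely to show the shape of the input.

```r
library(dplyr)

# Hypothetical helper: keep the n best rows per id, ranked by a metric
# where lower is better (e.g. rmse, mae)
top_models_by_id <- function(accuracy_tbl, id_col = "ID", metric = "rmse", n = 1) {
  accuracy_tbl %>%
    group_by(across(all_of(id_col))) %>%
    slice_min(.data[[metric]], n = n, with_ties = FALSE) %>%
    ungroup()
}

# Toy accuracy table (invented values, illustration only)
toy_acc <- tibble(
  ID          = c("1_1", "1_1", "1_3", "1_3"),
  approach    = c("nested", "global", "nested", "global"),
  .model_desc = c("ARIMA", "DEEPAR", "ARIMA", "DEEPAR"),
  rmse        = c(10, 20, 30, 15)
)

best <- top_models_by_id(toy_acc, id_col = "ID", metric = "rmse", n = 1)
best
```

Setting n to a larger value returns the top n models per id instead of a single winner. For a metric where higher is better, such as rsq, slice_max() would be the counterpart.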