extract_nested_future_forecast() fails when adding multiple predictors in recipe() with +

Open rlohne opened this issue 2 years ago • 3 comments

See reprex below. When adding a single predictor via recipe(y ~ x) or using all predictors via recipe(y ~ .) I am able to get future predictions from nested models using the extract_nested_future_forecast() function. When adding multiple via recipe(y ~ x1 + x2), it does not provide an error, but the extract function returns a 0x0 tibble.

This works

library(gapminder)
library(tidyverse)
library(modeltime)
library(tidymodels)
library(lubridate)

# Add a date component to the data frame
gapminder_tbl <- gapminder %>% 
  group_by(country) %>% 
  mutate(Date = ymd(paste(year,01,01,sep = "-"))) %>% 
  filter(continent %in% c("Oceania")) # avoid having too much data for the example 

# Nest each country and extend the time series
nested_data_gapminder_tbl <- gapminder_tbl %>% 
  # Step 1: Extend the time series by country
  extend_timeseries(
    .id_var     = country,
    .date_var   = Date,
    .length_future = 5 # Extend by five years
  ) %>%
  
  # Step 2: Nests the time series into .actual_data and .future_data
  nest_timeseries(
    .id_var     = country,
    .length_future = 5
  ) %>%
  
  # Step 3: Adds a column .splits that contains training/testing indices
  split_nested_timeseries(
    .length_test = 4
  )

# Create the recipe for prediction
rec_prohpet <- recipe(lifeExp ~ Date, 
                      data = extract_nested_train_split(nested_data_gapminder_tbl))  


# Create workflow object

wflw_prophet <- workflow() %>% 
  add_model(
    prophet_reg("regression", seasonality_yearly = TRUE) %>% 
      set_engine("prophet" )
  ) %>% 
  add_recipe(rec_prohpet)

# Fit the model to the nested data

nested_modeltime_tbl <- modeltime_nested_fit(
  nested_data = nested_data_gapminder_tbl,
  wflw_prophet
) 

# Model performance

nested_modeltime_tbl %>% 
  extract_nested_test_accuracy() %>% 
  table_modeltime_accuracy(.interactive = FALSE)

# Refit to entire dataset and predict again

nested_modeltime_refit_tbl <- nested_modeltime_tbl %>% 
  modeltime_nested_refit(
    control = control_nested_refit(verbose = TRUE)
  )

# Extract predictions

preds<- nested_modeltime_refit_tbl %>% 
  extract_nested_future_forecast()

This does not work as expected:

library(gapminder)
library(tidyverse)
library(modeltime)
library(tidymodels)
library(lubridate)

# Add a date component to the data frame
gapminder_tbl <- gapminder %>% 
  group_by(country) %>% 
  mutate(Date = ymd(paste(year,01,01,sep = "-"))) %>% 
  filter(continent %in% c("Oceania")) # avoid having too much data for the example 

# Nest each country and extend the time series
nested_data_gapminder_tbl <- gapminder_tbl %>% 
  # Step 1: Extend the time series by country
  extend_timeseries(
    .id_var     = country,
    .date_var   = Date,
    .length_future = 5 # Extend by five years
  ) %>%
  
  # Step 2: Nests the time series into .actual_data and .future_data
  nest_timeseries(
    .id_var     = country,
    .length_future = 5
  ) %>%
  
  # Step 3: Adds a column .splits that contains training/testing indices
  split_nested_timeseries(
    .length_test = 4
  )

# Create the recipe for prediction
rec_prohpet <- recipe(lifeExp ~ Date + pop, 
                      data = extract_nested_train_split(nested_data_gapminder_tbl))  


# Create workflow object

wflw_prophet <- workflow() %>% 
  add_model(
    prophet_reg("regression", seasonality_yearly = TRUE) %>% 
      set_engine("prophet" )
  ) %>% 
  add_recipe(rec_prohpet)

# Fit the model to the nested data

nested_modeltime_tbl <- modeltime_nested_fit(
  nested_data = nested_data_gapminder_tbl,
  wflw_prophet
) 

# Model performance

nested_modeltime_tbl %>% 
  extract_nested_test_accuracy() %>% 
  table_modeltime_accuracy(.interactive = FALSE)

# Refit to entire dataset and predict again

nested_modeltime_refit_tbl <- nested_modeltime_tbl %>% 
  modeltime_nested_refit(
    control = control_nested_refit(verbose = TRUE)
  )

# Extract predictions

preds<- nested_modeltime_refit_tbl %>% 
  extract_nested_future_forecast()

rlohne avatar May 23 '22 15:05 rlohne

The reason why I think the issue is with extract_nested_future_forecast() is that

nested_modeltime_tbl %>% 
  extract_nested_test_accuracy() %>%
  table_modeltime_accuracy(.interactive = TRUE)

provides a table as expected. So the models get trained; it's just not possible to extract the future forecasts.

rlohne avatar May 23 '22 15:05 rlohne

I too have this issue

spsanderson avatar Jul 20 '22 02:07 spsanderson

Hi,

I can reproduce your results, but I don't think there is a problem.

Both recipe(lifeExp ~ . ) and recipe(lifeExp ~ Date + pop) return a 0x0 tibble.

The problem is that you are extending your time series with the extend_timeseries() function, and the xregs get NA values in the future table. So when you predict with this table and the model uses xregs, you don't get any predictions because the xregs have no values.
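One way to confirm this is to unnest the future data and count the missing xreg values per series. The sketch below uses a small hand-built table as a stand-in for the output of nest_timeseries(); the object names nested_tbl and na_counts and the dates are made up for illustration, and only the .future_data column name comes from the reprex.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for a nested table with a .future_data list column.
# In the reprex this would be nested_data_gapminder_tbl.
nested_tbl <- tibble(
  country = c("Australia", "New Zealand"),
  .future_data = list(
    tibble(Date    = as.Date(c("2012-01-01", "2017-01-01")),
           lifeExp = NA_real_,
           pop     = NA_real_),
    tibble(Date    = as.Date(c("2012-01-01", "2017-01-01")),
           lifeExp = NA_real_,
           pop     = NA_real_)
  )
)

# Unnest the future rows and count missing values of the xreg per country
na_counts <- nested_tbl %>%
  select(country, .future_data) %>%
  unnest(.future_data) %>%
  group_by(country) %>%
  summarise(n_missing_pop = sum(is.na(pop)), .groups = "drop")

na_counts
```

If n_missing_pop equals the number of future rows, the regressor has no values in the forecast horizon.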

You need to use a left_join() to give values to your xregs in your future table to be able to get the predictions.

Regards,

AlbertoAlmuinha avatar Aug 10 '22 14:08 AlbertoAlmuinha

Ahh, of course, that makes sense @AlbertoAlmuinha. Thanks! I'll close the issue.

rlohne avatar Aug 22 '22 11:08 rlohne

Hi everyone, I'm having the same issue. I know this topic is closed, but perhaps you could show where you left_join the xregs?

I love modeltime and use it for all of my time-series forecasts. I use nested and global forecasts all the time, and now I'd like to add xregs to my nested forecasts. If anyone could help on this exact topic, I could figure it out from there.

dstephens179 avatar Feb 10 '23 16:02 dstephens179

Could you explain this solution in a bit more detail? I'm running into a similar issue and can't really come to a resolution from the responses so far. Thanks!

mgree013 avatar May 23 '23 03:05 mgree013

@AlbertoAlmuinha @rlohne I would also appreciate more information on what objects need to be joined.

nrjenkins avatar Jul 19 '23 16:07 nrjenkins

@mgree013 and @nrjenkins:

If you use xregs in the model, they need to be present in the future data as well. In the example, pop (population) is used as an external regressor:

# Create the recipe for prediction
rec_prohpet <- recipe(lifeExp ~ Date + pop, 
                      data = extract_nested_train_split(nested_data_gapminder_tbl))

However, in the future forecasting dataset, values for the population of the Oceanian countries are not present, so the model cannot produce forecasts.

[Screenshot: training data (left) and future data (right)]

The screenshot above illustrates this: on the left is the data.frame with Date + pop, the data the model is trained on, and on the right is the future data.frame, where you can see that there are no values for pop.

So the solution would be to predict those values and then left_join those predictions to the future data.frame. This is, however, where things get messy, because you are creating forecasts of the xregs to be used in another forecast.
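A minimal sketch of that join, using a toy table in place of the output of extend_timeseries(). Everything here is illustrative: the names extended_tbl, future_xregs_tbl and pop_future are hypothetical, and all numbers are made-up placeholders rather than real gapminder values or real population forecasts. It also assumes the join is done on the extended table, before nest_timeseries() is called.

```r
library(dplyr)

# Toy stand-in for the result of extend_timeseries(): the last two rows per
# country are future rows, so lifeExp and pop are NA there.
dates <- as.Date(c("2002-01-01", "2007-01-01", "2012-01-01", "2017-01-01"))

extended_tbl <- tibble(
  country = rep(c("Australia", "New Zealand"), each = 4),
  Date    = rep(dates, times = 2),
  lifeExp = c(80, 81, NA, NA, 79, 80, NA, NA),
  pop     = c(19500000, 20400000, NA, NA, 3900000, 4100000, NA, NA)
)

# Hypothetical future values of the xreg, e.g. from a separate forecast
future_xregs_tbl <- tibble(
  country    = rep(c("Australia", "New Zealand"), each = 2),
  Date       = rep(dates[3:4], times = 2),
  pop_future = c(21500000, 22600000, 4300000, 4500000)
)

# Join the future xreg values and use them only where pop is missing
filled_tbl <- extended_tbl %>%
  left_join(future_xregs_tbl, by = c("country", "Date")) %>%
  mutate(pop = coalesce(pop, pop_future)) %>%
  select(-pop_future)

filled_tbl
```

With pop filled in for the future rows, the table would then go through nest_timeseries() and split_nested_timeseries() as in the reprex. Note that country is a factor in gapminder, so the key columns of the two tables may need to be converted to the same type before joining.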

rlohne avatar Jul 19 '23 16:07 rlohne