time series sl3 r and rolling cross validation
I want to apply rolling-window cross-validation for time series. The data used below (washb_data) is not a time series; I am just treating it as one so that the example is reproducible and I can then apply the same approach to my own time series data. I get the same error with my actual time series data as well. I have added one line of code from your time series example:
folds = origami::make_folds(washb_data, fold_fun=folds_rolling_window, window_size = 50, validation_size = 30, gap = 0, batch = 50)
However, when I reach sl_fit <- sl$train(washb_task), I get the following error, and I don't know how to fix it.
Error in set(private$.data, j = new_col_names, value = new_data) : Supplied 570 items to be assigned to 1000 items of column 'd47fdc00-01a0-11ea-a044-4560ff6b69d1_Pipeline(Lrnr_pkg_SuperLearner_screener_screen.corP->Stack)_Lrnr_glm_TRUE'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code
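For anyone reading along, the 570 in this message can be reproduced by hand. The sketch below is plain base R and only an assumption about how rolling windows are laid out (it is not origami's source code, and rolling_window_indices is a hypothetical helper). With the same parameters as the make_folds() call above, it shows that the validation sets cover only part of the 1000 rows.

```r
# Base-R sketch of a rolling-window layout. This is an assumed layout for
# illustration, not origami's implementation.
rolling_window_indices <- function(n, window_size, validation_size,
                                   gap = 0, batch = 1) {
  # Last origin that still leaves room for training window + gap + validation
  last_origin <- n - (window_size + gap + validation_size) + 1
  origins <- seq.int(1, last_origin, by = batch)
  lapply(origins, function(origin) {
    train_end <- origin + window_size - 1
    valid_start <- train_end + gap + 1
    list(
      training = origin:train_end,
      validation = valid_start:(valid_start + validation_size - 1)
    )
  })
}

# Same parameters as in the make_folds() call from the question
folds_sketch <- rolling_window_indices(
  n = 1000, window_size = 50, validation_size = 30, gap = 0, batch = 50
)

n_folds <- length(folds_sketch)
validation_rows <- unique(unlist(lapply(folds_sketch, `[[`, "validation")))

n_folds                  # 19 folds
length(validation_rows)  # 570 distinct validation rows out of 1000
```

Under this layout there are 19 folds with 30 validation rows each, so 570 cross-validated predictions in total. That is consistent with the "Supplied 570 items to be assigned to 1000 items" message: the predictions seem to be written into a column that expects one value per row of the data.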
The rest is your code:
library(data.table)
library(knitr)
library(kableExtra)
library(tidyverse)
library(origami)
library(SuperLearner)
library(sl3)
# load data set and take a peek
washb_data <- fread("", stringsAsFactors = TRUE)
washb_data <- washb_data[1:1000, ]
head(washb_data) %>%
  kable(digits = 4) %>%
  kableExtra:::kable_styling(fixed_thead = T) %>%
  scroll_box(width = "100%", height = "300px")
# specify the outcome and covariates
outcome <- "whz"
covars <- colnames(washb_data)[-which(names(washb_data) == outcome)]
folds = origami::make_folds(washb_data, fold_fun=folds_rolling_window, window_size = 50, validation_size = 30, gap = 0, batch = 50)
# create the sl3 task
washb_task <- make_sl3_Task(
  data = washb_data,
  covariates = covars,
  outcome = outcome,
  folds = folds
)
# choose base learners
lrnr_glm <- make_learner(Lrnr_glm)
lrnr_mean <- make_learner(Lrnr_mean)
lrnr_glmnet <- make_learner(Lrnr_glmnet)
lrnr_ranger100 <- make_learner(Lrnr_ranger, num.trees = 100)
lrnr_hal_simple <- make_learner(Lrnr_hal9001, degrees = 1, n_folds = folds)
lrnr_gam <- Lrnr_pkg_SuperLearner$new("SL.gam")
lrnr_bayesglm <- Lrnr_pkg_SuperLearner$new("SL.bayesglm")
stack <- make_learner(
  Stack,
  lrnr_glm, lrnr_mean, lrnr_ranger100, lrnr_glmnet,
  lrnr_gam, lrnr_bayesglm
)
metalearner <- make_learner(Lrnr_nnls)
screen_cor <- Lrnr_pkg_SuperLearner_screener$new("screen.corP")
# which covariates are selected on the full data?
screen_cor$train(washb_task)
cor_pipeline <- make_learner(Pipeline, screen_cor, stack)
fancy_stack <- make_learner(Stack, cor_pipeline, stack)
# we can visualize the stack
dt_stack <- delayed_learner_train(fancy_stack, washb_task)
plot(dt_stack, color = FALSE, height = "400px", width = "100%")
sl <- make_learner(Lrnr_sl,
  learners = fancy_stack,
  metalearner = metalearner
)
# we can visualize the super learner
dt_sl <- delayed_learner_train(sl, washb_task)
plot(dt_sl, color = FALSE, height = "400px", width = "100%")
sl_fit <- sl$train(washb_task)
sl_preds <- sl_fit$predict()
head(sl_preds)
I get the same problem even with this sample code:
library(data.table)
library(origami)
library(sl3)
library(xts)
# load data
head(bsds)
#Create a time-series object:
#Visualize the time-series:
PerformanceAnalytics::chart.TimeSeries(tsdata, auto.grid = FALSE, main = "Count of total rental bikes")
#Final setup
folds = origami::make_folds(tsdata, fold_fun=folds_rolling_window, window_size = 50, validation_size = 30, gap = 0, batch = 50)
covars <- "cnt"
outcome <- "cnt"
# create the sl3 task and take a look at it
ts_uni_task <- sl3_Task$new(data = bsds, covariates = covars,
outcome = outcome, outcome_type = "continuous", folds=folds)
# let's take a look at the sl3 task
n_ahead_param <- 2
lrnr_arima <- Lrnr_arima$new(n.ahead = n_ahead_param)
fit_arima <- lrnr_arima$train(ts_uni_task)
# verify that the learner is fit
fit_arima$is_trained
pred_arima <- fit_arima$predict()
head(pred_arima)
lrnr_tsdyn_linear <- Lrnr_tsDyn$new(learner = "linear", m = 1,
n.ahead = n_ahead_param)
lrnr_tsdyn_setar <- Lrnr_tsDyn$new(learner = "setar", m = 1, model = "TAR",
n.ahead = n_ahead_param)
lrnr_tsdyn_lstar <- Lrnr_tsDyn$new(learner = "lstar", m = 1,
n.ahead = n_ahead_param)
lrnr_garch <- Lrnr_rugarch$new(n.ahead = n_ahead_param)
lrnr_expsmooth <- Lrnr_expSmooth$new(n.ahead = n_ahead_param)
lrnr_harmonicreg <- Lrnr_HarmonicReg$new(n.ahead = n_ahead_param, K = 7,
freq = 105)
ts_stack <- Stack$new(lrnr_arima, lrnr_tsdyn_linear, lrnr_tsdyn_setar,
ts_stack_fit <- ts_stack$train(ts_uni_task)
ts_stack_preds <- ts_stack_fit$predict()
Error in set(learner_preds, j = current_names, value = current_preds) : Supplied 2 items to be assigned to 731 items of column 'Lrnr_arima_NULL_2'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
Failed on predict
Error in self$compute_step() : Error in set(learner_preds, j = current_names, value = current_preds) : Supplied 2 items to be assigned to 731 items of column 'Lrnr_arima_NULL_2'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
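The "2 items" versus "731 items" mismatch lines up with n.ahead = 2 on a 731-row series. The sketch below illustrates this with base R only: a simulated series stands in for the bike-sharing data, and stats::arima() stands in for whatever Lrnr_arima does internally, so both are assumptions rather than a reproduction of sl3's code.

```r
# Simulated AR(1) series with the same length as the data in the error (731).
# This is stand-in data, not the bsds data set.
set.seed(1)
y <- arima.sim(model = list(ar = 0.5), n = 731)

# Fit a simple ARIMA model and forecast two steps ahead
fit <- arima(y, order = c(1, 0, 0))
fc <- predict(fit, n.ahead = 2)

length(fc$pred)  # 2 forecasts
length(y)        # 731 observations
```

A forecast with n.ahead = 2 returns two values, while the prediction table has one row per observation. If a stack tries to store those two forecasts in a 731-row column, data.table's set() refuses to recycle them, which would produce an error of the kind shown above.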
There seems to be a recent bug in sl3 that prevents the time series super learner from working correctly. Thanks for reporting this. We'll get it fixed ASAP.
Thank you so much! I will be eagerly waiting for the update. The problem seems to be related to data.table.
Do you have any update on the above-mentioned problem?
Hi, sorry for the delay. I was able to fix it and will push the updated version in the next few days (I need to check the other CVs as well).
Hello Ivana Malenica, thanks a lot! This is great news. I hope we will get the updated version soon.
This should now be fixed on devel. You can install the devel version by doing install_github("tlverse/sl3@devel"). It will be merged up to master shortly.
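After reinstalling, it may be worth confirming which build the R session actually picks up. The sketch below uses a hypothetical helper, describe_install() (not part of sl3), and assumes the package was installed from GitHub with remotes or devtools, which normally record RemoteRef and RemoteSha fields in the package DESCRIPTION. Run it in a fresh R session so an old copy is not still loaded.

```r
# Hypothetical helper (not part of sl3): report version and install source
# of a package, based on its DESCRIPTION file.
describe_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(list(installed = FALSE))
  }
  desc <- utils::packageDescription(pkg)
  list(
    installed = TRUE,
    version = as.character(utils::packageVersion(pkg)),
    # These fields are only present for installs made from a remote such as
    # GitHub; otherwise they come back as NA.
    remote_ref = if (is.null(desc$RemoteRef)) NA_character_ else desc$RemoteRef,
    remote_sha = if (is.null(desc$RemoteSha)) NA_character_ else desc$RemoteSha
  )
}

describe_install("sl3")
```

If remote_ref does not say devel, or the version is unchanged from before, the old installation is probably still the one being loaded.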
First of all, I removed the old version of sl3 and reinstalled it using the link you provided. I checked again using my own data and code and this example. When I reach the line ts_stack_preds <- ts_stack_fit$predict(), I still get the same problem. Am I making a mistake?
Thanks in Advance.
Error in set(learner_preds, j = current_names, value = current_preds) : Supplied 2 items to be assigned to 731 items of column 'Lrnr_arima_NULL_2'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
Failed on predict
Error in self$compute_step() : Error in set(learner_preds, j = current_names, value = current_preds) : Supplied 2 items to be assigned to 731 items of column 'Lrnr_arima_NULL_2'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.