modeltime
Error with modeltime_fit_workflowset
I am getting the following error: Error in analysis(x): object 'splits' not found
Splits and Features:
splits <- initial_time_split(
  data_final_tbl
  , prop = 0.8
  , cumulative = TRUE
)
# Features ----------------------------------------------------------------
recipe_base <- recipe(value ~ ., data = training(splits))

recipe_date <- recipe_base %>%
  step_timeseries_signature(date_col) %>%
  step_rm(matches("(iso$)|(xts$)|(hour)|(min)|(sec)|(am.pm)")) %>%
  step_normalize(contains("index.num"), contains("date_col_year"))

recipe_fourier <- recipe_date %>%
  step_dummy(all_nominal_predictors(), one_hot = TRUE) %>%
  step_fourier(date_col, period = 365/12, K = 1) %>%
  step_YeoJohnson(value, limits = c(0,1))

recipe_fourier_final <- recipe_fourier %>%
  step_nzv(all_predictors())

recipe_pca <- recipe_base %>%
  step_timeseries_signature(date_col) %>%
  step_rm(matches("(iso$)|(xts$)|(hour)|(min)|(sec)|(am.pm)")) %>%
  step_dummy(all_nominal_predictors(), one_hot = TRUE) %>%
  step_normalize(value) %>%
  step_fourier(date_col, period = 365/52, K = 1) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_nzv(all_predictors()) %>%
  step_pca(all_numeric_predictors(), threshold = .95)

recipe_num_only <- recipe_pca %>%
  step_rm(-value, -all_numeric_predictors())
Make the model_spec
# XGBoost -----------------------------------------------------------------
model_spec_boost <- boost_tree(
  mode = "regression",
  mtry = round(sqrt(ncol(training(splits)) - 1), 0),
  trees = round(sqrt(nrow(training(splits)) - 1), 0),
  min_n = round(sqrt(ncol(training(splits)) - 1), 0),
  tree_depth = round(sqrt(ncol(training(splits)) - 1), 0),
  learn_rate = 0.3,
  loss_reduction = 0.01
) %>%
  set_engine("xgboost")
wfsets <- workflow_set(
  preproc = list(
    base = recipe_base,
    date = recipe_date,
    fourier = recipe_fourier,
    fourier_final = recipe_fourier_final,
    pca = recipe_pca,
    num_only_pca = recipe_num_only
  ),
  models = list(
    model_spec_boost
  ),
  cross = TRUE
)
parallel_start(n_cores)
wf_fits <- wfsets %>%
  modeltime_fit_workflowset(
    data = training(splits)
    , control = control_fit_workflowset(
      allow_par = TRUE
      , verbose = TRUE
    )
  )
parallel_stop()
Gives the error:
Using existing parallel backend with 5 clusters (cores)...
Beginning Parallel Loop | 0.005 seconds
Model 1 Error: Error in analysis(x): object 'splits' not found
Model 2 Error: Error in analysis(x): object 'splits' not found
Model 3 Error: Error in analysis(x): object 'splits' not found
Model 4 Error: Error in analysis(x): object 'splits' not found
Model 5 Error: Error in analysis(x): object 'splits' not found
Model 6 Error: Error in analysis(x): object 'splits' not found
Finishing parallel backend. Clusters are remaining open. | 2.909 seconds
Close clusters by running: `parallel_stop()`.
Total time | 2.909 seconds
-- Model Failure Report ------------------------------------
# A tibble: 6 x 2
.model_id .model
<int> <list>
1 1 <NULL>
2 2 <NULL>
3 3 <NULL>
4 4 <NULL>
5 5 <NULL>
6 6 <NULL>
Some models failed during fitting: modeltime_fit_workflowset():
- Model 1: Is NULL.
- Model 2: Is NULL.
- Model 3: Is NULL.
- Model 4: Is NULL.
- Model 5: Is NULL.
- Model 6: Is NULL.
Action: Review any error messages.
-- End Model Failure Report --------------------------------
Yet when I do the following:
model_spec_boost <- boost_tree(
  mode = "regression",
  mtry = round(sqrt(ncol(training(splits)) - 1), 0),
  trees = round(sqrt(nrow(training(splits)) - 1), 0),
  min_n = round(sqrt(ncol(training(splits)) - 1), 0),
  tree_depth = round(sqrt(ncol(training(splits)) - 1), 0),
  learn_rate = 0.3,
  loss_reduction = 0.01
) %>%
  set_engine("xgboost")

wflw_fit_xgboost <- workflow() %>%
  add_recipe(recipe_num_only) %>%
  add_model(model_spec_boost) %>%
  fit(training(splits))

mdl_tbl <- modeltime_table(wflw_fit_xgboost)

calibration_tbl <- mdl_tbl %>%
  modeltime_calibrate(new_data = testing(splits))

calibration_tbl %>%
  modeltime_forecast(
    new_data = testing(splits)
    , actual_data = data_final_tbl
  ) %>%
  plot_modeltime_forecast(
    .conf_interval_show = FALSE
  )
I get a plot
Hi @spsanderson ,
If you are getting this error, it is possibly because a variable in your base recipe is in a form that XGBoost does not accept. Because the problem is in the base recipe, it is carried over to all the other recipes. For example, if you have a date field or a factor, you should remove the date and convert the factor to dummy variables.
Here is an example reproducing your problem:
splits <- initial_time_split(
  m4_monthly
  , prop = 0.8
  , cumulative = TRUE
)

recipe_base_bad <- recipe(value ~ ., data = training(splits))

recipe_base_ok <- recipe(value ~ ., data = training(splits)) %>%
  step_rm(date) %>%
  step_dummy(all_nominal_predictors(), one_hot = TRUE)

model_spec_boost <- boost_tree(
  mode = "regression",
  mtry = round(sqrt(ncol(training(splits)) - 1), 0),
  trees = round(sqrt(nrow(training(splits)) - 1), 0),
  min_n = round(sqrt(ncol(training(splits)) - 1), 0),
  tree_depth = round(sqrt(ncol(training(splits)) - 1), 0),
  learn_rate = 0.3,
  loss_reduction = 0.01
) %>%
  set_engine("xgboost")

wfsets <- workflow_set(
  preproc = list(
    base = recipe_base_ok
  ),
  models = list(
    model_spec_boost
  ),
  cross = TRUE
)

wf_fits <- wfsets %>%
  modeltime_fit_workflowset(
    data = training(splits)
    , control = control_fit_workflowset(
      allow_par = FALSE
      , verbose = TRUE
    )
  )
Hope it helps
@AlbertoAlmuinha I expected four of the models to fail, since they contain a date feature and some have other non-numeric features. But the recipe from my second example, recipe_num_only, works on its own in a modeltime workflow and fails inside modeltime_fit_workflowset(), which confuses me. The error message is also confusing: it says it literally cannot find my splits object.
@spsanderson It's difficult to say without a reprex to play a bit with the data. Yeah, the error description is not the best one, but that part is difficult to control...
With which recipe did you create the attached excel? It doesn't match any of the recipes in the first message.
recipe_num_only
I don't get that result with recipe_num_only. The first message recipe is:
recipe_num_only <- recipe_pca %>%
  step_rm(-value, -all_numeric_predictors())
In the last step you are keeping a "value" column, but there is no value column in the attached information, so I'm missing something here.
@spsanderson @mdancho84
We definitely need to take a look at this because something is going on. It should work (and in fact, if you run it sequentially, it works correctly), but the parallel functionality seems not to be working correctly. We need to check this.
For the moment, you can use allow_par = FALSE to make it work
@AlbertoAlmuinha thanks I will try it right now and let you know
That did it.
@spsanderson ok, I found the problem...Right now you can't define the model based on other variables (splits in this case):
model_spec_boost <- boost_tree(
  mode = "regression",
  mtry = round(sqrt(ncol(training(splits)) - 1), 0),
  trees = round(sqrt(nrow(training(splits)) - 1), 0),
  min_n = round(sqrt(ncol(training(splits)) - 1), 0),
  tree_depth = round(sqrt(ncol(training(splits)) - 1), 0),
  learn_rate = 0.3,
  loss_reduction = 0.01
) %>%
  set_engine("xgboost")
What happens is that these variables are not sent to the nodes where the computation is performed, so the calculation fails there because it cannot find them. If you replace the variables with numbers, you will see that everything works correctly:
model_spec_boost <- boost_tree(
  mode = "regression",
  mtry = 1,
  trees = 8,
  min_n = 1,
  tree_depth = 1,
  learn_rate = 0.3,
  loss_reduction = 0.01
) %>%
  set_engine("xgboost")
Maybe we can find a solution to improve this situation,
Regards
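The mechanism described above (objects in the main session not being available on the worker nodes) can be reproduced with base R alone. The following is a minimal sketch using the parallel package rather than modeltime's own backend. The `splits` list here is a hypothetical stand-in, not the rsample object from the thread.

```r
library(parallel)

# Hypothetical stand-in for the `splits` object; only its existence matters here.
splits <- list(n_cols = 10)

cl <- makeCluster(2)

# The expression is evaluated on each worker. The workers start with a fresh
# global environment, so `splits` does not exist there and the call errors.
res_missing <- tryCatch(
  clusterEvalQ(cl, round(sqrt(splits$n_cols - 1), 0)),
  error = function(e) conditionMessage(e)
)
print(res_missing)

# Copy `splits` to every worker, then evaluate the same expression again.
clusterExport(cl, "splits", envir = environment())
res_exported <- clusterEvalQ(cl, round(sqrt(splits$n_cols - 1), 0))
print(res_exported)

stopCluster(cl)
```

The first call fails with an "object 'splits' not found" style message, and the second succeeds once the object has been exported. This only illustrates the general behavior of R worker processes and is not a description of how modeltime handles its parallel loop internally.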
Ahhhhhhhh ok, I don't think a solution to this is necessary, I think it would be better if those settings were made outside of the spec process.
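One way to make those settings outside of the spec, sketched under assumptions: compute the values first, then inject them with `!!` so the spec stores plain numbers instead of expressions that refer to `splits`. parsnip captures model arguments as quosures, which is why `!!` works here. `train_tbl`, `n_cols_root`, and `n_rows_root` are hypothetical names used only for illustration, and this has not been checked against the parallel backend in the thread, although it is consistent with the finding that hard-coded numbers work.

```r
library(parsnip)

# Hypothetical training data standing in for training(splits):
# 100 rows, one outcome column and four predictors.
train_tbl <- data.frame(
  value = rnorm(100),
  x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100), x4 = rnorm(100)
)

# Compute the hyperparameters up front, outside the model spec.
n_cols_root <- round(sqrt(ncol(train_tbl) - 1), 0)
n_rows_root <- round(sqrt(nrow(train_tbl) - 1), 0)

# `!!` injects the computed numbers at spec creation time, so the spec
# no longer needs `train_tbl` (or `splits`) when it is evaluated later.
model_spec_boost <- boost_tree(
  mode = "regression",
  mtry = !!n_cols_root,
  trees = !!n_rows_root,
  min_n = !!n_cols_root,
  tree_depth = !!n_cols_root,
  learn_rate = 0.3,
  loss_reduction = 0.01
)
model_spec_boost <- set_engine(model_spec_boost, "xgboost")

# The stored arguments are now constants rather than calls.
print(rlang::quo_get_expr(model_spec_boost$args$mtry))
print(rlang::quo_get_expr(model_spec_boost$args$trees))
```

Without `!!`, writing `mtry = n_cols_root` would still store a reference to a variable in the main session, which would presumably run into the same lookup problem on the workers.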
What do you think about this, @mdancho84? We could add an "export" argument to modeltime_fit_workflowset(), which would be an object (or a named list if multiple objects are required), and export it to the nodes. The implementation would be quite easy.
Or do you prefer to leave things as they are?
It's a unique case but we can add an exports arg inside of the control objects. That might help.