fabletools
Parallel forecasting
I had to do a bit more work to get `new_data` included properly in the `future_mapply` and `apply` calls. I've tested on a couple of models with multiple time series and external regressors, and it is working for me both with and without `future` attached, as well as both with and without `new_data`.
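A minimal sketch of the idea, assuming the future.apply package: the series are split into chunks by key, and each chunk's training data is paired with its own slice of `new_data` in a single `future_mapply()` call. This is not the code from the fork; the toy data, the column names `id`, `period`, `x`, `y`, and the `TSLM(y ~ x)` model are hypothetical, and here each worker both fits and forecasts its chunk.

```r
library(dplyr)
library(tsibble)
library(fable)          # attaches fabletools: model(), forecast()
library(future)
library(future.apply)   # future_mapply()

# Hypothetical toy data: 4 series, 48 periods, one external regressor x
set.seed(1)
full <- tsibble(
  id     = rep(c("s1", "s2", "s3", "s4"), each = 48),
  period = rep(1:48, times = 4),
  x      = rnorm(4 * 48),
  key = id, index = period
) %>%
  mutate(y = 2 * x + rnorm(n()))
train    <- filter(full, period <= 40)                 # history
future_x <- filter(full, period > 40) %>% select(-y)   # new_data: regressors only

# Assign series to 2 chunks; slice the training data and new_data the same way
ids          <- unique(train$id)
chunk_of     <- rep_len(1:2, length(ids))
train_chunks <- lapply(1:2, function(k) filter(train, id %in% ids[chunk_of == k]))
new_chunks   <- lapply(1:2, function(k) filter(future_x, id %in% ids[chunk_of == k]))

plan(multisession, workers = 2)
# Each future fits and forecasts one chunk, using only that chunk's new_data
fc_chunks <- future_mapply(
  function(tr, nd) {
    fit <- fabletools::model(tr, lm = fable::TSLM(y ~ x))
    fabletools::forecast(fit, new_data = nd)
  },
  train_chunks, new_chunks,
  SIMPLIFY = FALSE
)
plan(sequential)

fc <- bind_rows(fc_chunks)   # one row per series and horizon step
```

Whether `bind_rows()` hands back a full fable or a plainer tibble may depend on the installed fabletools and tsibble versions.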
@davidtedfordholt @mitchelloharawild has parallelization across series been implemented yet, per the note here? I've been digging through @davidtedfordholt's fork as well as the main fabletools repo and haven't seen anything parallel for forecasting since the linked commits were reverted. I'm not sure whether this has been resolved somewhere I just haven't seen yet.
Hello!
I was able to solve the parallelization issue for the fable modeling with a multidplyr `party_df` and `purrr::map()`. But I am having issues building the forecast, because tsibble does not recognize the multidplyr cluster class and `forecast()` does not recognize the `multidplyr_party_df` class (see errors below).
I am also interested in parallelizing the forecast build. Currently, my forecasts take 20 minutes to build for just over 1800 time series and 7 models. You might say that is not a long time, but the model fit takes 12 hours on my laptop, 4 hours on my laptop using multidplyr, and between 1 and 2 hours on a Docker instance with 90 GB of RAM and 32 cores. That is just for 1 category; I have 17 categories that I will be building forecasts for in the future.
So I would be reducing my forecast build run time by 15 minutes times 17 categories... that would be worth the squeeze. Unfortunately, I can't share an actual reprex; I will have to create a pseudo-example that recreates the issue when I have some time. My point is that I am very interested in seeing forecasting parallelized. Any word on when this might happen?
Thank you, I love these packages!
Errors:
library(magrittr)  # provides %>%

cluster_mature_fx_future <- function(model_fit, future_data) {
  # Earlier attempts, left commented out:
  # joined_future_fit <- model_fit %>% dplyr::left_join(future_data)
  # fit_str_fx <- model_fit %>%
  #   dplyr::arrange(.data$group, .data$vd, .data$model_maturity) %>%
  #   fabletools::mable(key = c(.data$group, .data$vd, .data$model_maturity),
  #     model = c(.data$arima_units_ly, .data$arima_sls_q_ly_units_ly, .data$arima_sls_aq_ly_units_ly, .data$tslm_sls_aq_ly_units_ly_log, .data$arima_sls_a_ly_units_ly))

  # Convert the future regressors to a tsibble keyed like the training data.
  # key and index are tidyselect arguments, so pass bare column names.
  future_data <- future_data %>%
    tsibble::as_tsibble(key = c(group, vd, model_maturity), index = year_wk)

  # Forecast every model in the mable over the supplied future regressors
  future_str_fx <- model_fit %>%
    fabletools::forecast(new_data = future_data)
  future_str_fx
}
> fit_str_fx <- clustered_fit_str_fx_nested %>%
+ dplyr::mutate(fx_future = purrr::map2(.x = .data$arima_units_ly,
+ .y = clustered_future_str_drvr_wkly_nested,
+ .f = cluster_mature_fx_future
+ )
+ ))
Error: Remote computation failed:
Problem with `mutate()` input `fx_future`.
x `as_tsibble()` doesn't know how to handle the multidplyr_cluster class yet.
i Input `fx_future` is `purrr::map2(...)`.
Run `rlang::last_error()` to see where the error occurred.
> ts_str_fx_future_base <- clustered_fit_str_fx_nested %>% fabletools::forecast(new_data = clustered_future_str_drvr_wkly_nested)
Error in UseMethod("forecast") :
no applicable method for 'forecast' applied to an object of class "multidplyr_party_df"
Backtrace:
> rlang::last_error()
<error/rlang_error>
Remote computation failed:
Problem with `mutate()` input `fx_future`.
x `as_tsibble()` doesn't know how to handle the multidplyr_cluster class yet.
i Input `fx_future` is `purrr::map2(...)`.
Backtrace:
9. dplyr::mutate(...)
11. multidplyr:::shard_call(.data, "mutate", enquos(...))
12. multidplyr::cluster_send(...)
13. multidplyr::cluster_call(cluster, !!code)
Run `rlang::last_trace()` to see the full context.
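Reading the `map2()` call above, `.y` is the whole `clustered_future_str_drvr_wkly_nested` object rather than a column of the same data frame, which may be why `as_tsibble()` ends up receiving the `multidplyr_cluster` class. A sequential sketch of a pattern that avoids this, assuming each group's fitted mable and its future regressors are stored as list-columns of one tibble so that `map2()` pairs them row by row. The toy data and the names `group`, `store`, `fit`, and `future_data` are hypothetical; with multidplyr one would presumably `partition()` this nested tibble before the `mutate()`, which is not tested here.

```r
library(dplyr)
library(purrr)
library(tsibble)
library(fable)   # attaches fabletools: model(), forecast()

# Hypothetical toy data: 2 groups x 2 stores, 48 periods, one regressor x
set.seed(1)
full <- tsibble(
  group  = rep(c("a", "b"), each = 2 * 48),
  store  = rep(c("s1", "s2", "s3", "s4"), each = 48),
  period = rep(1:48, times = 4),
  x      = rnorm(4 * 48),
  key = c(group, store), index = period
) %>%
  mutate(y = 2 * x + rnorm(n()))
train    <- filter(full, period <= 40)
future_x <- filter(full, period > 40) %>% select(-y)

# One row per group, holding that group's fitted mable and future regressors
groups  <- c("a", "b")
fits    <- map(groups, function(g) model(filter(train, group == g), lm = TSLM(y ~ x)))
futures <- map(groups, function(g) filter(future_x, group == g))
nested  <- tibble(group = groups, fit = fits, future_data = futures)

# map2() walks the two list-columns in step: each mable meets its own new_data
nested <- mutate(nested, fc = map2(fit, future_data, function(f, nd) forecast(f, new_data = nd)))
fc <- bind_rows(nested$fc)
```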
@Fredo-XVII (and @mitchelloharawild), first and foremost I would just like to apologize for not having touched this in so long! For reasons of my family's mental health, I've been pretty absent for essentially the entire pandemic (and a bit before). I'm definitely all for parallelizing the forecasting. I had mostly focused on the fitting because it took longer in most cases, and I know that @mitchelloharawild was looking at an extensible approach that would apply to both of those use cases and others. I have a little cleanup to do on the version I'm running at work (which is parallelized for both fitting and forecasting, but NOT at all well written), but I want to look at the whole thing with fresh eyes. I guess that cleanup is now my tomorrow.
Hello @davidtedfordholt @jake-mason ,
I was able to get both the model fit and the forecast parallelized using `furrr::future_map_dfr()` and `furrr::future_map2_dfr()`. Let me know if you are interested in an example and I will drum one up. One thing to note is that you cannot use `dplyr::mutate()` with a furrr function because it will not parallelize; everything is sent to one core. The key is to `split()` the dataset first, then pass the resulting list as an argument to the furrr function.
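A sketch of the split-then-map approach, assuming furrr and a `multisession` plan with 2 workers. Here the key values are split rather than the tsibble itself, and each piece is filtered so it stays a tsibble; the toy data and the `TSLM(y ~ x)` model are hypothetical stand-ins, not the code described in this comment.

```r
library(dplyr)
library(tsibble)
library(fable)   # attaches fabletools
library(future)
library(furrr)

# Hypothetical toy data: 4 series, 48 periods, one regressor x
set.seed(1)
full <- tsibble(
  id     = rep(c("s1", "s2", "s3", "s4"), each = 48),
  period = rep(1:48, times = 4),
  x      = rnorm(4 * 48),
  key = id, index = period
) %>%
  mutate(y = 2 * x + rnorm(n()))
train    <- filter(full, period <= 40)
future_x <- filter(full, period > 40) %>% select(-y)

# Split the key values into 2 chunks, then build matching lists of tsibbles
ids        <- unique(train$id)
id_chunks  <- split(ids, rep_len(1:2, length(ids)))
train_list <- lapply(id_chunks, function(k) filter(train, id %in% k))
new_list   <- lapply(id_chunks, function(k) filter(future_x, id %in% k))

plan(multisession, workers = 2)
# Stage 1: fit one mable per chunk in parallel, kept as a list of mables
fits <- future_map(train_list, function(tr) fabletools::model(tr, lm = fable::TSLM(y ~ x)))
# Stage 2: forecast each mable with its own new_data and row-bind the results
fc <- future_map2_dfr(fits, new_list, function(fit, nd) fabletools::forecast(fit, new_data = nd))
plan(sequential)
```

Keeping stage 1 as `future_map()` rather than `future_map_dfr()` avoids relying on row-binding mables back together.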
Thank you for your continued work on this. This year has been tough for many. I haven't touched my blog since before Covid because I can't seem to find the time.
Interesting to know about the `mutate()` limitation with {furrr}. I didn't know about this before.
Hello!
I was able to adapt my function to use `furrr` and `future` in a Docker container. When I run it over a handful of groups, the function builds the forecast pretty quickly, but when I run it over 2000 groups the future runs sequentially or only on 2 cores. If I set the furrr `scheduling` option to `Inf`, then all 28 cores start firing, but the container eventually crashes before completing all the forecasts.
furrr_opts <- furrr::furrr_options(seed = FALSE, lazy = FALSE, chunk_size = NULL, scheduling = Inf)
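Per the furrr documentation, `scheduling = 1` creates one future per worker and `scheduling = Inf` one future per element, while `chunk_size`, when set, takes precedence over `scheduling`. A middle ground is to pick the chunk size explicitly. The sketch below assumes 2000 groups, 28 workers, and a hypothetical target of about 4 futures per worker; `group_list` and `fit_fun` in the final comment are placeholders.

```r
library(furrr)

n_groups   <- 2000L  # groups to forecast
n_workers  <- 28L    # multisession workers
per_worker <- 4L     # hypothetical target: about 4 futures per worker

# Elements per future, rounded up so every group lands in some chunk
chunk_size <- ceiling(n_groups / (n_workers * per_worker))
# Approximate number of futures this produces
n_futures  <- ceiling(n_groups / chunk_size)

opts <- furrr_options(seed = FALSE, chunk_size = chunk_size)
cat("chunk_size:", chunk_size, " futures:", n_futures, "\n")
# e.g. future_map(group_list, fit_fun, .options = opts)  (hypothetical names)
```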
Currently the failure happens at `fabletools::model()` when running on all groups; I don't even get to the forecasting code.
So my question is: has anyone successfully run fable with future and furrr on a large number of groups? If so, what do the Docker setup and the R code look like?
@mitchelloharawild There is a good chance that I did not have the furrr options set correctly when I tested it with dplyr. The future default is to run sequentially, and it can still effectively run sequentially under `multisession` if the furrr options are not set up for parallelization.
I have to admit that multidplyr is much easier than future. The only issue is that multidplyr converts the mable into a tibble, so I can't run `fabletools::forecast()` on it. I tried converting it back to a mable on the fly within the grouping, but somehow that failed.
Just an FYI.
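If the model columns survive the trip back from the workers, `fabletools::as_mable()` can rebuild a mable from a tibble given the key and model columns. A sketch under that assumption, where `tibble::as_tibble()` stands in for what multidplyr's `collect()` returns; the toy data and the `id`/`lm` column names are hypothetical.

```r
library(dplyr)
library(tsibble)
library(fable)   # attaches fabletools: model(), as_mable(), forecast()

# Hypothetical toy data: 4 series, 48 periods, one regressor x
set.seed(1)
full <- tsibble(
  id     = rep(c("s1", "s2", "s3", "s4"), each = 48),
  period = rep(1:48, times = 4),
  x      = rnorm(4 * 48),
  key = id, index = period
) %>%
  mutate(y = 2 * x + rnorm(n()))
train    <- filter(full, period <= 40)
future_x <- filter(full, period > 40) %>% select(-y)

fit <- model(train, lm = TSLM(y ~ x))

# Stand-in for what comes back from the workers: a plain tibble whose `lm`
# column still holds the fitted models
fit_tbl <- tibble::as_tibble(fit)

# Rebuild the mable: `key` identifies each series, `model` names the model column(s)
fit_mbl <- as_mable(fit_tbl, key = id, model = lm)
fc <- forecast(fit_mbl, new_data = future_x)
```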
@mitchelloharawild Hi! I have tried all different combinations of `furrr::furrr_options()` on our Kubernetes cluster, and furrr fails when I try to build the forecast for all 2000 groups. The code does scale for a handful of stores, however, so it does work. I found no combination of options that resolves this. I also tested using the cluster option with the parallel package instead of `multisession`, and that also failed. If you figure it out, let me know.
Thanks!
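For reference, a sketch of how a parallel-package cluster is usually handed to future: through the `cluster` strategy rather than `multisession`. The 2-worker cluster is an arbitrary choice, and the check only confirms that the future ran in a different process.

```r
library(future)

cl <- parallel::makeCluster(2)   # PSOCK cluster from the parallel package
plan(cluster, workers = cl)      # hand the existing cluster to future

f <- future(Sys.getpid())        # evaluated on one of the cluster's workers
worker_pid <- value(f)
cat("main PID:", Sys.getpid(), " worker PID:", worker_pid, "\n")

plan(sequential)
parallel::stopCluster(cl)
```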
@Fredo-XVII, do you happen to have information about resource usage from when you were running that? I'm wondering whether it's RAM-bound, core-bound, or something a bit more pernicious. I've often been RAM-bound when using `future`, and that is a tricky one to fix, but a relatively easy one to diagnose.
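One way to start on the RAM side, sketched in base R: compare the size of the full dataset with the chunks each future would receive. With `multisession`, every future is sent its own copy of the globals it uses, so a function that captures the full data rather than one chunk can multiply memory use across workers. The data below is a hypothetical stand-in of 2000 groups; `future.globals.maxSize` is the future option that caps the total size of exported globals.

```r
set.seed(1)
n_groups <- 2000L
# Hypothetical stand-in: 2000 groups x 100 observations each
full <- data.frame(
  id = rep(sprintf("g%04d", seq_len(n_groups)), each = 100),
  y  = rnorm(n_groups * 100)
)

# Assign whole groups (not rows) to 28 chunks, one list element per chunk
group_chunk <- rep_len(seq_len(28), n_groups)
full$chunk  <- group_chunk[match(full$id, unique(full$id))]
chunks      <- split(full, full$chunk)

size_mb  <- function(obj) as.numeric(object.size(obj)) / 1024^2
full_mb  <- size_mb(full)
chunk_mb <- vapply(chunks, size_mb, numeric(1))
cat(sprintf("full data: %.1f MB; largest chunk: %.1f MB\n", full_mb, max(chunk_mb)))

# future errors when a future's globals exceed this option (default 500 MiB);
# raising it (here to a hypothetical 2 GiB) trades that check for more RAM use
options(future.globals.maxSize = 2 * 1024^3)
```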
@davidtedfordholt Yes, I was able to see the resources. The container has access to 32 cores and 93 GB of RAM. It never came close to using all that memory, even when all cores were firing.
The container runs Linux (Ubuntu 10, I believe) with R 3.6.3, using the `rocker/tidyverse:3.6.3` image. I do not use the base R Docker image because the tidyverse version ensures that Java works inside the container.
For `sequential`: 1 core used.
For `multisession(workers = 28)` with `furrr::furrr_options(seed = FALSE, lazy = FALSE, chunk_size = NULL, scheduling = 1)`: only 2 cores were firing. The code finished in 18 hours.
For `multisession(workers = 28)` with `furrr::furrr_options(seed = FALSE, lazy = FALSE, chunk_size = NULL, scheduling = Inf)`: all cores fire, but it crashes the container within minutes.
For `multisession(workers = cluster)` with `furrr::furrr_options(seed = FALSE, lazy = FALSE, chunk_size = NULL, scheduling = Inf)`, where `cluster` is a cluster object from the parallel package: all cores fire, but it crashes the container within minutes.
The main difference that I saw with `multidplyr` is that its worker processes were started with `--no-restore`, whereas the `furrr` and `future` workers were started with `--no-restore` but also with `--slave`.
No combination of the furrr options made the code run for all 2000 groups. Setting `scheduling = Inf` was the only way to get `multisession` to fire on all cores.
Below is an example of my last run, but I am not sure exactly which version of the code above I was using. You can see there is a PID 1 issue that I can't resolve because I am not a computer scientist, and all my searching has turned up no solution.
Also, you can see that the `multidplyr` code for the model fit finished in 1.2 hours for 2000 groups; it corresponds to the processes started with only `--no-restore`. The `future` + `furrr` forecast build below (the `--slave` processes) kicks off 3 cores and runs for only 4 minutes before crashing.
Running the model fit with `future` + `furrr` on all stores also crashes the container, but because I was interested in parallelizing the forecast build after the model fit, I kept `multidplyr` for the model fit in the test below.
Let me know if you have other questions.
Thanks for the help!!
Message + ERROR:
Step #4: Clustering Parellel Code Begins
Number of cores used: 28
[1] "Models Dataset built: mature_fx_cluster"
Total Time for Forecast Model Build: 1.2633 hours
[1] "Forecasts Models Dataset (Mable) Built, includes ensemble: fit_str_fx"
[1] "End of Cluster Code"
Step #5: Build the future 52 week store forecasts
Number of cores used: 28
FutureStrategytweakedmultisessionclustermultiprocessfuturefunction
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
<simpleError in serialize(data, node$con, xdr = FALSE): error writing to connection>
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
PID USER TIME COMMAND
1 root 0:06 /runtime-connector Rscript main.R
20 root 1h09 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
3231 root 59:16 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
3270 root 1h21 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
3309 root 1h06 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
3348 root 1h09 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
3387 root 1h06 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
3465 root 1h08 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
3504 root 57:57 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
3543 root 1h10 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
3582 root 1h04 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
3621 root 58:24 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
3660 root 1h27 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
3699 root 1h22 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
3738 root 54:26 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
3777 root 1h08 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
3816 root 57:41 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
3855 root 1h09 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
3894 root 1h12 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
3933 root 1h00 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
3972 root 1h05 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
4050 root 1h05 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
4089 root 59:38 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
4128 root 55:39 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
4245 root 1h15 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
4284 root 1h13 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
5455 root 4:04 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
5516 root 4:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
5577 root 4:04 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
5638 root 0:06 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
5699 root 0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
5760 root 0:03 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
5821 root 0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
5882 root 0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
5943 root 0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
6004 root 0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
6065 root 0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
6126 root 0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
6187 root 0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
6248 root 0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
6309 root 0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
6370 root 0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
6431 root 0:03 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
6492 root 0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
6553 root 0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
6614 root 0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
6675 root 0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
6736 root 0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
6797 root 0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
6858 root 0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
6919 root 0:03 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
6980 root 0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
7041 root 0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
7102 root 0:03 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
7209 root 0:00 /tmp/bins/busybox ps
It looks like you're on a system that doesn't use `systemd` as the init system. May I ask what operating system you are running this on, @Fredo-XVII?