Parallel forecasting

Open davidtedfordholt opened this issue 4 years ago • 11 comments

I had to do a bit more work to get new_data to be included in the future_mapply and apply calls properly. I've tested on a couple of models with multiple time series and external regressors, and it is working for me both with and without future attached, as well as both with and without new_data.

davidtedfordholt avatar Sep 23 '20 21:09 davidtedfordholt

@davidtedfordholt @mitchelloharawild has the parallelization across series yet been implemented, per the note here? I've been digging through @davidtedfordholt's fork as well as the main fabletools repo and haven't seen anything parallel for forecasting, after those commits linked were reverted. Not sure whether this has been resolved somewhere I just haven't seen yet.

jake-mason avatar Apr 01 '21 15:04 jake-mason

Hello!

I was able to solve the parallelizing issue for the fable modeling with a multidplyr party_df and purrr::map(). But I am having issues building the forecast, because tsibble does not recognize the multidplyr_cluster class and forecast does not recognize the multidplyr_party_df class (see errors below).

I am also interested in parallelizing the forecast build. Currently, my forecast takes 20 minutes to build for just over 1800 time series and 7 models. You might say that is not a long time, but the model fit takes 12 hours on my laptop, 4 hours on my laptop using multidplyr, and between 1 and 2 hours using a Docker instance with 90 GB of RAM and 32 cores. That is just for 1 category; I have 17 categories that I will be building forecasts for in the future.

So I would be reducing my forecast build run time by 15 minutes times 17 metrics...that would be worth the squeeze. Unfortunately, I can't share an actual reprex. I will have to create a pseudo-example to recreate the issue when I have some time. My point is that I am very interested in seeing forecast parallelized; any word on when this might happen?

Thank you, I love these packages!

Errors:

  cluster_mature_fx_future <- function(model_fit, future_data) {
    
    #joined_future_fit <- model_fit %>% dplyr::left_join(future_data)
    
    # fit_str_fx <- model_fit %>% 
    #   dplyr::arrange(.data$group, .data$vd, .data$model_maturity) %>%
    #   fabletools::mable(key = c(.data$group, .data$vd, .data$model_maturity),
    #                     model = c(.data$arima_units_ly, .data$arima_sls_q_ly_units_ly, .data$arima_sls_aq_ly_units_ly, .data$tslm_sls_aq_ly_units_ly_log, .data$arima_sls_a_ly_units_ly))
    # 
    future_data <- future_data %>% 
      tsibble::as_tsibble(key = c(.data$group,.data$vd,.data$model_maturity), index = .data$year_wk)
    
    future_str_fx <-
      model_fit %>% fabletools::forecast(new_data = future_data)
    
    future_str_fx
  }

>   fit_str_fx <- clustered_fit_str_fx_nested %>%
+     dplyr::mutate(fx_future = purrr::map2(.x = .data$arima_units_ly, 
+                                           .y = clustered_future_str_drvr_wkly_nested,
+                                           .f = cluster_mature_fx_future
+                                          )
+                   ))
Error: Remote computation failed:
Problem with `mutate()` input `fx_future`.
x `as_tsibble()` doesn't know how to handle the multidplyr_cluster class yet.
i Input `fx_future` is `purrr::map2(...)`.
Run `rlang::last_error()` to see where the error occurred.
>   ts_str_fx_future_base <- clustered_fit_str_fx_nested %>% fabletools::forecast(new_data = clustered_future_str_drvr_wkly_nested)
Error in UseMethod("forecast") : 
  no applicable method for 'forecast' applied to an object of class "multidplyr_party_df"

Backtrace:

> rlang::last_error()
<error/rlang_error>
Remote computation failed:
Problem with `mutate()` input `fx_future`.
x `as_tsibble()` doesn't know how to handle the multidplyr_cluster class yet.
i Input `fx_future` is `purrr::map2(...)`.
Backtrace:
  9. dplyr::mutate(...)
 11. multidplyr:::shard_call(.data, "mutate", enquos(...))
 12. multidplyr::cluster_send(...)
 13. multidplyr::cluster_call(cluster, !!code)
Run `rlang::last_trace()` to see the full context.

Fredo-XVII avatar May 21 '21 05:05 Fredo-XVII

@Fredo-XVII (and @mitchelloharawild), first and foremost I would just like to apologize for not having touched it in so long! For reasons of my family's mental health, I've been pretty absent for essentially the entire pandemic (and a bit before). I'm definitely all for parallelizing the forecasting. I had mostly focused on the fitting because it took longer in most cases, and I know that @mitchelloharawild was looking at an extensible method that would have been applicable in both of those use cases and others. I have a little cleanup to do on the version I'm running at work (which is parallelized for both fitting and forecasting, but NOT at all well-written), but I want to look at the whole thing with fresh eyes. I guess that cleanup is now my tomorrow.

davidtedfordholt avatar Jun 08 '21 01:06 davidtedfordholt

Hello @davidtedfordholt @jake-mason ,

I was able to get both the model fit and the forecast parallelized using furrr::future_map_dfr() and furrr::future_map2_dfr(). Let me know if you are interested in an example and I will drum one up. One thing to note is that, in my testing, calling a furrr function inside dplyr::mutate() did not parallelize; everything was sent to one core. The key is to split() the dataset first, then pass the pieces as an argument to the furrr function.
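A minimal sketch of the split-then-map pattern described above; this is not the code from this comment. It assumes the `tsibble::tourism` example data, an `ETS()` model, and two local workers, and the helper name `fit_and_forecast` is made up for illustration. It uses `furrr::future_map()` followed by `dplyr::bind_rows()`, which should behave like `future_map_dfr()` here.

```r
library(dplyr)
library(tsibble)
library(fable)
library(furrr)

# Assumption: two local workers; without a plan, future runs sequentially.
future::plan(future::multisession, workers = 2)

# Two example series, converted to a plain tibble before splitting.
series <- tsibble::tourism %>%
  filter(Region %in% c("Sydney", "Melbourne"), Purpose == "Holiday") %>%
  as_tibble()

# Split first, outside of mutate(), so each piece becomes its own unit of work.
pieces <- split(series, series$Region)

# Hypothetical helper: rebuild the tsibble, fit, forecast, return a plain tibble.
fit_and_forecast <- function(piece) {
  ts_piece <- tsibble::as_tsibble(piece, key = c(Region, State, Purpose), index = Quarter)
  fit <- fabletools::model(ts_piece, ets = fable::ETS(Trips))
  fc <- fabletools::forecast(fit, h = 4)
  # Keep only the point forecasts so the result combines as an ordinary tibble.
  dplyr::select(tibble::as_tibble(fc), Region, State, Purpose, .model, Quarter, .mean)
}

fc_list <- furrr::future_map(pieces, fit_and_forecast,
                             .options = furrr::furrr_options(seed = TRUE))
fc_all <- dplyr::bind_rows(fc_list)

future::plan(future::sequential)  # release the workers
```

Returning a plain tibble from each worker avoids having to recombine mable or fable objects, at the cost of dropping the forecast distributions; whether that trade-off is acceptable depends on what is needed downstream.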

Thank you for your continued work on this. This year has been tough for many. I haven't touched my blog since before Covid because I can't seem to find the time.

Fredo-XVII avatar Jun 08 '21 19:06 Fredo-XVII

Interesting to know about the mutate() limitation with {furrr}. I didn't know about this before.

mitchelloharawild avatar Jun 08 '21 23:06 mitchelloharawild

Hello!

I was able to adapt my function to use furrr and future on a docker container. When I run it over a handful of groups, the function builds the forecast pretty quickly, but when I run it over 2000 groups the future runs sequentially or it only runs on 2 cores. If I open up the furrr options to Inf, then all 28 cores start firing off but the container eventually crashes before completing all the forecasts.

furrr_opts <- furrr::furrr_options(seed = FALSE, lazy = FALSE, chunk_size = NULL, scheduling = Inf)

Currently the failure happens at fabletools::model() when running on all groups; I don't even get to the forecasting code.

So my question is, has anyone successfully run a fable with future and furrr on a large number of groups? If so, what does the docker look like? R code?

Fredo-XVII avatar Jul 01 '21 17:07 Fredo-XVII

@mitchelloharawild There is a good chance that I did not have the furrr options set correctly when I tested dplyr. The future default is to run sequentially, but code can also end up running sequentially under a multisession plan if the furrr options are not set up for parallelizing.
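A small sketch for checking which plan is active and how furrr will chunk the work. The worker count of 2 is an assumption for illustration; the meaning of `scheduling` in the comments follows the `furrr_options()` documentation.

```r
library(future)
library(furrr)

plan(sequential)
workers_default <- nbrOfWorkers()   # a sequential plan has a single worker

plan(multisession, workers = 2)     # assumption: two local R sessions
workers_parallel <- nbrOfWorkers()

# scheduling is only used when chunk_size is NULL:
#   1   -> one chunk (future) per worker
#   Inf -> one future per element of .x
opts_chunked <- furrr_options(scheduling = 1)

# Each call reports the process ID of the R session that ran it, so the
# number of distinct IDs shows how many sessions actually did work.
pids <- future_map_int(1:4, ~ Sys.getpid(), .options = opts_chunked)
n_sessions <- length(unique(pids))

plan(sequential)                    # release the workers
```

Comparing `n_sessions` with `workers_parallel` on a small input is a quick way to confirm that work is being spread across sessions before launching a long run.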

I have to admit that multidplyr is much easier than future. The only issue is that multidplyr converts the mable into a tibble, and so I can't run the fabletools::forecast() function on it. I tried converting it to a mable on the fly in the grouping, but somehow that failed.

Just an FYI.

Fredo-XVII avatar Jul 01 '21 17:07 Fredo-XVII

@mitchelloharawild Hi! I have tried all the different combinations of furrr::furrr_options() on our Kubernetes cluster, and furrr fails when I try to build the forecast for all 2000 groups. The code does run for a handful of stores, however, so it does work at small scale. I found no combination of options that resolves this. I also tested using a cluster object from the parallel package instead of multisession, and that also failed. If you guys figure it out, let me know.

Thanks!

Fredo-XVII avatar Jul 19 '21 17:07 Fredo-XVII

@Fredo-XVII , do you happen to have information from when you were running that about resource usage? I'm just wondering if it's a matter of it being RAM- or core-bound, or something a bit more pernicious. I've often been RAM-bound when using future and that is a tricky one to fix, but a relatively easy one to diagnose.
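One cheap check when a run might be memory-bound is to measure how large each piece of work is once serialized, since that is roughly what gets shipped to a worker. The sketch below uses base R only; `payload_mb` is a hypothetical helper and `mtcars` is stand-in data. It measures the payload size, not peak memory use inside a worker.

```r
# Size in megabytes of an object once serialized.
payload_mb <- function(x) length(serialize(x, connection = NULL)) / 1024^2

# Stand-in data (assumption): split a built-in data frame the way the series would be split.
pieces <- split(mtcars, mtcars$cyl)
sizes <- vapply(pieces, payload_mb, numeric(1))

# The largest pieces are the first candidates to inspect.
sizes_sorted <- sort(sizes, decreasing = TRUE)
```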

davidtedfordholt avatar Jul 20 '21 14:07 davidtedfordholt

@davidtedfordholt Yes, I was able to see the resources. The container has access to 32 cores and 93 GB of RAM. Let me tell you that it never came close to using all that memory, even when all cores were firing.

The container system is running on Linux (Ubuntu 10, I believe), R 3.6.3, with a rocker/tidyverse:3.6.3 image. I do not use the base R Docker image because the tidyverse version ensures that Java works inside the container.

For sequential: 1 core used.

For multisession (workers = 28) with furrr::furrr_options(seed = FALSE, lazy = FALSE, chunk_size = NULL, scheduling = 1): Only 2 cores were firing. Code finished in 18 hrs.

For multisession (workers = 28) with furrr::furrr_options(seed = FALSE, lazy = FALSE, chunk_size = NULL, scheduling = Inf): All cores fire but it crashes the container within minutes.

For multisession (workers = cluster, a cluster object from the parallel package) with furrr::furrr_options(seed = FALSE, lazy = FALSE, chunk_size = NULL, scheduling = Inf): All cores fire but it crashes the container within minutes.

The main difference that I saw is that with multidplyr the worker R processes were started with --no-restore only, whereas with furrr and future they were started with --no-restore and also --slave.

No iteration of the furrr options made the code run for all 2000 groups. Setting scheduling = Inf was the only way to get multisession to fire on all cores.

Below is an example of my last run, but I am not sure exactly which version of the code above I was using. You can see there is a PID 1 issue that I can't resolve because I am not a computer scientist, and all my searching has turned up no solution.

Also, you can see that the multidplyr code for the model fit finished in 1.2 hours for 2000 groups, and is associated with the cores with only --no-restore. The future + furrr forecast build below, or --slave cores, kicks off 3 cores and runs for only 4 min before crashing.

Running the model fit with future + furrr on all stores also crashes the container, but because I was interested in parallelizing the forecast build after the model fit, I kept multidplyr in my test below for the model fit.

Let me know if you have other questions.

Thanks for the help!!

Message + ERROR:

Step #4: Clustering Parellel Code Begins
Number of cores used: 28
[1] "Models Dataset built: mature_fx_cluster"
Total Time for Forecast Model Build: 1.2633 hours
[1] "Forecasts Models Dataset (Mable) Built, includes ensemble: fit_str_fx"
[1] "End of Cluster Code"
Step #5: Build the future 52 week store forecasts
Number of cores used: 28
FutureStrategytweakedmultisessionclustermultiprocessfuturefunction
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
<simpleError in serialize(data, node$con, xdr = FALSE): error writing to connection>
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
PID   USER     TIME  COMMAND
    1 root      0:06 /runtime-connector Rscript main.R
   20 root      1h09 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 3231 root     59:16 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
 3270 root      1h21 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
 3309 root      1h06 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
 3348 root      1h09 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
 3387 root      1h06 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
 3465 root      1h08 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
 3504 root     57:57 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
 3543 root      1h10 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
 3582 root      1h04 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
 3621 root     58:24 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
 3660 root      1h27 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
 3699 root      1h22 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
 3738 root     54:26 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
 3777 root      1h08 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
 3816 root     57:41 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
 3855 root      1h09 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
 3894 root      1h12 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
 3933 root      1h00 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
 3972 root      1h05 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
 4050 root      1h05 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
 4089 root     59:38 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
 4128 root     55:39 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
 4245 root      1h15 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
 4284 root      1h13 /usr/local/lib/R/bin/exec/R --no-save --no-restore --no-re
 5455 root      4:04 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 5516 root      4:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 5577 root      4:04 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 5638 root      0:06 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 5699 root      0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 5760 root      0:03 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 5821 root      0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 5882 root      0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 5943 root      0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 6004 root      0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 6065 root      0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 6126 root      0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 6187 root      0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 6248 root      0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 6309 root      0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 6370 root      0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 6431 root      0:03 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 6492 root      0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 6553 root      0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 6614 root      0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 6675 root      0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 6736 root      0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 6797 root      0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 6858 root      0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 6919 root      0:03 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 6980 root      0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 7041 root      0:02 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 7102 root      0:03 /usr/local/lib/R/bin/exec/R --no-save --no-restore --slave
 7209 root      0:00 /tmp/bins/busybox ps

Fredo-XVII avatar Jul 20 '21 14:07 Fredo-XVII

It looks like you're on a system that doesn't use systemd as the init system. May I ask what operating system you are running this on, @Fredo-XVII?

davidtedfordholt avatar Jul 22 '21 13:07 davidtedfordholt