parameters
allow parallel computation during bootstrapping
This requires adding a new `parallel` argument to `model_parameters()` and then passing the value to the `boot()` calls.

For example, here we could add `parallel = parallel` inside the call:
https://github.com/easystats/parameters/blob/d7fed242cc655d88c0417f17f48d15c657b67b2b/R/bootstrap_model.R#L85

We could also default to `parallel = "multicore"`, so that multiple cores - if available - are used by default.
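A minimal sketch of what this could look like, assuming a simplified wrapper around `boot::boot()` (the function name `bootstrap_model_sketch` and its internals are illustrative, not the actual `parameters` code):

```r
library(boot)

# Hypothetical wrapper: forward a new `parallel` argument (plus `ncpus`)
# down to boot::boot(), which already accepts both.
bootstrap_model_sketch <- function(model, iterations = 1000,
                                   parallel = "no", ncpus = 1L) {
  data <- model$model  # model frame stored in the lm object
  statistic <- function(d, i) coef(update(model, data = d[i, ]))
  boot(data = data, statistic = statistic, R = iterations,
       parallel = parallel, ncpus = ncpus)
}

mod <- lm(wt ~ mpg, data = mtcars)
res <- bootstrap_model_sketch(mod, iterations = 200)
dim(res$t)  # 200 bootstrap draws x 2 coefficients
```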
Wouldn't it be better to let that be passed through the ellipsis (`...`) to avoid cluttering the API? Or to retrieve it from the options (as Stan does)?
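For the options-based route, `boot::boot()` itself already works this way: per its documentation, the default for `parallel` is taken from the option `"boot.parallel"` (falling back to `"no"`), and `ncpus` defaults to `getOption("boot.ncpus", 1L)`. A sketch:

```r
# boot::boot() consults these options for its defaults, so setting them
# once would configure all downstream boot() calls without new arguments.
options(boot.parallel = "multicore", boot.ncpus = 4L)

parallel <- getOption("boot.parallel", "no")
ncpus    <- getOption("boot.ncpus", 1L)
parallel
#> [1] "multicore"
```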
I can't create a reprex because `parallel` doesn't seem to work with it, but passing the dots works (PR: #439).
> set.seed(123)
> library(parameters)
>
> mod <- lm(formula = wt ~ mpg, data = mtcars)
>
> set.seed(123)
> system.time(model_parameters(mod, bootstrap = TRUE, iterations = 1000, parallel = "no"))
user system elapsed
1.043 0.007 1.057
>
> set.seed(123)
> system.time(
+ model_parameters(
+ mod,
+ bootstrap = TRUE,
+ iterations = 1000,
+ parallel = "multicore",
+ ncpus = 4L
+ )
+ )
user system elapsed
0.078 0.056 0.613
`"multicore"` doesn't work on Windows.
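Right - `"multicore"` relies on forking, which is unavailable on Windows. If we did pick a default ourselves, a portable choice could be made from the platform (a sketch, not existing `parameters` code):

```r
# "multicore" forks the R process, which Windows does not support;
# "snow" launches separate worker processes and works everywhere.
default_backend <- if (.Platform$OS.type == "windows") "snow" else "multicore"
default_backend
```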
Using regular R or Microsoft R Open doesn't seem to make a difference; increasing the number of CPUs used even slows things down:
library(parameters)
#> Warning: package 'parameters' was built under R version 4.0.4
model <- lm(mpg ~ wt + cyl, data = mtcars)
microbenchmark::microbenchmark(
model_parameters(model, bootstrap = TRUE, iterations = 1000, parallel = "snow", ncpus = 4),
times = 5
)
#> Unit: seconds
#> expr
#> model_parameters(model, bootstrap = TRUE, iterations = 1000, parallel = "snow", ncpus = 4)
#> min lq mean median uq max neval
#> 2.146296 2.178574 2.18241 2.179772 2.200774 2.206634 5
microbenchmark::microbenchmark(
model_parameters(model, bootstrap = TRUE, iterations = 1000, parallel = "no", ncpus = 4),
times = 5
)
#> Unit: seconds
#> expr
#> model_parameters(model, bootstrap = TRUE, iterations = 1000, parallel = "no", ncpus = 4)
#> min lq mean median uq max neval
#> 1.120941 1.12849 1.132289 1.128846 1.137772 1.145394 5
microbenchmark::microbenchmark(
model_parameters(model, bootstrap = TRUE, iterations = 1000, parallel = "multicore", ncpus = 4),
times = 5
)
#> Unit: seconds
#> expr
#> model_parameters(model, bootstrap = TRUE, iterations = 1000, parallel = "multicore", ncpus = 4)
#> min lq mean median uq max neval
#> 1.102907 1.10788 1.117547 1.114816 1.12571 1.136424 5
Created on 2021-03-09 by the reprex package (v1.0.0)
Yeah, I am also seeing on my Mac that the computation time actually increases if I use parallel computing with `ncpus` set to some value > 1.
It's all a bit confusing. And this has nothing to do with the `parameters` functions. Here is an example from the `boot` package docs:
library(boot)
library(microbenchmark)
# usual bootstrap of the ratio of means using the city data
ratio <- function(d, w) sum(d$x * w) / sum(d$u * w)
set.seed(123)
microbenchmark::microbenchmark(
boot(city, ratio, R = 4999, stype = "w"),
times = 5
)
#> Unit: milliseconds
#> expr min lq mean median
#> boot(city, ratio, R = 4999, stype = "w") 30.76705 36.27656 39.59618 40.73334
#> uq max neval
#> 42.90163 47.30233 5
options(boot.parallel = "multicore")
set.seed(123)
microbenchmark::microbenchmark(
boot(city, ratio, R = 4999, stype = "w", ncpus = 5),
times = 5
)
#> Unit: milliseconds
#> expr min lq mean
#> boot(city, ratio, R = 4999, stype = "w", ncpus = 5) 44.64621 47.21875 51.9313
#> median uq max neval
#> 48.56907 50.58117 68.6413 5
Created on 2021-03-10 by the reprex package (v1.0.0)
I think we should stay away from making any changes to `parameters` until we figure out how to successfully use `boot`'s parallel computation functionality.
Yes, sounds good.
@bwiernik Do you have any ideas about how to get this to work?
Yeah, I can take a look
`future` is probably a better platform for cross-platform parallel computation: https://cran.r-project.org/web/packages/future/index.html
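A rough sketch of what a `future`-based bootstrap could look like, assuming the `future` and `future.apply` packages are installed (this bypasses `boot`'s parallel machinery entirely; the resampling loop here is illustrative):

```r
library(future.apply)  # also attaches future, providing plan()

# multisession workers run in separate R processes, so this works on
# all platforms, including Windows.
plan(multisession, workers = 2)

mod  <- lm(mpg ~ wt + cyl, data = mtcars)
data <- mod$model

# Resample rows and refit; future.seed = TRUE gives parallel-safe RNG.
draws <- future_replicate(200, {
  i <- sample.int(nrow(data), replace = TRUE)
  coef(update(mod, data = data[i, ]))
}, future.seed = TRUE)

dim(draws)  # 3 coefficients x 200 draws
plan(sequential)  # shut the workers down again
```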
The examples in this thread are probably all too small (OLS with N = 32), so the parallel overhead outweighs the gains.
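One way to check that hypothesis would be to make the statistic artificially expensive, so per-replicate work dwarfs the per-task overhead. A sketch along those lines (timings will vary by machine, and `"multicore"` won't help on Windows, so no numbers are claimed here):

```r
library(boot)

# Same ratio-of-means statistic as above, padded with busy-work so each
# bootstrap replicate is no longer trivially cheap.
ratio_slow <- function(d, w) {
  for (k in 1:2000) sum(d$x * w) / sum(d$u * w)  # busy-work
  sum(d$x * w) / sum(d$u * w)
}

system.time(boot(city, ratio_slow, R = 999, stype = "w"))
system.time(boot(city, ratio_slow, R = 999, stype = "w",
                 parallel = "multicore", ncpus = 4))
```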
Perhaps one strategy would be for us to support extracting results from `boot` and other bootstrap objects. That way, users who want fancy features like parallel computation can use the existing support in the appropriate package, and we can extract and display the estimates.
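A minimal sketch of that extract-and-display idea: take an existing `boot` object and summarize its replicates into a `parameters`-style table (the helper name `summarize_boot` and the column layout are illustrative, not an actual `parameters` API):

```r
library(boot)

mod <- lm(mpg ~ wt + cyl, data = mtcars)
bt  <- boot(mod$model,
            function(d, i) coef(update(mod, data = d[i, ])),
            R = 500)

# Turn the R x p matrix of bootstrap replicates (bt$t) into a summary
# table: median point estimate plus a 95% percentile interval.
summarize_boot <- function(bt) {
  data.frame(
    Parameter   = names(bt$t0),
    Coefficient = apply(bt$t, 2, median),
    CI_low      = apply(bt$t, 2, quantile, probs = 0.025),
    CI_high     = apply(bt$t, 2, quantile, probs = 0.975)
  )
}
summarize_boot(bt)
```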
One of the major benefits of `parameters` is that we provide a simple interface for bootstrapping, which is otherwise really difficult for new users (learning to use the `boot` package is a nightmare). I agree that we should use `future` for parallelization, but I do think we should support it.
You're right. `boot` is kind of a nightmare to learn.