parallel support
There are a bunch of things that will have to be sorted out to get polished parallel support.
- cross-platform. That means snow-type parallelism needs to be implemented (i.e. we have to export stuff to the workers)
  - but maybe a fork option for people on Unix? Fork will be better for user-defined stuff.
- automagic. Users should be able to do parallel=TRUE and it should Just Work.
  - parallel=TRUE will have to choose a default number of cores somehow.
  - let user specify cores with parallel=n?
    - everything else will still have to be automagic.
  - arbitrary backend with parallel=cl?
    - we should still export simr in this case
- random number seeds.
  - seeds should work with the parallel option
  - ideally results will be the same for the same seed, but we might have to accept e.g. same seed + same # of cores
- maybe worry that exporting/loading simr in each worker will use the installed version, not the version we're working on (will this mess with testthat too?)
- lme4 versus other packages and user-defined functions.
  - leave it up to users to set up a cluster with the right things exported
  - export everything - if the job's big enough to make parallelism worthwhile, the overhead won't matter so much
And progress bars of course. Apparently we can do this in doSNOW at least?
I think a nice middle ground here might be to support parallel computation the same way plyr does -- take more-or-less seamless advantage of any parallel backend available, but require the user to actually set up the parallel backend (usually via registerDoParallel()). Otherwise, we have to deal with all the configuration options internally, which adds a lot of overhead.
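For example, the user-side setup under that approach would look something like this (registerDoParallel() is real doParallel API; the parallel = TRUE argument to powerSim() is the proposed interface, not something simr currently accepts):

library(doParallel)

# Register a backend with 4 workers; simr would then pick it up
# via foreach/plyr without any internal configuration logic.
registerDoParallel(cores = 4)

# Hypothetical call using the proposed argument:
powerSim(model, nsim = 1000, parallel = TRUE)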
One special case could possibly be done for parallel=n, where if there is no registered parallel backend, we do something like this at the start of the computation:
if (parallel > 1) {
  if (!require('doParallel')) {
    warning("Parallel computation requested but doParallel not available")
  } else {
    if (is.null(getDoParName())) {
      message(paste0("Parallel computation requested, starting doParallel with ", parallel, " cores"))
      registerDoParallel(cores = parallel)
    }
  }
}
For parallel=TRUE and parallel=FALSE, we simply pass those arguments on to llply and rely on the users to have set up an appropriate backend (via doParallel, doSNOW, doMC, etc.). Otherwise, we have to test against all the possible backends. Or maybe autoconfigure on parallel=TRUE and trust the user for parallel='custom' or some other magic value? If the user is already in the position to define a proper cluster cl, surely they can also run registerDoParallel(cl), so why should simr add that extra logic internally? Especially since we don't know if the user wants to continue using that cluster afterwards, i.e. whether simr needs to tear down the cluster.
Exporting simr seems to export the currently "installed" one, but this isn't too big a problem if you're testing via "build and reload" in RStudio, which installs the development version in the user-local library. In terms of the other package exports -- those are handled automatically by simr's dependencies, at least for simr-internal functionality. doParallel also has some functionality to export basically the entire R environment of the master process, but that could be quite slow if the user has large datasets or several large models loaded.
Random seeds are quite difficult and work very differently in the different backends. Moreover, there is some interaction between the random seed for the master and the seeds for the workers (even with setting the worker random seed, you get different results for different master random seeds).
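For what it's worth, one existing solution (a suggestion on my part, not something simr uses): the doRNG package provides reproducible L'Ecuyer-CMRG streams for foreach loops, independent of the backend and of the number of workers:

library(doParallel)
library(doRNG)

registerDoParallel(cores = 2)

# %dorng% replaces %dopar%; each iteration gets its own RNG stream
# derived from the master seed, so results are reproducible for a
# given seed regardless of how many workers are used.
set.seed(123)
res1 <- foreach(i = 1:4) %dorng% rnorm(1)

set.seed(123)
res2 <- foreach(i = 1:4) %dorng% rnorm(1)

identical(res1, res2)  # TRUE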
Progress bars will be difficult no matter what -- plyr just gave up all hope and flat out doesn't support them for parallel computation. And currently, parallelism is done via plyr::llply().
Thinking about this more: I think parallel=TRUE should pass that argument untouched on to llply, while parallel='auto' will attempt to set up an appropriate backend using distributed memory via doParallel, if doParallel is installed.
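A minimal sketch of that dispatch (hypothetical internals, not simr's actual code; getDoParName() and registerDoParallel() are real foreach/doParallel functions):

# Resolve the proposed 'parallel' argument to the flag llply expects.
resolveParallel <- function(parallel) {
  if (identical(parallel, "auto")) {
    if (requireNamespace("doParallel", quietly = TRUE)) {
      # Auto-register a doParallel backend if none exists yet.
      if (is.null(foreach::getDoParName())) {
        doParallel::registerDoParallel()
      }
      parallel <- TRUE
    } else {
      warning("parallel='auto' requested but doParallel not available")
      parallel <- FALSE
    }
  }
  # TRUE/FALSE goes straight through to plyr::llply(.parallel = ...).
  isTRUE(parallel)
}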
Is there a simple way in which users can make use of multiple cores as long as parallel support isn't part of simr?
In my fork, there is some parallel support, but until it's integrated with upstream, the API is subject to change. There's also support for testing multiple coefficients / terms per simulation iteration, which is a huge boost for larger models.
I haven't had a chance to work on it in a while, but if somebody else is using it and thus "helping" work out any kinks/bugs/awkwardness, even just by using it on additional datasets, then that would certainly help move development along....
Finally, you can abuse simr a bit and extract the results of individual simulations and then compute your power (including CIs) from that. That means you can run n simr runs of m/n iterations each, simultaneously on n cores, to compute m simulated iterations in total (see the sketch after this list). There are two things to watch out for here:
- Make sure to set your random seed appropriately so that you don't end up with n identical copies of the same run, which would be misleading.
- Parallel operation on a multicore machine may not speed things up as much as you expect if your BLAS is properly tuned for multicore operations, because the parallel instances then compete with each other. (Now, if you have a cluster then you can easily do both levels of parallel operation with SNOW.)
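Here's the sketch mentioned above (the $pval slot of a powerSim object is real; the run_*.rds file names are a hypothetical convention for the saved runs):

library(simr)

alpha <- 0.05

# Suppose each of the n runs saved its powerSim object to run_1.rds, ...
runs <- lapply(list.files(pattern = "^run_.*\\.rds$"), readRDS)

# Pool the per-iteration p-values across all runs.
pvals <- unlist(lapply(runs, function(ps) ps$pval))

# Power estimate with an exact binomial confidence interval.
binom.test(sum(pvals < alpha), length(pvals))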
The biggest boost I've seen for my stuff comes from the optimised omnibus testing for large models in my fork. There does, however, seem to be a weird interaction between some of that code and certain lme4 versions that kills performance, at least on Linux. As far as I can tell, it's something related to memory allocation. But again, if someone is willing to use the code on their own stuff and report any issues, this will certainly help me fix them!
Hi @palday, thanks for your input. This is really useful. One clarification question, though, because I'm not an expert on BLAS: if the calculations can be parallelized at the level of BLAS, why is parallel support needed in simr at all?
You're starting to grasp why this project hasn't been high priority for me. :-)
Parallel support would still be useful for clusters and for single machines with many processing units, e.g. those with multiple processors, each with multiple cores. This all ties into issues with memory performance and locality, which is a much larger topic.
Using a good BLAS will also speed up many other things in R.
The default BLAS on macOS is pretty good. Revolution R is a bit controversial as a commercial enterprise, but their version of R is built against the Intel MKL, which is perhaps the fastest BLAS for Intel processors. On Linux, I find OpenBLAS to be quite good, but you need to make sure that R is using it via update-alternatives or otherwise setting your library path.
Clusters -- of course. Thanks again. This info is going to be useful for a lot of people in my research group.
@pitakakariki @palday @tmalsburg: I have been using simr for some heavy simulations, and struggled with the parallel package because all objects used by compute nodes had to be explicitly exported. I figured out that the future package solves all these low-level considerations, and even lets us use exactly the same code regardless of whether it is run sequentially, in parallel, or on some external server.
Given the discussion above, I would like to share some example code, which perhaps could be useful when considering how to implement parallelization in simr.
Below is an example illustrating the solution I used, which also shows the speedup on my MacBook Pro.
suppressMessages({
  library(lme4)
  library(simr)
  library(future)
  library(future.apply)
  library(parsimr)
  library(purrr) # for map()
  library(Hmisc) # for binconf()
  library(dplyr) # for tibble, %>%, ...
})
# Number of simulations
nsim <- 100L
# Example mixed model
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
fixef(fm1)
#> (Intercept) Days
#> 251.40510 10.46729
fixef(fm1)["Days"] <- 10
# Use powerSim function
system.time(test0 <- powerSim(fm1, nsim = nsim, progress = FALSE))
#> singular fit
#> user system elapsed
#> 12.325 0.255 12.594
# Function for repeatedly calling powerSim with nsim = 1
ps_function <- function(...) powerSim(..., nsim = 1)
# Run in sequential using future
plan(sequential)
system.time(test1 <- future_replicate(nsim, ps_function(fit = fm1, progress = FALSE), simplify = FALSE))
#> user system elapsed
#> 12.903 0.217 13.126
# Run in parallel using future
plan(multiprocess)
system.time(test2 <- future_replicate(nsim, ps_function(fit = fm1, progress = FALSE), simplify = FALSE))
#> singular fit
#> user system elapsed
#> 0.106 0.088 2.985
# test0 contains one powerSim object.
# test1 and test2 contain lists of powerSim objects.
# These must be combined in order to get the right results
# Here is what we get for test0
summary(test0)
#> successes trials mean lower upper
#> 1 100 100 1 0.9637833 1
# Let us reproduce it
# Set significance threshold
alpha <- 0.05
# Extract p-values
pvals <- map_dbl(test2, ~ .x$pval)
# Get the confidence interval
tibble(
successes = sum(pvals < alpha),
trials = length(pvals),
mean = successes / trials
) %>%
bind_cols(as_tibble(matrix(binconf(sum(.$successes), n = length(pvals), method = "exact")[, c("Lower", "Upper")],
nrow = 1, dimnames = list(NULL, c("Lower", "Upper")))))
#> # A tibble: 1 x 5
#> successes trials mean Lower Upper
#> <int> <int> <dbl> <dbl> <dbl>
#> 1 100 100 1 0.964 1
Created on 2019-02-18 by the reprex package (v0.2.1)
I had also recently run across the future package in a different package, but haven't gotten around to checking out how easy it is to use / to integrate into simr internals. Good to know that it's apparently not that hard to use -- I'll take another look when my day job permits.
future and especially doFuture do provide a nice, unified way to handle parallel simulation across all sorts of parallel architectures that would be too onerous to implement in simr ... but it would still not provide for progress bars, which is something @pitakakariki understandably really wants.
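For reference, a minimal sketch of the doFuture route (registerDoFuture() and plan() are real doFuture/future API; the idea that simr's llply-based internals would work unchanged on top is my assumption):

library(doFuture)

# doFuture exposes any future backend to foreach, which is what
# plyr::llply(.parallel = TRUE) uses under the hood.
registerDoFuture()
plan(multisession, workers = 4)

# Hypothetical simr call reusing the proposed interface:
powerSim(model, nsim = 1000, parallel = TRUE)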
A bit of indirection might help, but would require more effort to implement: the call to llply would be treated as serial, but each function call/chunk would contain O(n_workers) simulations. This is less than optimal because some workers might sit idle if others finish faster and could not be rescheduled until the next chunk is called, but it would be better than nothing. We could combine this with some logic around the progress bar options such that if the progress bar is shown, we use this "semi-parallel" mode, and if it is not shown, we use "full parallel" mode.
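A rough sketch of that semi-parallel mode (hypothetical; doOneSim() stands in for a single simr simulation iteration, while foreach, doParallel, and txtProgressBar() are real APIs):

library(foreach)
library(doParallel)

registerDoParallel(cores = 4)
n_workers <- getDoParWorkers()

nsim <- 100
chunks <- split(seq_len(nsim), ceiling(seq_len(nsim) / n_workers))

pb <- txtProgressBar(max = length(chunks), style = 3)
results <- list()
for (k in seq_along(chunks)) {
  # Each chunk runs ~n_workers simulations in parallel; the progress
  # bar ticks once per chunk in the master process.
  results[[k]] <- foreach(i = chunks[[k]]) %dopar% doOneSim(i)
  setTxtProgressBar(pb, k)
}
close(pb)
results <- unlist(results, recursive = FALSE)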
furrr, which as far as I know is a kind of "tidyverse future", also supports progress bars, so that might also be an option.
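Something like this, if I remember furrr's interface correctly (future_map()'s .progress argument and furrr_options() are real furrr API; ps_once() is a hypothetical wrapper around one simulation iteration):

library(furrr)

plan(multisession, workers = 4)

# Hypothetical one-iteration wrapper around simr.
ps_once <- function(i) powerSim(model, nsim = 1, progress = FALSE)

# .progress = TRUE gives a progress bar even in parallel;
# seed = TRUE sets up parallel-safe RNG streams.
res <- future_map(1:100, ps_once,
                  .progress = TRUE,
                  .options = furrr_options(seed = TRUE))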
@osorensen's solution using future is very interesting and helpful. However, I keep getting a weird error, and I have no idea why it arises. Any insights are most welcome.
I am fitting the following model
model.of.interest.1 <- glmer(Y ~ X * Z + (1 | G), offset = log(offset), data = df2, family = poisson)
And then, following @osorensen's suggestion, using the future package:

# Number of simulations
nsim <- 1000L
# Function for repeatedly calling powerSim with nsim = 1
ps_function <- function(...) powerSim(..., nsim = 1)
plan(multisession)
test1 <- future_replicate(nsim, ps_function(model.of.interest.1, test = fixed('XYes:Z', 'z')), simplify = FALSE)
And the interesting thing is that all the powerSim objects in my list test1 compute a power of 0 and return the following error:

test1[[1]]$errors
       stage index                  message
1 Simulating     1 object 'df2' not found
Any idea why it is not finding the dataset df2 to simulate over?
Thank you
Haven't done much parallel work, but I think future needs a bit of setup (unless your OS allows forks), e.g. telling it which objects you want available in the processes it spawns.
@pitakakariki is correct. @lesegura, lme4 creates a reference to the dataset in question using the magic of "environments" in R, so that it has a local copy that is only actually copied if there is a change to the original dataset. But this type of reference isn't carried along cleanly when the fitted model is shipped to a worker, which is why manipulations of that model in a future fail.
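For example, explicitly shipping df2 along with the automatically detected globals should work (future.globals is real future.apply API; that this alone fixes the error is my assumption):

library(future)
library(future.apply)

plan(multisession)

# Keep automatic global detection, but additionally export 'df2'
# so simulate()/refit() can find it inside each worker.
test1 <- future_replicate(
  nsim,
  ps_function(model.of.interest.1, test = fixed('XYes:Z', 'z')),
  simplify = FALSE,
  future.globals = structure(TRUE, add = "df2")
)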
I had worked on parallelizing this a while back and had a branch in my fork, but I haven't updated it in a while (and don't do much R development these days because I've been focused on MixedModels.jl, where, oddly enough, I've again been involved in parallelizing code ...). Maybe there's a low-effort way I can get these changes back into the main line of simr, but I'll need to coordinate with @pitakakariki a bit on the API.
Thanks @palday. Going to link to https://github.com/palday/simr here so I know where to look when I make time for this.
simr is well overdue for an overhaul, and I'm going to request a substantial amount of time to dedicate to this next FY.
Just wondering: is parallelization still under development?
Not under active development - it's reasonably high on the "wish list" but I wouldn't want to hazard a guess to when I'll find the time.
Thank you for the quick response. Fwiw, if there are others with similar questions, here's what I did to work around that as a quick hack for my specific GLMM analysis:
my_power_sim <- function(model, nsim = 2, nworkers = min(4, parallel::detectCores() - 1)) {
  require(tidyverse)
  require(foreach)
  require(simr)
  require(lme4)
  require(broom.mixed)
  workers <- parallel::makeCluster(nworkers, type = "FORK")
  doParallel::registerDoParallel(cl = workers)
  on.exit(parallel::stopCluster(workers)) # shut the cluster down when done
  d.test <- model@frame # necessary because getData() seems to be broken for simr
  d <- plyr::ldply(
    .data = as.list(1:nsim),
    .fun = function(.n) {
      # Set a separate random seed in each call (otherwise results are all identical)
      set.seed(.n)
      y <- doSim(object = model)
      m <- suppressWarnings(doFit(y, fit = model))
      s <- tidy(m) %>%
        select(-c(effect, group)) %>%
        filter(term %in% c("vot1.sc", "context.numeric", "context.numeric:group.numeric")) %>%
        mutate(n = .n, singular = isSingular(m), convergence = m@optinfo$conv$opt)
      return(s)
    },
    .parallel = TRUE,
    .paropts = list(.packages = c("tidyverse", "lme4", "broom.mixed", "simr"), .export = c("d.test"), .verbose = FALSE),
    .progress = plyr::progress_text(char = ".")
  )
  return(d)
}
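Usage would then look something like this (my_fitted_glmm is a placeholder for your own fitted model):

# Hypothetical usage:
sim_results <- my_power_sim(my_fitted_glmm, nsim = 500)
head(sim_results)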
Hi all, I've been trying to adapt a function to my design, but I'm getting identical results for different sample sizes. I wonder if anyone could tell me what I'm doing wrong in my code (see below). I don't know if there would be another way to set a separate random seed for each call.
All the best, Victor
run_sim <- function(mod_sim, npart, nstim, nsim, alpha, teff, globals = NULL) {
n_workers <- availableCores()
pairs <- expand.grid(npart, nstim)
results <- data.frame(N_part = numeric(),
N_stim = numeric(),
Result = numeric(),
Lower = numeric(),
Upper = numeric(),
stringsAsFactors = FALSE)
for (i in 1:nrow(pairs)) {
pair <- pairs[i, ]
value1 <- pair[[1]]
value2 <- pair[[2]]
# Generate a random seed for each iteration
seed <- sample(1:1000, 1)
plan(multisession, workers = n_workers)
# Use the correct random seed in powerSim function call
pstests <- future_replicate(nsim, powerSim(mod_sim, nsim = 1, test = fixed(teff, "t"), seed = seed),
future.globals = globals,
progress = FALSE, simplify = FALSE)
plan(sequential)
pvals <- lapply(pstests, function(x) x$pval)
successes <- sum(unlist(pvals) < alpha)
trials <- length(pvals)
mean_p <- successes / trials
conf_interval <- binconf(successes, n = trials, method = "exact")[, c("Lower", "Upper")]
ci_lower <- conf_interval["Lower"]
ci_upper <- conf_interval["Upper"]
new_row <- data.frame(N_part = value1,
N_stim = value2,
Result = mean_p,
Lower = ci_lower,
Upper = ci_upper,
stringsAsFactors = FALSE)
results <- rbind(results, new_row)
}
return(results)
}
Hi @victorshiramizu, I'll have a proper look soon-ish.
I haven't used futures before (I like the look of it though) but with that caveat my guess is that you're right and seed is not being iterated through, with powerSim repeatedly looking at seed[1].
If you don't use seed at all, does it give you more sensible (albeit not reproducible) results?
Edit: that shouldn't explain different sample sizes giving the same power though ... did you intend to use pairs and extend to increase the sample size in mod_sim?
Hi Peter,
thank you for your reply.
I'm using pairs only to summarize the results. The model was already extended using the extend function. Whether the seed is used or not, the output remains relatively unchanged.
Here's what I'm getting:
Looks like there are three issues:
- I think you're calling run_sim with nsim=1 - possibly this was for testing?
- If you call run_sim with a larger nsim, you'll still get power of either 0% or 100% because each iteration is using the same seed. I think you need to set future.seed instead? (See the sketch after this list.)
- As far as I can tell the same model object mod_sim is being used for every sample size. The code is looping through different values for N_part but I can't see where it's changing mod_sim to reflect this?
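For the seed issue, something along these lines should work (future.seed is real future.apply API; dropping the fixed seed argument to powerSim() is the key change):

# Let future.apply manage parallel-safe RNG streams instead of
# passing the same seed to every replicate:
pstests <- future_replicate(
  nsim,
  powerSim(mod_sim, nsim = 1, test = fixed(teff, "t")),
  simplify = FALSE,
  future.seed = TRUE  # one independent L'Ecuyer-CMRG stream per replicate
)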