
parallel support

Open pitakakariki opened this issue 8 years ago • 23 comments

There are a bunch of things that will have to be sorted out to get polished parallel support (a rough sketch of the argument handling follows the list).

  • cross-platform. That means snow-type parallelism needs to be implemented (i.e. we have to export stuff to the workers)
    • but maybe a fork option for people on Unix? Fork will be better for user-defined stuff.
  • automagic. Users should be able to do parallel=TRUE and it should Just Work.
  • parallel=TRUE will have to choose a default number of cores somehow.
  • let user specify cores with parallel=n?
    • everything else will still have to be automagic.
  • arbitrary backend with parallel=cl?
    • we should still export simr in this case
  • random number seeds.
    • seeds should work with the parallel option
    • ideally results will be the same for the same seed, but we might have to accept e.g. same seed + same # of cores
  • maybe worry that loading simr in each worker will use the installed version, not the version we're working on (will this mess with testthat too?)
  • lme4 versus other packages and user defined functions.
    • leave it up to users to set up a cluster with the right things exported
    • export everything - if the job's big enough to make parallel worthwhile, the overhead won't matter so much
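A rough sketch of how that argument handling might hang together (hypothetical; setup_parallel is not part of simr, and the defaults are illustrative):

library(doParallel)

# Hypothetical dispatch on a single `parallel` argument (not simr's API)
setup_parallel <- function(parallel) {
  if (isTRUE(parallel)) {
    # automagic: default to all-but-one of the available cores
    registerDoParallel(cores = max(1, parallel::detectCores() - 1))
  } else if (is.numeric(parallel) && parallel > 1) {
    # parallel = n: user-specified number of cores
    registerDoParallel(cores = parallel)
  } else if (inherits(parallel, "cluster")) {
    # parallel = cl: user-supplied backend; still export simr to the workers
    parallel::clusterEvalQ(parallel, library(simr))
    registerDoParallel(cl = parallel)
  }
  invisible(NULL)
}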

pitakakariki avatar Jun 22 '16 05:06 pitakakariki

And progress bars of course. Apparently we can do this in doSNOW at least?

pitakakariki avatar Jun 22 '16 05:06 pitakakariki

I think a nice middle ground here might be to support parallel computation the same way plyr does -- take more-or-less seamless advantage of any parallel backend available, but require the user to actually set up the parallel backend (usually via registerDoParallel()). Otherwise, we have to deal with all the configuration options internally, which adds a lot of overhead.
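For reference, the plyr pattern looks roughly like this from the user's side (slow_fun is a stand-in for the per-iteration work simr would hand to llply):

library(plyr)
library(doParallel)

registerDoParallel(cores = 4)                     # user registers a backend once, up front

slow_fun <- function(i) { Sys.sleep(0.1); i^2 }   # stand-in workload
res <- llply(1:100, slow_fun, .parallel = TRUE)   # plyr picks up whatever backend is registered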

One special case could possibly be done for parallel=n, where if there is no registered parallel backend, we do something like this at the start of the computation:

if (parallel > 1) {
    if (!require("doParallel")) {
        warning("Parallel computation requested but doParallel not available")
    } else {
        if (is.null(getDoParName())) {
            message(paste0("Parallel computation requested, starting doParallel with ", parallel, " cores"))
            registerDoParallel(cores = parallel)
        }
    }
}

For parallel=TRUE and parallel=FALSE, we simply pass those arguments onto llply and rely on the user to have set up an appropriate backend (via doParallel, doSNOW, doMC, etc.). Otherwise, we have to test against all the possible backends. Or maybe autoconfigure on parallel=TRUE and trust the user for parallel='custom' or some other magic value? If the user is already in the position to define a proper cluster cl, surely they can also run registerDoParallel(cl), so why should simr add that extra logic internally? Especially since we don't know if the user wants to continue using that cluster afterwards, i.e. whether simr needs to tear down the cluster.

Exporting simr seems to export the currently "installed" one, but this isn't too big a problem if you're testing via "build and reload" in RStudio, which installs the development version in the user-local library. In terms of the other package exports -- those are handled automatically by simr's dependencies, at least for simr-internal functionality. doParallel also has some functionality to export basically the entire R environment of the master process, but that could be quite slow if the user has large datasets or several large models loaded.

Random seeds are quite difficult and work very differently in the different backends. Moreover, there is some interaction between the random seed for the master and the seeds for the workers (even with setting the worker random seed, you get different results for different master random seeds).
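For reference, base R's parallel package provides reproducible per-worker streams via the L'Ecuyer-CMRG generator, though results are then only reproducible for a fixed worker count -- the "same seed + same # of cores" compromise above:

library(parallel)

cl <- makeCluster(2)
# Give each worker an independent, reproducible RNG stream.
# The stream assignment depends on the cluster size, so the same
# iseed with a different number of workers gives different results.
clusterSetRNGStream(cl, iseed = 42)
res <- parSapply(cl, 1:4, function(i) rnorm(1))
stopCluster(cl)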

Progress bars will be difficult no matter what -- plyr just gave up all hope and flat out doesn't support them for parallel computation. And currently, parallel is done via plyr::llply().

palday avatar Jun 23 '16 02:06 palday

Thinking about this more: I think parallel=TRUE should pass that argument untouched onto llply, while parallel='auto' will attempt to set up an appropriate backend using distributed memory via doParallel, if doParallel is installed.

palday avatar Jul 06 '16 03:07 palday

Is there a simple way in which users can make use of multiple cores as long as parallel support isn't part of simr?

tmalsburg avatar Jan 16 '18 08:01 tmalsburg

In my fork, there is some parallel support, but until it's integrated with upstream, the API is subject to change. There's also support for testing multiple coefficients / terms per simulation iteration, which is a huge boost for larger models.

I haven't had a chance to work on it in a while, but if somebody else is using it and thus "helping" work out any kinks/bugs/awkwardness, even just by using it on additional datasets, then that would certainly help move development along....

Finally, you can abuse simr a bit and extract the results of individual simulations and then compute your power (including CIs) from that. So that means you can run n simr runs of m/n iterations each, simultaneously on n cores, to compute m simulated iterations in total (see the sketch after this list). There are two things to watch out for here:

  1. Make sure to set your random seed appropriately so that you don't end up with n identical copies of the same run, which would be misleading.

  2. Parallel operation on a multicore machine may not speed things up as much as you expect if your BLAS is properly tuned for multicore operations, because the parallel instances then compete with each other. (Now, if you have a cluster then you can easily do both levels of parallel operation with SNOW.)
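A minimal sketch of that split-and-pool approach, assuming a Unix-alike (mclapply forks, so nothing needs exporting); powerSim's seed argument and the $pval slot are as in simr, the rest is illustrative:

library(simr)      # attaches lme4 as well
library(parallel)

fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

n_runs <- 4        # n independent runs, one per core
m <- 100           # m simulated iterations in total
per_run <- m / n_runs

# Point 1 above: a distinct seed per run, so the runs aren't identical copies
runs <- mclapply(seq_len(n_runs), function(i) {
  powerSim(fm, nsim = per_run, seed = i, progress = FALSE)
}, mc.cores = n_runs)

# Pool the per-iteration p-values and recompute power with a binomial CI
pvals <- unlist(lapply(runs, function(r) r$pval))
binom.test(sum(pvals < 0.05), length(pvals))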

The biggest boost I've seen for my stuff comes from the optimised omnibus testing for large models in my fork. There does, however, seem to be a weird interaction between some of that code and certain lme4 versions that kills performance, at least on Linux. As far as I can tell, it's something related to memory allocation. But again, if someone is willing to use the code on their own stuff and report any issues, this will certainly help me fix them!

palday avatar Jan 16 '18 09:01 palday

Hi @palday, thanks for your input. This is really useful. One clarification question, though, because I'm not an expert on BLAS: if the calculations can be parallelized at the level of BLAS, why is parallel support needed in simr at all?

tmalsburg avatar Jan 16 '18 09:01 tmalsburg

You're starting to grasp why this project hasn't been high priority for me. :-)

Parallel support would still be useful for clusters and single machines with many processors, e.g. those with multiple processors, each with multiple cores. This all ties into issues with memory performance and locality, which is a much larger topic.

Using a good BLAS will also speed up many other things in R.

The default BLAS on macOS is pretty good. Revolution R is a bit controversial as a commercial enterprise, but their version of R is built against the Intel MKL, which is perhaps the fastest BLAS for Intel processors. On Linux, I find OpenBLAS to be quite good, but you need to make sure that R is using it via update-alternatives or otherwise setting your library path.
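As an aside, you can check which BLAS/LAPACK a given R build is actually using; since R 3.4, sessionInfo() reports the library paths:

# The first lines of output name the BLAS and LAPACK shared libraries in use
sessionInfo()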

palday avatar Jan 16 '18 09:01 palday

Clusters -- of course. Thanks again. This info is going to be useful for a lot of people in my research group.

tmalsburg avatar Jan 16 '18 09:01 tmalsburg

@pitakakariki @palday @tmalsburg: I have been using simr for some heavy simulations, and struggled with using the parallel package, because all objects used by compute nodes had to be explicitly exported. I figured out that the future package takes care of all these low-level details, and even lets us use exactly the same code regardless of whether it is run sequentially, in parallel, or on some external server.

Given the discussion above, I would like to share some example code, which perhaps could be useful when considering how to implement parallelization in simr.

Below is an example illustrating the solution I used, which also shows the speedup on my MacBook Pro.

suppressMessages({
  library(lme4)
  library(simr)
  library(future)
  library(future.apply)
  library(parsimr)
  library(purrr) # for map()
  library(Hmisc) # for binconf()
  library(dplyr) # for tibble, %>%, ...
})
# Number of simulations
nsim <- 100L
# Example mixed model
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
fixef(fm1)
#> (Intercept)        Days 
#>   251.40510    10.46729
fixef(fm1)["Days"] <- 10
# Use the powerSim function directly
system.time(test0 <- powerSim(fm1, nsim = nsim, progress = FALSE))
#> singular fit
#>    user  system elapsed 
#>  12.325   0.255  12.594

# Function for repeatedly calling powerSim with nsim = 1
ps_function <- function(...) powerSim(..., nsim = 1)

# Run in sequential using future
plan(sequential)
system.time(test1 <- future_replicate(nsim, ps_function(fit = fm1, progress = FALSE), simplify = FALSE))
#>    user  system elapsed 
#>  12.903   0.217  13.126
# Run in parallel using future
plan(multiprocess)
system.time(test2 <- future_replicate(nsim, ps_function(fit = fm1, progress = FALSE), simplify = FALSE))
#> singular fit
#>    user  system elapsed 
#>   0.106   0.088   2.985

# test0 contains one powerSim object.
# test1 and test2 contain lists of powerSim objects.
# These must be combined in order to get the right results


# Here is what we get for test0
summary(test0)
#>   successes trials mean     lower upper
#> 1       100    100    1 0.9637833     1

# Let us reproduce it
# Set significance threshold
alpha <- 0.05
# Extract p-values
pvals <- map_dbl(test2, ~ .x$pval)
# Get the confidence interval
tibble(
  successes = sum(pvals < alpha),
  trials = length(pvals),
  mean = successes / trials
  ) %>%
  bind_cols(as_tibble(matrix(binconf(sum(.$successes), n = length(pvals), method = "exact")[, c("Lower", "Upper")],
                             nrow = 1, dimnames = list(NULL, c("Lower", "Upper")))))
#> # A tibble: 1 x 5
#>   successes trials  mean Lower Upper
#>       <int>  <int> <dbl> <dbl> <dbl>
#> 1       100    100     1 0.964     1

Created on 2019-02-18 by the reprex package (v0.2.1)

osorensen avatar Feb 18 '19 10:02 osorensen

I had also recently run across the future package in a different package, but haven't gotten around to checking out how easy it is to use / to integrate into simr internals. Good to know that it's apparently not that hard to use -- I'll take another look when my day job permits.

palday avatar Feb 18 '19 10:02 palday

future and especially doFuture do provide a nice, unified way to handle parallel simulation across all sorts of parallel architectures that would be too onerous to implement in simr .... but it would still not provide for progress bars, which is something @pitakakariki understandably really wants.

A bit of indirection might help, but would require more effort to implement: the call to llply would be treated as serial, but each function call/chunk would contain O(n_workers) simulations (see the sketch below). This is less than optimal because some workers might sit idle if others finish faster and could not be rescheduled until the next chunk starts, but it would be better than nothing. We could combine this with some logic around the progress bar options, such that if the progress bar is shown we use this "semi-parallel" mode, and if it is not shown we use "full parallel" mode.
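A rough sketch of that semi-parallel idea (hypothetical: semi_parallel_sim is not a simr function, and doSim stands in for one simulation iteration):

library(plyr)
library(doParallel)
library(simr)

semi_parallel_sim <- function(fit, nsim, workers) {
  registerDoParallel(cores = workers)
  # split the nsim iterations into serial chunks of ~workers each
  chunks <- split(seq_len(nsim), ceiling(seq_len(nsim) / workers))
  pb <- txtProgressBar(max = length(chunks), style = 3)
  out <- vector("list", length(chunks))
  for (i in seq_along(chunks)) {
    # each chunk runs its simulations in parallel on the registered backend...
    out[[i]] <- llply(chunks[[i]], function(j) doSim(fit), .parallel = TRUE)
    # ...while the serial outer loop keeps the progress bar alive
    setTxtProgressBar(pb, i)
  }
  close(pb)
  unlist(out, recursive = FALSE)
}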

palday avatar Feb 28 '19 12:02 palday

furrr, which as far as I know is a kind of "tidyverse future", also supports progress bars, so that might also be an option.

osorensen avatar Feb 28 '19 12:02 osorensen

So @osorensen's solution using future is very interesting and helpful. However, I keep getting a weird error and I have no idea why it arises. Any insights are most welcome.

I am fitting the following model

model.of.interest.1 <- glmer(Y ~ X * Z + (1 | G), offset = log(offset), data = df2, family = poisson)

And then, following @osorensen's suggestion to use the future package:

# Number of simulations
nsim <- 1000L

# Function for repeatedly calling powerSim with nsim = 1
ps_function <- function(...) powerSim(..., nsim = 1)

plan(multisession)
test1 <- future_replicate(nsim, ps_function(model.of.interest.1, test = fixed('XYes:Z', 'z')),
                          simplify = FALSE)

And the interesting thing is that all the powerSim objects in my list test1 compute a power of 0 and return the following error:

test1[[1]]$errors
#>        stage index                  message
#> 1 Simulating     1 object 'df2' not found

Any idea why it is not finding the dataset df2 to simulate over?

Thank you

lesegura avatar Jun 04 '21 18:06 lesegura

Haven't done much parallel work but I think future needs a bit of setup (unless your OS allows forks), e.g. telling it which objects you want available in the processes it spawns.

pitakakariki avatar Jun 08 '21 04:06 pitakakariki

@pitakakariki is correct. @lesegura lme4 creates a reference to the dataset in question using the magic of "environments" in R, so that it has a local copy that only actually copies if there is a change to the original dataset. But this type of reference isn't passed cleanly when passing the fitted model, which is why manipulations of that model in a future fail.
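One workaround, if I'm reading future's globals handling right, is to name the missing object explicitly so the workers receive it even though it can't be discovered through the model's environment (object names as in @lesegura's example; the structure(TRUE, add = ...) form keeps automatic detection and adds to it):

plan(multisession)
test1 <- future_replicate(nsim,
                          ps_function(model.of.interest.1, test = fixed('XYes:Z', 'z')),
                          simplify = FALSE,
                          # keep automatic global detection, but also export df2,
                          # which future can't see through the fitted model
                          future.globals = structure(TRUE, add = "df2"))

On a Unix-alike, plan(multicore) sidesteps the problem entirely, since forked workers share the master's memory and can see df2 without any exporting.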

I had worked on parallelizing this a while back and had a branch in my fork, but I haven't updated it in a while (and don't do much R development these days because I've been focused on MixedModels.jl, where oddly enough, I've again been involved in parallelizing code ....). Maybe there's a low-effort way I can get these changes back into the main line of simr, but I'll need to coordinate with @pitakakariki a bit on the API.

palday avatar Jun 09 '21 04:06 palday

Thanks @palday. Going to link to https://github.com/palday/simr here so I know where to look when I make time for this.

simr is well overdue for an overhaul, and I'm going to request a substantial amount of time to dedicate to this next FY.

pitakakariki avatar Jun 09 '21 04:06 pitakakariki

Just wondering: is parallelization still under development?

tfjaeger avatar Dec 16 '21 16:12 tfjaeger

Not under active development - it's reasonably high on the "wish list" but I wouldn't want to hazard a guess as to when I'll find the time.

pitakakariki avatar Dec 16 '21 20:12 pitakakariki

Thank you for the quick response. Fwiw, if there are others with similar questions, here's what I did to work around that as a quick hack for my specific GLMM analysis:

my_power_sim <- function(model, nsim = 2, nworkers = min(4, parallel::detectCores() - 1)) {
  require(tidyverse)
  require(foreach)
  require(simr)
  require(lme4)
  require(broom.mixed)

  workers <- parallel::makeCluster(nworkers, type = "FORK")
  doParallel::registerDoParallel(cl = workers)
  on.exit(parallel::stopCluster(workers), add = TRUE) # shut the cluster down when done
  d.test <- model@frame # necessary because getData() seems to be broken for simr
  
  d <- plyr::ldply(
    .data = as.list(1:nsim),
    .fun = function(.n) {
      # Set separate random seed in each call (otherwise results are all identical)
      set.seed(.n)
      y <- doSim(object = model)
      m <- suppressWarnings(doFit(y, fit = model))
      s <- tidy(m) %>%
        select(-c(effect, group)) %>%
        filter(term %in% c("vot1.sc", "context.numeric", "context.numeric:group.numeric")) %>%
        mutate(n = .n, singular = isSingular(m), convergence = m@optinfo$conv$opt)
      return(s)
    }, 
    .parallel = T,
    .paropts = list(.packages = c("tidyverse", "lme4", "broom.mixed", "simr"), .export = c("d.test"), .verbose = F),
    .progress = plyr::progress_text(char = ".")
  )
  
  return(d)
}

tfjaeger avatar Dec 16 '21 20:12 tfjaeger


Hi all, I've been trying to adapt the function above to my design, but I'm getting identical results for different sample sizes. Could anyone tell me what I am doing wrong in my code (see below)? I don't know if there is another way to set a separate random seed for each call.

All the best, Victor

run_sim <- function(mod_sim, npart, nstim, nsim, alpha, teff, globals = NULL) {
  n_workers <- availableCores()
  
  pairs <- expand.grid(npart, nstim)
  
  results <- data.frame(N_part = numeric(),
                        N_stim = numeric(),
                        Result = numeric(),
                        Lower = numeric(),
                        Upper = numeric(),
                        stringsAsFactors = FALSE)
  
  for (i in 1:nrow(pairs)) {
    pair <- pairs[i, ]
    
    value1 <- pair[[1]]
    value2 <- pair[[2]]
    
    # Generate a random seed for each iteration
    seed <- sample(1:1000, 1)
    
    plan(multisession, workers = n_workers)
    
    # Use the correct random seed in powerSim function call
    pstests <- future_replicate(nsim, powerSim(mod_sim, nsim = 1, test = fixed(teff, "t"), seed = seed),
                                future.globals = globals,
                                progress = FALSE, simplify = FALSE)
    
    plan(sequential)
    
    pvals <- lapply(pstests, function(x) x$pval)
    successes <- sum(unlist(pvals) < alpha)
    trials <- length(pvals)
    mean_p <- successes / trials
    
    conf_interval <- binconf(successes, n = trials, method = "exact")[, c("Lower", "Upper")]
    ci_lower <- conf_interval["Lower"]
    ci_upper <- conf_interval["Upper"]
    
    new_row <- data.frame(N_part = value1,
                          N_stim = value2,
                          Result = mean_p,
                          Lower = ci_lower,
                          Upper = ci_upper,
                          stringsAsFactors = FALSE)
    
    results <- rbind(results, new_row)
  }
  
  return(results)
}

victorshiramizu avatar Jul 26 '23 15:07 victorshiramizu

Hi @victorshiramizu, I'll have a proper look soon-ish.

I haven't used futures before (I like the look of it though) but with that caveat my guess is that you're right and seed is not being iterated through, with powerSim repeatedly looking at seed[1].

If you don't use seed at all does it give you more sensible (albeit not reproducible) results?

Edit: that shouldn't explain different sample sizes giving the same power though ... did you intend to use pairs and extend to increase the sample size in mod_sim?

pitakakariki avatar Jul 26 '23 20:07 pitakakariki

Hi Peter, thank you for your reply. I'm using pairs only to summarize the results. The model was already extended using the extend function. Whether the seed is used or not, the output remains relatively unchanged.

Here's what I'm getting: [screenshot of the results table omitted]

victorshiramizu avatar Jul 27 '23 09:07 victorshiramizu

Looks like there are three issues (a sketch addressing points 2 and 3 follows the list):

  1. I think you're calling run_sim with nsim=1 - possibly this was for testing?
  2. If you call run_sim with a larger nsim, you'll still get power of either 0% or 100% because each iteration is using the same seed. I think you need to set future.seed instead?
  3. As far as I can tell the same model object mod_sim is being used for every sample size. The code is looping through different values for N_part but I can't see where it's changing mod_sim to reflect this?
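For points 2 and 3, a sketch of how the loop body might change ("Participant" is a placeholder for whatever grouping variable the design extends along; future.seed is future.apply's built-in parallel-safe RNG handling):

for (i in 1:nrow(pairs)) {
  pair <- pairs[i, ]

  # Point 3: rebuild the model so it actually reflects this sample size
  # ("Participant" is a placeholder grouping variable for extend())
  mod_i <- extend(mod_sim, along = "Participant", n = pair[[1]])

  plan(multisession, workers = n_workers)
  # Point 2: no fixed seed; future.seed gives each replicate its own
  # reproducible L'Ecuyer-CMRG stream
  pstests <- future_replicate(nsim, powerSim(mod_i, nsim = 1, test = fixed(teff, "t")),
                              future.seed = TRUE, future.globals = globals,
                              simplify = FALSE)
  plan(sequential)

  # then aggregate the p-values into `results` as before
}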

pitakakariki avatar Jul 27 '23 19:07 pitakakariki