
Parallelization when using a user-defined distribution

Open dill opened this issue 4 years ago • 10 comments

Having a great time getting started with nimble, thanks to some excellent documentation! Thank you.

But... I'm having a wee bit of trouble using a distribution I've defined while also attempting to run nimble in parallel. It seems like I'm getting the environment misspecified but I am not sure how. I've pasted a "minimal" example below based on the parallelization guide but using the dmyexp as defined in the user manual.

Things I have tried:

  1. defining dmyexp inside run_MCMC_allcode (shown below), using <- or <<- as assignment.
    • when <<- is used to assign the function, it seems like registerDistributions doesn't work as the error is about a missing rmyexp.
    • when <- is used, dmyexp isn't found (similar to this issue?).
  2. defining dmyexp outside run_MCMC_allcode, using clusterExport(c("dmyexp"), cl=this_cluster) to "export" the function into the cluster environment.
    • similar to above, error that we can't find rmyexp.
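The difference between the two assignment operators in item 1 can be sketched in base R, with no nimble involved. The function names here (define_both, local_fn, global_fn) are hypothetical and chosen only for illustration. The sketch assumes neither name already exists in the global environment; it shows where each binding ends up, not how nimble itself performs its lookups.

```r
# Base R only; all names below are hypothetical and not part of nimble.
define_both <- function() {
  # `<-` binds in the frame of this call, which disappears on return.
  local_fn <- function(x) x + 1

  # `<<-` searches the enclosing environments for an existing binding and,
  # finding none, assigns in the global environment.
  global_fn <<- function(x) x + 2

  invisible(NULL)
}

define_both()

# Check which of the two names is now visible in the global environment.
in_global <- c(
  local_fn  = exists("local_fn",  envir = globalenv(), inherits = FALSE),
  global_fn = exists("global_fn", envir = globalenv(), inherits = FALSE)
)
print(in_global)
```

Only global_fn should be reported as present, which would be consistent with dmyexp not being found when it is assigned with <- inside run_MCMC_allcode.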

I could define an r* function, but it's a bit fiddly so I was hoping I could avoid it if I can.

With apologies if I've missed something obvious here. Thanks for any advice!

library(parallel)

this_cluster <- makeCluster(4)

set.seed(10120)
# Simulate some data
myData <- rexp(1000, rate = 0.8)

# Create a function with all the needed code
run_MCMC_allcode <- function(seed, data) {
  library(nimble)

  # from the manual
  dmyexp <<- nimbleFunction(
    run = function(x = double(0), rate = double(0, default = 1),
                   log = integer(0, default = 0)) {
      returnType(double(0))
      logProb <- log(rate) - x*rate
      if(log) return(logProb)
      else return(exp(logProb))
    })

  registerDistributions(list( dmyexp = list(
    BUGSdist = "dmyexp(rate)",
    range = c(0, Inf),
    pqAvail = FALSE)
  ), userEnv=parent.frame(1))

  myCode <- nimbleCode({
    b ~ dnorm(0, 100)

    for (i in 1:length_y) {
      y[i] ~ dmyexp(rate = b)
    }
  })

  myModel <- nimbleModel(code = myCode,
                          data = list(y = data),
                          constants = list(length_y = 1000),
                          inits = list(b = 0.5))

  CmyModel <- compileNimble(myModel)

  myMCMC <- buildMCMC(CmyModel)
  CmyMCMC <- compileNimble(myMCMC)

  results <- runMCMC(CmyMCMC, niter = 10000, setSeed = seed)

  return(results)
}

chain_output <- parLapply(cl = this_cluster, X = 1:4, 
                          fun = run_MCMC_allcode, 
                          data = myData)

stopCluster(this_cluster)

System info:

dill@nibbler3 sim (master *)$ uname -a
Darwin nibbler3.fritz.box 19.5.0 Darwin Kernel Version 19.5.0: Tue May 26 20:41:44 PDT 2020; root:xnu-6153.121.2~2/RELEASE_X86_64 x86_64
dill@nibbler3 sim (master *)$ R --version
R version 4.0.0 (2020-04-24) -- "Arbor Day"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin17.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
For more information about these matters see
https://www.gnu.org/licenses/.

dill@nibbler3 sim (master *)$ Rscript -e "packageVersion('nimble')"
[1] '0.10.0'

(Installed nimble from github.)

dill avatar Jul 09 '20 20:07 dill

@dill Thanks for this report.

It looks like you've run into some limitations in how we handle scoping that are causing the user-defined functions not to be found in various ways. I'm going to iterate with some of our other developers later in this issue thread, but I wanted to first directly give you a work-around.

Basically I think you do for now need to define the r function. It's not hard -- the body of the function can just be dummy code:

returnType(double())
return(0)

The args should have n = integer(0) as the first argument, followed by all the parameters that are args to your d function, and of course omit the log argument.

You can either define this in your master process and export it or you can define it in the same place as you define your d function in the function. And if you define in the function, use <<- for both d and r functions.

I haven't tested this, so if things don't work, please follow up.

paciorek avatar Jul 10 '20 19:07 paciorek

Thanks for the quick solution @paciorek, this worked in the example above. Just for those following up, I added the following code chunk to the run_MCMC_allcode above:

  rmyexp <<- nimbleFunction(
    run = function(n = integer(0), rate = double(0, default = 1)) {
      returnType(double())
      return(0)
  })
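Put together, the function from the original post with this chunk added might look like the sketch below. It is assembled from the code already in this thread; the only other change is dropping the unused a from inits, since the model has no node named a. It needs nimble and a working compiler on every worker and has not been re-run here. Because rmyexp is a placeholder that always returns 0, it is presumably only suitable when nothing needs to simulate from dmyexp.

```r
# Sketch: the original run_MCMC_allcode with the placeholder rmyexp added.
# Requires the nimble package on each worker; not re-tested here.
run_MCMC_allcode <- function(seed, data) {
  library(nimble)

  # Density function from the user manual. `<<-` puts it in the
  # worker's global environment rather than in this function's frame.
  dmyexp <<- nimbleFunction(
    run = function(x = double(0), rate = double(0, default = 1),
                   log = integer(0, default = 0)) {
      returnType(double(0))
      logProb <- log(rate) - x * rate
      if (log) return(logProb)
      else return(exp(logProb))
    })

  # Placeholder simulation function: same parameters as dmyexp, with
  # n first and no log argument. It does not draw from the distribution.
  rmyexp <<- nimbleFunction(
    run = function(n = integer(0), rate = double(0, default = 1)) {
      returnType(double())
      return(0)
    })

  registerDistributions(list(dmyexp = list(
    BUGSdist = "dmyexp(rate)",
    range = c(0, Inf),
    pqAvail = FALSE)
  ), userEnv = parent.frame(1))

  myCode <- nimbleCode({
    b ~ dnorm(0, 100)

    for (i in 1:length_y) {
      y[i] ~ dmyexp(rate = b)
    }
  })

  myModel <- nimbleModel(code = myCode,
                         data = list(y = data),
                         constants = list(length_y = 1000),
                         inits = list(b = 0.5))  # `a` omitted: not in the model

  CmyModel <- compileNimble(myModel)

  myMCMC <- buildMCMC(CmyModel)
  CmyMCMC <- compileNimble(myMCMC)

  runMCMC(CmyMCMC, niter = 10000, setSeed = seed)
}
```

It is called exactly as in the original post, through parLapply with X = 1:4 and data = myData.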

I'll try applying this in my more complicated example and report back.

dill avatar Jul 10 '20 19:07 dill

@perrydv @danielturek It looks like we have some holes in how we handle scoping when defining user-defined distributions (and presumably functions and samplers) inside functions.

I started to go through to try to set the environment where user-defined stuff is searched for to be the environment where a model is being defined. I did this in prepareDistributionInput (which is called by registerDistributions). This is in branch fix-userdist-scope (a bad name since I think this is a more general issue than for distributions only).

However, the next scoping error arose in getSymbolicParentNodesRecurse. I am happy to keep following things through our processing, but in checkNimbleOrRfunctionNames I see a comment from Perry from 2017 that gave me pause:

## Would like to do this by R's scoping rules here and in genCpp_sizeProcessing but that is problematic

Hopefully we can do something because I believe that right now, one can only use user-defined dists/funs/samplers if they are defined in the GlobalEnv. And I'm happy to pursue this but wanted to see what you both remember about our discussion of scoping in the past. I have a vague memory of this coming up but I don't see any relevant issues in nimble-dev or NCT.

Depending on where we go with solving this, we might add a note to our parallel example that says to always put user-defined dists/funs/samplers in the GlobalEnv.

paciorek avatar Jul 10 '20 20:07 paciorek

@paciorek Can we look at this on devel because there are some unreleased changes there in how things are found.

perrydv avatar Jul 10 '20 21:07 perrydv

@perrydv Not sure what you mean by "look at on devel". I branched fix-userdist-scope off devel just this morning.

paciorek avatar Jul 10 '20 21:07 paciorek

That's all I wanted to check. If it's in GlobalEnv, will it work in parallelization?

perrydv avatar Jul 10 '20 22:07 perrydv

Currently, if user-defined 'stuff' is in GlobalEnv on the workers, I believe it will work. The snag is that when we autogenerate the 'r' function, it is not put in GlobalEnv if the nimbleModel call is inside a function, which is what happens when users use things like parLapply.

Another issue is that it's weird to tell users to put stuff in GlobalEnv while also telling them all nimbleModel, buildMCMC, compileNimble calls that are done in parallel should be done inside the function being parallelized.
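For reference, one way to confirm what actually sits in GlobalEnv on the workers is sketched below using only the parallel package. The helper function is a hypothetical stand-in for a user-defined function, and the sketch says nothing about how nimble resolves names; it only shows that clusterExport places an object in each worker's global environment and how that can be checked.

```r
library(parallel)

cl <- makeCluster(2)

# Hypothetical stand-in for a user-defined function created on the master.
helper <- function(x) x * 2

# clusterExport() copies the named objects from `envir` on the master
# into the global environment of every worker.
clusterExport(cl, "helper", envir = environment())

# Ask each worker whether `helper` is now bound in its global environment.
on_workers <- unlist(clusterEvalQ(
  cl, exists("helper", envir = globalenv(), inherits = FALSE)
))

stopCluster(cl)
print(on_workers)
```

The same clusterEvalQ check could be pointed at names such as dmyexp or rmyexp to see whether they reached the workers' global environments.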

paciorek avatar Jul 10 '20 22:07 paciorek

@paciorek Maybe I'm missing something here, and this might sound like a work-around rather than fixing the root problem. But when we auto-generate the r function, could we create it and then manually force it to be defined in the GlobalEnv?

danielturek avatar Jul 11 '20 12:07 danielturek

Yes, that would address the smaller piece of the puzzle, but there is still the general issue that a user defining a user-defined nimbleFunction within a function will see an error. And telling them to define things in the GlobalEnv is the reverse of what we say in general about how to do parallelization, where we tell them to do everything within the function being parallelized.

paciorek avatar Jul 11 '20 16:07 paciorek

I need to dig a bit deeper into the source, but at a glance it seems like it should be feasible to pass an env argument to the compilation step. Is there an architectural reason users can't just specify where they want the custom stuff to be found?

grantbrown avatar Sep 29 '23 19:09 grantbrown