Add bpexport functionality

Open mtmorgan opened this issue 11 years ago • 9 comments

Add bpexport to make local variables available to remote computation. From the mailing list.

mtmorgan avatar Nov 06 '13 14:11 mtmorgan

I'm taking a stab at this here: https://github.com/DarwinAwardWinner/BiocParallel/tree/bpexport

So far I've added stubs for all the params, and I've added a clusterExport-based implementation for SnowParam. But thinking about it, that will only work if the cluster is running when clusterExport is called, so even that is not fully implemented.

DarwinAwardWinner avatar Nov 08 '13 00:11 DarwinAwardWinner

One issue to consider is, what if we call bpexport on a SerialParam or a MulticoreParam? They already have access to all the parent's variables, including any changes to those variables' values that occur after the call to bpexport. Should we make an attempt to have these params match the behavior of e.g. SnowParam by storing a snapshot of the variables when bpexport is called and then using that snapshot in place of the current value when the param is used?

Also, what should happen when you call bpexport on a stopped cluster? What should happen when you stop a cluster after exporting a variable?

DarwinAwardWinner avatar Nov 08 '13 00:11 DarwinAwardWinner

I'd like to suggest creating a simple class/list storing objects exported via bpexport. As soon as bplapply/bpmapply is called the objects can then be put into the function's environment. Something like

exported = list(x = 12, y = rnorm(10))
mapply(assign, x = names(exported), value = exported, MoreArgs = list(envir = environment(FUN)))

You would just have to check whether environmentName(environment(FUN)) is "R_GlobalEnv", and in that case give the function a new environment with the GlobalEnv as parent.

mllg avatar Nov 08 '13 09:11 mllg

I think it's probably a good idea to always give the function a new environment with the exported values and with the function's previous environment as parent. Are you suggesting this for the SerialParam and MulticoreParam classes?
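A minimal sketch of that idea in plain R, assuming a hypothetical helper named `with_exports` (it is not part of BiocParallel). The exported values go into a fresh environment whose parent is the function's previous environment, so the original environment is never modified and later changes to the parent's variables do not leak in:

```r
# Hypothetical helper (not part of BiocParallel): return a copy of FUN whose
# environment holds the exported values and chains to FUN's old environment.
with_exports <- function(FUN, exported) {
  env <- list2env(exported, parent = environment(FUN))
  environment(FUN) <- env  # changes only the local copy of FUN
  FUN
}

offset <- 1
add_offset <- function(i) i + offset

# Snapshot of `offset` taken at "export" time.
frozen <- with_exports(add_offset, list(offset = offset))

offset <- 100  # later change in the parent

live_result <- sapply(1:3, add_offset)  # uses the current value of offset
frozen_result <- sapply(1:3, frozen)    # uses the snapshot
```

Here `live_result` reflects the later assignment, while `frozen_result` still uses the value captured when the snapshot was taken, which is the behaviour a SerialParam or MulticoreParam would need in order to match an export made at call time.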

DarwinAwardWinner avatar Nov 08 '13 09:11 DarwinAwardWinner

Yes, Serial and Multicore. I also see no drawbacks for BatchJobs over its internal export mechanism. I don't know if this is applicable for DoPar. You could pass them to .export in foreach, but I was unable to find a way to turn the heuristic auto-export off.

One more thing to consider is the expected behavior if a variable is explicitly exported and also defined in the function's environment. Variables in the function's env have precedence in the lookup, which deviates from the lookup using parallel/clusterExport (which assigns to GlobalEnv on the slaves).
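The difference can be seen in a single R process. This is only a sketch: it mimics clusterExport by assigning into the global environment locally, and it assumes a closure's environment travels with the function when it is sent to a worker. The names `make_fun`, `f` and `g` are made up for illustration.

```r
# A closure that defines its own `x`.
make_fun <- function() {
  x <- "closure"
  function() x
}
f <- make_fun()

# Mimic clusterExport(): the exported value lands in the global environment,
# which is searched only after the function's own environment chain.
assign("x", "exported", envir = globalenv())
global_style <- f()

# Child-environment approach: the exported value sits in front of the
# function's previous environment, so it is found first.
g <- f
environment(g) <- list2env(list(x = "exported"), parent = environment(f))
child_style <- g()
```

With the global-style export the closure's own `x` wins; with the child environment the exported `x` wins. Whichever rule is chosen, it would have to be applied to every param class to keep them consistent.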

mllg avatar Nov 08 '13 10:11 mllg

Well, I think the goal would be in all cases to keep the behavior consistent across all param classes. So to answer what happens when you export a variable and the same variable is defined in the function's environment, we ask what happens naturally in the case of SnowParam where you use clusterExport to implement bpexport, and then make sure we do the same thing for the other params, right? I actually don't know how (or if) function environments get transferred between processes by snow and others.

DarwinAwardWinner avatar Nov 08 '13 18:11 DarwinAwardWinner

Actually, to be honest, I'm probably not the best person to implement this, because the vast majority of the time I want to do parallel stuff in R, I use multicore, so I never have to worry about exporting variables and I have no real idea how to do it.

DarwinAwardWinner avatar Nov 08 '13 18:11 DarwinAwardWinner

Thinking about it, we should probably take this same "just-in-time export" approach for SnowParam as well. This will solve the problem of the cluster not running when bpexport is called.
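A rough sketch of just-in-time export, using the parallel package directly rather than BiocParallel's param classes. The function name `lapply_with_exports` and its arguments are assumptions for illustration only; the point is that the exports are kept as a plain list and are pushed to the workers only once the cluster is actually running:

```r
library(parallel)

# Hypothetical wrapper: `exported` is a named list collected ahead of time.
lapply_with_exports <- function(X, FUN, exported = list(), workers = 2L) {
  cl <- makeCluster(workers)            # cluster starts only at call time
  on.exit(stopCluster(cl), add = TRUE)  # and is stopped when we are done
  if (length(exported) > 0L) {
    # clusterExport() assigns each variable into the workers' global env.
    clusterExport(cl, names(exported), envir = list2env(exported))
  }
  parLapply(cl, X, FUN)
}

# Usage: `offset` exists only in the export list, not in this session.
result <- lapply_with_exports(
  1:3,
  function(i) i + offset,
  exported = list(offset = 10)
)
```

Because nothing is sent until the apply call, it does not matter whether a cluster exists when the variables are registered, or whether it was stopped and restarted in between.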

DarwinAwardWinner avatar Nov 09 '13 08:11 DarwinAwardWinner

Ok, I am finding myself using BatchJobsParam a lot and wanting export functionality, so I will try to work on this some time soon.

DarwinAwardWinner avatar Apr 29 '15 22:04 DarwinAwardWinner