future
future copied to clipboard
Issue with Sequential and Multicore futures finding variables that they shouldn't
(This is an issue I discovered when running my furrr tests interactively. It doesn't occur on CI, or on CRAN, or when I do a full check()
locally, and I think that has to do with the environments involved)
In this example, fn()
should not be able to be evaluated, because its function environment does not contain y
.
However, if you put it in a Sequential or Multicore future and define y
in the environment where the future was created, then it can be evaluated and "works" even though it shouldn't. Notably, y
doesn't even show up in the list of globals for the future.
library(future)
fn <- function(x) {
y
}
# Should not have access to `y`!
fn(1)
#> Error in fn(1): object 'y' not found
wrapper <- function(fn) {
y <- -1
future(fn(1))
}
plan(sequential) # or plan(multicore, workers = 2)
fut <- wrapper(fn)
# No `y` in the globals!
fut$globals
#> $fn
#> function(x) {
#> y
#> }
#>
#> attr(,"where")
#> attr(,"where")$fn
#> <environment: R_EmptyEnv>
#>
#> attr(,"class")
#> [1] "FutureGlobals" "Globals" "list"
#> attr(,"resolved")
#> [1] FALSE
#> attr(,"total_size")
#> [1] 4384
#> attr(,"already-done")
#> [1] TRUE
# But somehow it still worked! This shouldn't happen.
value(fut)
#> [1] -1
This doesn't happen for Multisession futures:
library(future)
fn <- function(x) {
y
}
wrapper <- function(fn) {
y <- -1
future(fn(1))
}
plan(multisession, workers = 2)
fut <- wrapper(fn)
value(fut)
#> Error in fn(1): object 'y' not found
I noticed that in getGlobalsAndPackages()
, the future.globals.keepWhere
option controls whether or not the where
environments are removed by setting them to emptyenv()
. Removing them seems reasonable, and was added in https://github.com/HenrikBengtsson/future/issues/475.
However, this setting of the where
environment to emptyenv()
conflicts with assign_globals()
used in running sequential and multicore futures. Here is where that is used for sequential futures:
https://github.com/HenrikBengtsson/future/blob/82d4817214900af156dee63b567d00e8229000bb/R/UniprocessFuture-class.R#L65
assign_globals()
will reset the environment of the global to a child of the future's environment IF it was the emptyenv()
. Because of the future.globals.keepWhere
branch, the global's environment is identical with emptyenv()
, so this runs. So that sets the environment of the fn
global to a child environment of the future itself, i.e. it basically uses the envir
of future(envir = parent.frame())
, and this has y
in it.
https://github.com/HenrikBengtsson/future/blob/230e428c8a8c18e12b01f3050a90c3adab579024/R/utils.R#L179-L182
So it looks like that FIXME note about being overly conservative may actually be introducing a bug.
We can confirm all this by setting future.globals.keepWhere
to FALSE
to try not setting the where
environments to emptyenv()
(which avoids resetting them to the future's env):
library(future)
fn <- function(x) {
y
}
wrapper <- function(fn) {
y <- -1
future(fn(1))
}
plan(sequential)
# If we set this option to prevent `where` from being removed, it errors as expected
opt <- options(future.globals.keepWhere = TRUE)
fut <- wrapper(fn)
value(fut)
#> Error in fn(1): object 'y' not found
Great to see this issue has already been reported in such detail. Just today I failed to reproduce results from a simulation study run with future.apply 1.8.1 + future 1.21.0 for the same reason (I guess). It breaks when going from future 1.21.0 to future >= 1.22.1. I'll leave my (hopefully minimal, even non-RNG) example here; maybe it helps in addition to the one provided in the original report:
r0 <- local({
mu <- 0
function () mu
})
r1 <- local({
mu <- 1
function () mu
})
replicate(3, r1()-r0())
#> [1] 1 1 1
library("future.apply")
plan(sequential) # or multicore
future_replicate(3, r1()-r0())
#> [1] 0 0 0
## "Solution" 1:
future_replicate(3, r1()-r0(), future.globals = "idontexist")
#> [1] 1 1 1
## "Solution" 2:
options(future.globals.keepWhere = TRUE)
future_replicate(3, r1()-r0())
#> [1] 1 1 1
Thanks both. This looks quite serious, especially since it can be a silent bug that produces incorrect results. I suspect it's due to the following future 1.22.0 update (https://github.com/HenrikBengtsson/future/issues/515):
- future(fcn(), globals=list(a=42, fcn=function() a)) would fail with "Error in fcn() : object 'a' not found" when using sequential or multicore futures. This affected also map-reduce calls such as future.apply::future_lapply(1, function(x) a, future.globals=list(a=42)).
which, if my memory is correct, was quite a challenging issue to solve. I guess I should have been alerted by those unusually tricky challenges.
future_replicate(3, r1()-r0(), future.globals = "idontexist")
FWIW, here you're effectively skipping the identification of globals, meaning an alternative would have been to use future.globals = FALSE
.
@bastistician , a version of your example, without future.apply, is:
library(future)
plan(sequential)
r0 <- local({
mu <- 0
function() mu
})
r1 <- local({
mu <- 1
function() mu
})
truth <- r1() - r0()
print(truth)
#> [1] 1
f <- future(r1() - r0())
y <- value(f)
print(y)
#> [1] 0
stopifnot(identical(y, truth))
#> Error: identical(y, truth) is not TRUE
Update: I've just released future 1.26.1, to fix some urgent bugs, but it does not fix this problem. I've been doing quite some work the last four weeks, and I was hoping to fix the long outstanding problem once and for all. It's complicated because when serializing a function/closure we want to make sure all:
- all the environments are serialized
- we want to avoid bringing large "cargo" objects along when serializing functions
- R does not serialize the global environment; it's just replaced with whatever the global environment is on the receiving end. This means functions with globals in the global environment, does not behave the same as functions in local environments
These properties and objectives compete with each other in the sense that, solving one, tends to break another one. At least, if one tries to do a "quick" fix (e.g. keepWhere = TRUE/FALSE
).
So, to solve this once and for all, I need to be able to deconstruct functions and their environments (possibly several layers) and then rebuild them again so that the global environment is not part of the equation and ideally with "cargo" objects pruned away. On top of this, I need to do it so that functions that share the same environment (e.g. r0()
and r1()
above), share the same environments after reconstruction. I've basically started an environments package just to study different kind of ways these problems may appear in futures and trying to figure out ways to automate the "pruning" process. This will take quite some more understanding on my end and more work, so it'll take a while before it'll be fixed here in future. Another benefit of this side project is that it'll help me understand how to best handle caching of globals on the workers; a feature that's on the roadmap (and a blocker to do some more cleanups).