mice icon indicating copy to clipboard operation
mice copied to clipboard

Preserve the stochastic nature of mice results (#426)

Open GBA19 opened this issue 2 years ago • 8 comments

The change requested in #426 and implemented in #432 has the side-effect of having mice return the exactly same set of imputed values on each instantiation of the function.

The following code example illustrates this (demonstrated with the mean for simplicity):

replicate(5, {imp <- mice(nhanes, m= 1, printFlag = FALSE); complete(imp)$bmi |> mean()} ) # [1] 27.144 27.144 27.144 27.144 27.144

Given the stochastic nature of multiple imputation this seems unreasonable. This may break some existing scripts---it certainly broke one of mine.

There is a simple workaround, adding a runif(1), but it is preferable to have mice work reasonably 'out of the box':

replicate(5, {runif(1); imp <- mice(nhanes, m= 1, printFlag = FALSE, seed= 12345); complete(imp)$bmi |> mean()} ) # [1] 27.032 26.984 26.900 26.900 26.900

If indeed someone needs exact replications on different instantiations, the present behavior in 3.14.0, the solution is as simple as the following:

replicate(5, {runif(1); imp <- mice(nhanes, m= 1, printFlag = FALSE, seed= 12345); complete(imp)$bmi |> mean()} ) # [1] 26.408 26.408 26.408 26.408 26.408

Hopefully, this is something you are able to solve.

"R version 4.1.2 (2021-11-01)" Package Version "mice" "3.14.0" "withr" "2.4.3"

GBA19 avatar Jan 03 '22 15:01 GBA19

Thanks for noting. I wasn't aware of the repetitive nature of the imputes in this case.

If I understand correctly, the problem occurs when no seed argument is given to mice(...) . In that case, withr::local_preserve_seed() initialises from the parent environment. If nothing has changed there, then we will get the same seed over and over again.

Perhaps a simple fix would be to add kick <- runif(1L) in the mice() function, just to kick the random generator, e.g.,

  # set local seed, reset random state generator after function aborts
  if (is.na(seed)) {
    kick <- runif(1L)
    withr::local_preserve_seed()
  } else {
    withr::local_seed(seed)
  }

Would that solve your case? Could be any unfortunate side effects?

stefvanbuuren avatar Jan 04 '22 11:01 stefvanbuuren

An added note. My simulations usually set the seed explicitly, e.g.

for (i in 1:5) {
  imp <- mice(nhanes, m = 1, printFlag = FALSE, seed = i)
  cat(complete(imp)$bmi |> mean(), "\n")
}

This is perhaps a bit safer.

stefvanbuuren avatar Jan 04 '22 11:01 stefvanbuuren

I was looking into this just now and thougth that the following might be the solution. Forgive me if I am totally wrong since I do not fully understand the withr::local_preserve_seed(). Nevertheless, I am just returning from the man-page.

if (is.na(seed)) {
    withr::local_preserve_seed() # restores .Random.seed on exiting scope
    set.seed(NULL) # reinitialize .Random.seed
  } else {
    withr::local_seed(seed)
  }

If this is totally wrong, so be it; if not, this appears to be more elegant.

GBA19 avatar Jan 04 '22 11:01 GBA19

Yes thanks, that is better as that will restore .Random.seed after returning from mice().

I just pushed an update.

Thanks a lot for your contribution.

stefvanbuuren avatar Jan 04 '22 11:01 stefvanbuuren

Pretty late to the party, but this new implementation also results in unexpected, and potentially unwanted, behavior. Namely, setting a seed in the global environment is reversed within the mice call, such that the results are always unique if no seed is specified within the mice() call. Setting a seed in the global environment thus no longer ensures reproducibility of the results, which is something that I think would be preferable over the current solution.

library(mice)
#> 
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following objects are masked from 'package:base':
#> 
#>     cbind, rbind
library(magrittr)

set.seed(123)

imp1 <- mice(boys, print = F)

imp1 %$%
  lm(hgt ~ bmi + tv) %>%
  pool() %>%
  summary()
#>          term  estimate std.error statistic        df      p.value
#> 1 (Intercept) 45.750588 7.8251969  5.846573 173.68503 2.431348e-08
#> 2         bmi  2.990458 0.4926343  6.070341  92.33349 2.795360e-08
#> 3          tv  3.738712 0.2099027 17.811641  26.77784 2.220446e-16


set.seed(123)

imp2 <- mice(boys, print = F)

imp2 %$%
  lm(hgt ~ bmi + tv) %>%
  pool() %>%
  summary()
#>          term  estimate  std.error statistic       df      p.value
#> 1 (Intercept) 42.567502 10.7210055  3.970477 12.82719 1.638943e-03
#> 2         bmi  3.165119  0.6533964  4.844103 12.59753 3.503675e-04
#> 3          tv  3.746043  0.2230530 16.794411 19.05428 7.056578e-13

Created on 2022-07-08 by the reprex package (v2.0.1)

I am actually unsure whether the local_preserve_seed() argument is something you would want. What is wrong with the fact that a call to mice adjust the state of the random number generator, as it actually uses this randomness?

thomvolker avatar Jul 08 '22 11:07 thomvolker

Ah... what a can of worms this is...

We have in mice.R:

  # set local seed, reset random state generator after function aborts
  if (is.na(seed)) {
    # restores .Random.seed on exiting scope
    withr::local_preserve_seed()
    # reinitialize .Random.seed
    set.seed(NULL)
  } else {
    # take specified seed to set local seed
    withr::local_seed(seed)
  }

We probably need some alternative to set.seed(NULL)

stefvanbuuren avatar Jul 08 '22 13:07 stefvanbuuren

The function umap::umap has preserve.seed as a parameter, also discussed in a vignette.

I see three usage scenarios, always assuming the default seed = NA on the mice function call:

  1. Repeatedly calling mice without changing the seed in a parent or ancestor environment: We would expect a randomly different result on each call.
  2. Setting the seed in a parent or ancestor environment: We would expect the sequence of seeds in both mice and a parent or ancestor environment to replicate exactly.
  3. Preserve the global seed despite changes within mice as requested in #426: I do not fully grasp the importance of this scenario.

As I understand the present state mice fulfills items 1 and 3 but fails on 2.

Maybe a preserve.seed= FALSE as a default parameter is a way to fulfill 1 and 2 and setting preserve.seed= TRUE could fulfill item 3. This would entail restoring the previous behavior and make item 3 a special case.

Please note: I do not feel strongly about this at all, just naively trying to be of some use.

GBA19 avatar Jul 08 '22 15:07 GBA19

I am no expert on random seeds, but my intuition says that these three aspects are incompatible, because if after every mice() you return to the stage of the random number generator before this call, any attempt to reproducibility (specified in the global environment) will likely lead to identical results over different calls.

Similarly to @GBA19, I do not really understand why you would like to preserve the global seed when you called mice. To me, this seems a bit like asking rnorm() to not affect the state of the random number generator. Still, there is apparently some interest in it, and a preserve.seed parameter might provide a solution.

thomvolker avatar Jul 10 '22 09:07 thomvolker

Since the local seed gave various unforeseen problems, mice 3.14.12 reverted seed setting behaviour to mice 3.13.10. Local seeds are gone. Trust this solves the issue.

stefvanbuuren avatar Nov 14 '22 15:11 stefvanbuuren