grf
Reproducibility and random seed
In order for a grf forest to produce the same results on different machines, it needs the same num.threads argument in addition to the same seed argument, as pointed out in the online reference:
Therefore, in order to ensure consistent results, we provide the following recommendations.
- Make sure arguments seed and num.threads are the same across platforms
- Round data to 8 significant digits
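For illustration, the two recommendations can be sketched in R roughly as follows. The simulated data, the seed value 42 and the thread count 2 are arbitrary choices made for this example. Using base R's `signif(x, 8)` for the rounding step is also an assumption, since the reference text quoted here does not name a function.

```r
library(grf)

# Hypothetical toy data, only for illustration.
set.seed(1)
n <- 500
p <- 5
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1] + rnorm(n)

# Recommendation 2: round the data to 8 significant digits
# (signif() is an assumed choice, not taken from the reference).
X <- signif(X, 8)
Y <- signif(Y, 8)

# Recommendation 1: fix both seed and num.threads explicitly,
# and use the same values on every platform.
forest.a <- regression_forest(X, Y, seed = 42, num.threads = 2)
forest.b <- regression_forest(X, Y, seed = 42, num.threads = 2)

# Compare the out-of-bag predictions of the two fits.
pred.a <- predict(forest.a)$predictions
pred.b <- predict(forest.b)$predictions
same.predictions <- isTRUE(all.equal(pred.a, pred.b))
```

Leaving num.threads at its default lets it depend on the machine, which is why the sketch passes it explicitly rather than relying on the default.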
Since this sometimes comes up (#1262), it would be good to also state this in the R documentation for the seed argument.
(Making results independent of num.threads would be ideal and straightforward to do, see #1263, but unfortunately it would be a breaking change, altering results for people who expect the old behavior, so for now it's reasonable to keep it this way.)
UPDATE:
We've finally decided to change the default GRF behavior to not have the random seed interact with the number of threads used to train the forest (#1447). This will hopefully make journal submissions which now have strict reproducibility requirements easier for researchers.
We will treat this as a bug fix in the next grf minor release, 2.4.0, where we also add a backwards-compatibility option for researchers who wish to reproduce past results exactly without reinstalling a previous version of grf. To do that, add options(grf.legacy.seed=TRUE) to the top of your R script.
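As a minimal sketch, the top of such a script might look like this. The `getOption()` check is only an illustrative addition to confirm the setting for the current session; it is not something grf requires.

```r
# Top of the analysis script, before any forests are trained:
# opt in to the pre-2.4.0 seed behavior described above.
options(grf.legacy.seed = TRUE)

# Optional sanity check that the option is set in this R session.
legacy.seed.enabled <- isTRUE(getOption("grf.legacy.seed"))
```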
Thank you very much to @sebp for nudging us towards finally fixing this.