pathfindR
Reproducible results
I have been playing around with this package and so far it works great. However, I noticed that the results are not reproducible: when I call run_pathfindR twice with the same input, I get different results, sometimes with quite a bit of variation (e.g. 60 enriched pathways on one run and 80 on the next).
If I want to use the results in a publication, I would prefer my code to be reproducible. Usually in R this is quite easy to do with set.seed(). With this package, however, most computations are performed externally in Java. I tried to make the Java code reproducible by adding a seed value, e.g. for SA I changed line 81 of SimulatedAnnealing.java to
Random rand = new Random(42);
and in ScoreCalculations.java I tried to make the shuffling reproducible:
// Create reproducible source of randomness for shuffling
Random rand = new Random(42);
for (int trial = 0; trial < numberOfTrials; trial++) {
    // long start=System.nanoTime();
    Collections.shuffle(nodeListForSampling, rand);
Unfortunately, this did not make the results reproducible.
I then thought it might be due to the parallel processing, but using run_pathfindR with n_processes = 1 also did not help.
To be honest, I am not that familiar with Java, so perhaps I missed something. Do you have any suggestions for making the results reproducible?
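As a point of reference for the seeded shuffle described above: a fixed seed only yields the same shuffle when the list enters the shuffle in the same order and every other source of randomness in the program is seeded as well. The following is a minimal sketch of that behaviour, not pathfindR code. The class name SeededShuffleDemo, the method shuffledCopy, and the node labels are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SeededShuffleDemo {
    // Hypothetical helper: shuffle a copy of the input using a freshly seeded generator.
    static List<String> shuffledCopy(List<String> nodes, long seed) {
        List<String> copy = new ArrayList<>(nodes);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        // Illustrative node labels (assumption, not taken from pathfindR).
        List<String> nodes = Arrays.asList("A", "B", "C", "D", "E");

        // Same seed and same input order give the same shuffled order.
        List<String> first = shuffledCopy(nodes, 42L);
        List<String> second = shuffledCopy(nodes, 42L);
        System.out.println(first.equals(second)); // prints "true"
    }
}
```

If the input list were built in a different order on each run, or if another unseeded Random were used elsewhere, the output could still differ between runs even with this seed in place.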
Thank you for raising this issue. We've been planning to allow setting a seed for some time, and will soon implement this as an argument of run_pathfindR(). I'll keep you updated and let you know once the change is made.
We made the necessary changes (as of commit above) to ensure results are reproducible. A seed is now set for each iteration, which is the default behavior of run_pathfindR.
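To illustrate the general idea of setting a seed per iteration, here is a minimal sketch under stated assumptions. It is not the actual pathfindR implementation. The class PerIterationSeedDemo, the method runIteration, and the scheme of adding the iteration index to a base seed are all hypothetical. The point is that when each iteration creates its own generator from its own seed, its result does not depend on the order in which iterations run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class PerIterationSeedDemo {
    // Hypothetical stand-in for one iteration's random work.
    // The generator depends only on the base seed and the iteration index.
    static List<Integer> runIteration(long baseSeed, int iteration) {
        Random rand = new Random(baseSeed + iteration);
        List<Integer> nodes = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8));
        Collections.shuffle(nodes, rand);
        return nodes;
    }

    public static void main(String[] args) {
        long baseSeed = 123L;   // assumed value, for illustration only
        int iterations = 4;

        // Run the iterations in ascending order.
        List<List<Integer>> forward = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            forward.add(runIteration(baseSeed, i));
        }

        // Run them again in descending order and compare per iteration.
        boolean same = true;
        for (int i = iterations - 1; i >= 0; i--) {
            same &= forward.get(i).equals(runIteration(baseSeed, i));
        }
        System.out.println(same); // prints "true"
    }
}
```

A single generator shared across iterations would instead make each iteration's result depend on how many random draws happened before it, which is harder to keep stable when iterations run in parallel.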