NanoSim icon indicating copy to clipboard operation
NanoSim copied to clipboard

Output is not reproducible

Open baraaorabi opened this issue 2 years ago • 1 comments

In line 1505 of the simulator.py script: https://github.com/bcgsc/NanoSim/blob/23911b67ce4733f0468ac26296e25348e3b73b4b/src/simulator.py#L1504-L1506

The seed is reset before each process/thread is started. This overrides the setting of the random seed in here: https://github.com/bcgsc/NanoSim/blob/23911b67ce4733f0468ac26296e25348e3b73b4b/src/simulator.py#L2180-L2181

It also causes the the output of Nanosim to be not reproducible despite using the same value for the --seed arg. I think these two lines (L1505 and L1506) should be dropped. Or maybe they should be replaced with something like?

deterministically_random_seed = random.getrandbits(32)
np.random.seed(deterministically_random_seed) 
random.seed(deterministically_random_seed)

This would give each thread a new random seed that is deterministically defined by the --seed value which should ensure reproducibility across runs with the same --seed.

baraaorabi avatar Apr 14 '22 22:04 baraaorabi

Thanks Baraa! I agree with you. I have suspicions that this is an issue when I was reading through simulator.py recently. @SaberHQ and I will look into it!

kmnip avatar Apr 15 '22 08:04 kmnip