sacred
Randomness and MongoObserver
After a bit of deep debugging, I realized that the MongoObserver makes a call to the random.choice function (here) during each heartbeat. This makes it basically impossible to guarantee reproducibility if the random module is used directly instead of through a dedicated random.Random instance.
I think there is not much to be done about this on the Sacred side, but a note in the docs making clear that random should be used carefully together with the MongoObserver could help other users who run into this behaviour.
Or can you see a way of solving this problem that would be transparent to the user?
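A minimal sketch of the effect, assuming only the standard library. The names fake_heartbeat and train are hypothetical: fake_heartbeat stands in for the driver's random.choice call made during a heartbeat and is not actual PyMongo or Sacred code. Each mode is reproducible on its own, but the extra global draw shifts every later value, so the two modes disagree.

```python
import random

def fake_heartbeat():
    # Hypothetical stand-in for library code (such as a database driver
    # picking a server) that silently draws from the global `random` module.
    random.choice(["server-a", "server-b"])

def train(n_steps, with_observer):
    random.seed(42)  # one global seed, set at the start of the run
    draws = []
    for _ in range(n_steps):
        if with_observer:
            fake_heartbeat()  # advances the global state between our draws
        draws.append(random.random())
    return draws

# Each mode is reproducible on its own...
print(train(5, with_observer=False) == train(5, with_observer=False))  # True
print(train(5, with_observer=True) == train(5, with_observer=True))    # True
# ...but the modes disagree, because the heartbeat consumed random numbers.
print(train(5, with_observer=False) == train(5, with_observer=True))   # False
```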
Thanks for reporting! I think you should be able to get around this by explicitly injecting individually seeded random number generators into your captured/main functions, as described in https://sacred.readthedocs.io/en/latest/randomness.html#special-arguments. Could you try this out and check whether it makes your experiments reproducible?
Agree with @JarnoRFB. Libraries or other code using the global sources of randomness are the main reason the _rnd parameter exists.
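A rough sketch of why an injected generator helps, assuming only the standard library. Here rnd plays the role of Sacred's _rnd, but it is passed in explicitly as a random.Random so the example runs without Sacred; train and fake_heartbeat are hypothetical names, and fake_heartbeat is only a stand-in for library code that uses the global random module. Because the experiment's draws never read the global state, the hidden random.choice no longer changes them.

```python
import random

def fake_heartbeat():
    # Hypothetical stand-in for library code that uses the global `random` module.
    random.choice(["server-a", "server-b"])

def train(rnd, n_steps, with_observer):
    # `rnd` plays the role of Sacred's `_rnd`: a dedicated generator seeded
    # from the experiment seed and used for every random draw in our code.
    draws = []
    for _ in range(n_steps):
        if with_observer:
            fake_heartbeat()  # still advances the *global* state...
        draws.append(rnd.random())  # ...which `rnd` never reads
    return draws

seed = 42
print(train(random.Random(seed), 5, with_observer=False)
      == train(random.Random(seed), 5, with_observer=True))  # True
```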
It is good to know that pymongo is one of these gotcha libraries, and it might be worth pointing this out in the documentation.
Thanks for your answer!
Yes, using a dedicated Random or RandomState instance, like the one provided by the _rnd parameter, solves the problem. I agree that this is not a big issue since Sacred provides a way of properly seeding experiments, but a specific note about this behaviour in the docs could help users who rely on the global random state when they are debugging.
Hello,
I think I have a related problem. When training my PyTorch model, I noticed that the training progress (loss per epoch) differs with and without the MongoObserver, even though I set the same global seed. If I train several times WITHOUT the MongoObserver, I get the same result every time. If I train several times WITH the MongoObserver, I also get the same result every time. However, the results with and without the MongoObserver differ.
Are you using a globally seeded random number generator? If so, you can try to replace it with the _rnd parameter in your captured function, as @JarnoRFB suggested.
Since the MongoObserver samples directly from the random module, it alters the global state and can cause the behavior you are describing.
Hello @TomVeniat, thanks for your response. I set the global seed from the command line with "python script.py with seed=global_seed_number", so I don't really know how I can make use of the _rnd parameter.
For the problem mentioned above, if I replace the random.choice call in the pymongo library with simply selecting the first item in the list, I get the same result with and without the MongoObserver. This is a simple hack I am using for now, but I don't think modifying a vendor library is a good approach.
Your seeding is correct. The _rnd parameter should be used in your code every time you want to perform a random operation.
For example, replace all calls to random.randint with _rnd.randint. This way the global state has no impact on your code, and you won't have to modify PyMongo's code.
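A small sketch of what that migration might look like at the call sites, assuming only the standard library. The helpers make_batches, pick_augmentation and run are hypothetical; rnd is a random.Random passed in by hand, standing in for the _rnd a captured function would receive. The "was:" comments show the global call each line replaces, and the random.choice inside run stands in for a library's unrelated global draw.

```python
import random

def make_batches(items, batch_size, rnd):
    """Shuffle `items` with `rnd` and split them into batches."""
    items = list(items)
    rnd.shuffle(items)  # was: random.shuffle(items)
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

def pick_augmentation(rnd):
    """Choose one of four (hypothetical) augmentations."""
    return rnd.randint(0, 3)  # was: random.randint(0, 3); both bounds inclusive

def run(seed, noisy_library):
    rnd = random.Random(seed)  # stand-in for the injected `_rnd`
    batches = make_batches(range(10), 4, rnd)
    if noisy_library:
        random.choice(["a", "b"])  # global draw from "someone else's" code
    aug = pick_augmentation(rnd)
    return batches, aug

print(run(0, noisy_library=False) == run(0, noisy_library=True))  # True
```

One caveat when renaming calls mechanically: depending on the Sacred version and whether numpy is installed, _rnd may be a numpy random state rather than a random.Random, and numpy's RandomState.randint(low, high) excludes high, unlike random.randint, so ranges can shift by one.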
Hey @TomVeniat, I see what you mean. However, the call to the random module is inside PyMongo's code, and to use the MongoObserver I have to include the PyMongo library. There is no way for me to replace those calls to the random module except by modifying the library. If you know another workaround, please let me know.
Are you using the random module in your code? If so, you should use _rnd for those calls instead of calling random.xxx directly.
Once that's done, PyMongo can do its random.choice thing without affecting your random state.
Ah, I see what you mean. I will have to change all of my calls to the random module so that PyMongo's random.choice no longer affects them. Thanks a lot!