Seeding in the AgentManager and additive fits
Hello,
I've been trying to make the StableBaselinesAgent (PR #148) compatible with additive fits, but I ran into some issues:
- In the `_fit_worker` auxiliary function, we reseed external libraries. I believe this is done to guarantee reproducibility when doing distributed training. However, doing two fits `.fit(X)` will not give the same result as a single `.fit(2X)`, because the seed is reset halfway through training (see the sketch after this list). Here is the code.
- In the `load` method of `AgentHandlers`, we reseed the environment after loading the agent, which causes similar issues. I also noticed that the handler's seed is used to reseed the environment, which differs from the seed that was originally used. Here is the code.
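
For illustration, here is a minimal, self-contained sketch (stand-in names, not rlberry's actual code) of the first issue: if the RNG is reseeded between two fits, the second fit replays the first one instead of continuing the stream that a single, longer fit would use.

```python
import numpy as np

def fit(n_steps, rng):
    """Stand-in for Agent.fit: consumes n_steps draws from the RNG."""
    return rng.random(n_steps).tolist()

SEED = 123

# a single fit(2X)
rng = np.random.default_rng(SEED)
single = fit(20, rng)

# two fit(X) calls with a reseed in between, as the reseeding in
# _fit_worker effectively causes
rng = np.random.default_rng(SEED)
first = fit(10, rng)
rng = np.random.default_rng(SEED)   # seed reset halfway through training
second = fit(10, rng)

print(first + second == single)   # False: the stream restarted from SEED
print(first == second)            # True: the second fit replays the first
```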
I would love to know your opinions on the matter!
Maybe we could modify the rlberry `Seeder` to accept a PyTorch generator as a `seed_seq`.
I looked into torch's RNG, and it really doesn't seem compatible with anything but itself (it can't import a NumPy RNG, for instance), so I don't think it is easy to reseed the torch generator in the manager; it would be better to import the torch generator as an rlberry `Seeder`.
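
For context, a small sketch of the incompatibility (this is not rlberry's or a proposed API, just an illustration): `torch.Generator` has no way to consume a NumPy generator or `SeedSequence` directly; the only common currency between the two is an integer seed, e.g. one drawn from the `SeedSequence` and passed to `manual_seed`.

```python
import numpy as np
import torch

seed_seq = np.random.SeedSequence(42)

# NumPy side: the SeedSequence seeds a numpy Generator directly
np_rng = np.random.default_rng(seed_seq.spawn(1)[0])

# torch side: there is no torch.Generator(numpy_rng); the only bridge is
# an integer seed generated from the sequence and fed to manual_seed
torch_seed = int(seed_seq.generate_state(1, dtype=np.uint64)[0])
torch_rng = torch.Generator()
torch_rng.manual_seed(torch_seed)

# both streams are now reproducibly derived from the same seed sequence
print(np_rng.random())
print(torch.rand(1, generator=torch_rng))
```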
Regarding @mmcenta's point:
> doing two fits `.fit(X)` will not give the same result as a single `.fit(2X)`, because the seed is reset halfway through training.
I think it's ok to have this behavior in `AgentManager` only, as long as the whole pipeline (parameters -> manager -> outputs) is reproducible. I believe it's important to enforce the additive property of `fit()` only at the Agent level, to make sure that the optimization done by `AgentManager.optimize_hyperparams` makes sense when `fit_fraction < 1` (that is, when `fit()` is called several times to evaluate hyperparameters).
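
To make that requirement concrete, here is a hypothetical stub (not rlberry's `Agent` API) showing what additivity at the Agent level buys: splitting the budget across several `fit()` calls, as happens during hyperparameter evaluation with `fit_fraction < 1`, reproduces a single full-budget fit as long as nothing is reseeded in between.

```python
import numpy as np

class StubAgent:
    """Stand-in for an rlberry Agent; fit() just consumes RNG draws."""
    def __init__(self, seed):
        self.rng = np.random.default_rng(seed)
        self.trace = []

    def fit(self, budget):
        # additive: continues the same RNG stream across calls
        self.trace.extend(self.rng.random(budget).tolist())

total_budget, fit_fraction = 20, 0.25
chunk = int(total_budget * fit_fraction)

partial = StubAgent(seed=42)
for _ in range(int(1 / fit_fraction)):
    partial.fit(budget=chunk)
    # here the manager would evaluate the agent and report the score
    # to the hyperparameter optimizer

full = StubAgent(seed=42)
full.fit(budget=total_budget)

# additive fits reproduce the single full-budget fit
print(partial.trace == full.trace)  # True
```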