Seeding in the AgentManager and additive fits
Hello,
I've been trying to make the StableBaselinesAgent (PR #148) compatible with additive fits, but I ran into some issues:
- In the `_fit_worker` auxiliary function, we reseed external libraries. I believe this is done to guarantee reproducibility when doing distributed training. However, doing two fits `.fit(X)` will not give the same result as a single `.fit(2X)`, because the seed is reset halfway through training (see the sketch after this list). Here is the code.
- In the `load` method of `AgentHandlers`, we reseed the environment after loading the agent, which causes similar issues. I also noticed that the handler's seed is used to reseed the environment, which differs from the seed that was originally used. Here is the code.
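
For illustration, here is a minimal, self-contained sketch (stand-in names, not rlberry's actual code) of the first issue: if the RNG is reseeded between two fits, the second fit replays the first one instead of continuing the stream that a single, longer fit would use.

```python
import numpy as np

def fit(n_steps, rng):
    """Stand-in for Agent.fit: consumes n_steps draws from the RNG."""
    return rng.random(n_steps).tolist()

SEED = 123

# a single fit(2X)
rng = np.random.default_rng(SEED)
single = fit(20, rng)

# two fit(X) calls with a reseed in between, as the reseeding in
# _fit_worker effectively causes
rng = np.random.default_rng(SEED)
first = fit(10, rng)
rng = np.random.default_rng(SEED)   # seed reset halfway through training
second = fit(10, rng)

print(first + second == single)   # False: the stream restarted from SEED
print(first == second)            # True: the second fit replays the first
```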
I would love to know your opinions on the matter!
Maybe we could modify the rlberry `Seeder` to accept a PyTorch generator as a `seed_seq`.
I looked into torch's RNG, and it really doesn't seem compatible with anything but itself (it can't import a NumPy RNG, for instance), so I don't think it is easy to reseed the torch generator in the manager; it would be better to import the torch generator as an rlberry `Seeder`.
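
For context, a small sketch of the incompatibility (this is not rlberry's or a proposed API, just an illustration): `torch.Generator` has no way to consume a NumPy generator or `SeedSequence` directly; the only common currency between the two is an integer seed, e.g. one drawn from the `SeedSequence` and passed to `manual_seed`.

```python
import numpy as np
import torch

seed_seq = np.random.SeedSequence(42)

# NumPy side: the SeedSequence seeds a numpy Generator directly
np_rng = np.random.default_rng(seed_seq.spawn(1)[0])

# torch side: there is no torch.Generator(numpy_rng); the only bridge is
# an integer seed generated from the sequence and fed to manual_seed
torch_seed = int(seed_seq.generate_state(1, dtype=np.uint64)[0])
torch_rng = torch.Generator()
torch_rng.manual_seed(torch_seed)

# both streams are now reproducibly derived from the same seed sequence
print(np_rng.random())
print(torch.rand(1, generator=torch_rng))
```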
Regarding @mmcenta's point:
> doing two fits `.fit(X)` will not give the same result as a single `.fit(2X)`, because the seed is reset halfway through training.
I think it's ok to have this behavior in `AgentManager` only, as long as the whole pipeline (parameters -> manager -> outputs) is reproducible. I believe it's important to enforce the additive property of `fit()` only at the Agent level, to make sure that the optimization done by `AgentManager.optimize_hyperparams` makes sense when `fit_fraction < 1` (that is, when `fit()` is called several times to evaluate hyperparameters).
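
To make that requirement concrete, here is a hypothetical stub (not rlberry's `Agent` API) showing what additivity at the Agent level buys: splitting the budget across several `fit()` calls, as happens during hyperparameter evaluation with `fit_fraction < 1`, reproduces a single full-budget fit as long as nothing is reseeded in between.

```python
import numpy as np

class StubAgent:
    """Stand-in for an rlberry Agent; fit() just consumes RNG draws."""
    def __init__(self, seed):
        self.rng = np.random.default_rng(seed)
        self.trace = []

    def fit(self, budget):
        # additive: continues the same RNG stream across calls
        self.trace.extend(self.rng.random(budget).tolist())

total_budget, fit_fraction = 20, 0.25
chunk = int(total_budget * fit_fraction)

partial = StubAgent(seed=42)
for _ in range(int(1 / fit_fraction)):
    partial.fit(budget=chunk)
    # here the manager would evaluate the agent and report the score
    # to the hyperparameter optimizer

full = StubAgent(seed=42)
full.fit(budget=total_budget)

# additive fits reproduce the single full-budget fit
print(partial.trace == full.trace)  # True
```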