garage
Standardize off-policy RL hyperparameters across the codebase
Parameters such as `steps_per_epoch`, `epoch_cycles`, etc., should be standardized across all algorithms in the codebase.
One usage pattern we support in garage is the ability to control how frequently the training epochs of an algorithm are logged. Because we haven't standardized how this is done across all of our algorithms, there are potential foot-guns and sources of confusion for new users.
I propose that `runner.train`'s API be modified so that calls to `runner.train` look something like this:

```python
runner.train(num_epochs=100, logging_frequency=5, ...)
```

We would use `num_epochs` and `logging_frequency` to compute `evaluation_epochs = int(ceil(num_epochs / logging_frequency))`.
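As a quick sanity check on that arithmetic, here is a minimal sketch (`compute_evaluation_epochs` is a hypothetical helper name, not an existing garage function):

```python
import math

def compute_evaluation_epochs(num_epochs, logging_frequency):
    # ceil so that a partial final logging cycle still gets one eval/log pass
    return int(math.ceil(num_epochs / logging_frequency))

print(compute_evaluation_epochs(100, 5))  # 20 evaluation epochs
print(compute_evaluation_epochs(100, 7))  # 15: ceil(100 / 7) rounds 14.28... up
```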
@krzentner, what's your opinion on this?
We would need to make the parameter `evaluation_epochs` a public attribute of the runner. I think the downside of an interface like this is that downstream, inside `your_algorithm.train()`, new algorithm implementers would have to have the awareness to use `evaluation_epochs`.
It would probably look something like this:
```python
# inside the algorithm
def train(self, runner, itr):
    for epoch in range(runner.evaluation_epochs):
        for _ in range(runner.logging_frequency):
            samples = self.obtain_samples()
            self.optimize(samples)
        # log once per evaluation epoch, after logging_frequency training cycles
        evaluation_samples = self._obtain_eval_samples()
        log(evaluation_samples)
```
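To make the shape of that loop concrete, here is a self-contained mock of the proposed interface. `MockRunner` and the sampling/logging stubs are illustrative stand-ins, not garage's actual API:

```python
import math


class MockRunner:
    """Stand-in for the proposed runner interface (not garage's real runner)."""

    def __init__(self, num_epochs, logging_frequency):
        self.logging_frequency = logging_frequency
        self.evaluation_epochs = int(math.ceil(num_epochs / logging_frequency))


class MockAlgorithm:
    """Counts optimization and logging calls to show the loop structure."""

    def __init__(self):
        self.optimize_calls = 0
        self.log_calls = 0

    def train(self, runner):
        for _ in range(runner.evaluation_epochs):
            for _ in range(runner.logging_frequency):
                samples = self._obtain_samples()
                self._optimize(samples)
            # one evaluation/log pass per logging cycle
            eval_samples = self._obtain_eval_samples()
            self._log(eval_samples)

    def _obtain_samples(self):
        return []

    def _optimize(self, samples):
        self.optimize_calls += 1

    def _obtain_eval_samples(self):
        return []

    def _log(self, samples):
        self.log_calls += 1


runner = MockRunner(num_epochs=100, logging_frequency=5)
algo = MockAlgorithm()
algo.train(runner)
print(algo.optimize_calls, algo.log_calls)  # 100 optimization steps, 20 log passes
```

With `num_epochs=100` and `logging_frequency=5`, the algorithm still runs 100 training cycles total, but only evaluates and logs 20 times, which is the behavior the proposal is after.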