
Standardize off-policy RL hyperparameters across the codebase

Open avnishn opened this issue 4 years ago • 1 comment

e.g., parameters such as steps_per_epoch, epoch_cycles, etc. should be standardized across all algorithms in the codebase.

A use case that we have in garage is the ability to control how frequently the training epochs of an algorithm are logged.

Because we haven't standardized how this is done across all of our algorithms, there are potential foot-guns and sources of confusion for new users.

I propose that runner.train's API be modified so that calls to runner.train look something like this:

runner.train(num_epochs=100, logging_frequency=5, ...)

We would use num_epochs and logging_frequency to compute evaluation_epochs = int(ceil(num_epochs / logging_frequency)).
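
As a quick worked sketch of that arithmetic (evaluation_epochs here is just a local variable illustrating the proposal, not existing garage API):

from math import ceil

num_epochs = 100
logging_frequency = 5
# Evaluate/log once every logging_frequency epochs, so the number of
# evaluation points is the ceiling of the ratio.
evaluation_epochs = int(ceil(num_epochs / logging_frequency))  # == 20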

avnishn avatar Jun 29 '20 20:06 avnishn

@krzentner, what's your opinion on this?

We would need to make the parameter evaluation_epochs a public attribute of runner. I think the downside of having an interface like this is that downstream, inside your_algorithm.train(), new algorithm implementors would have to have the awareness to use evaluation_epochs.

It would probably look something like this:

algorithm:

def train(runner, itr):
    for epoch in range(runner.evaluation_epochs):
        # Run logging_frequency training iterations between evaluations.
        for _ in range(runner.logging_frequency):
            samples = obtain_samples()
            optimize(samples)
        # Evaluate and log once per evaluation epoch.
        evaluation_samples = _obtain_eval_samples()
        log(evaluation_samples)
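
For concreteness, here is a self-contained sketch of how the pieces could fit together; Runner, Algorithm, and the stubbed helpers are stand-ins for illustration, not garage's actual classes:

from math import ceil


class Runner:
    def train(self, algo, num_epochs, logging_frequency):
        # Store the evaluation schedule as public attributes so the
        # algorithm can consume it inside its own train() loop.
        self.logging_frequency = logging_frequency
        self.evaluation_epochs = int(ceil(num_epochs / logging_frequency))
        algo.train(self)


class Algorithm:
    def train(self, runner):
        for epoch in range(runner.evaluation_epochs):
            for _ in range(runner.logging_frequency):
                self.optimize(self.obtain_samples())
            print(f'epoch {epoch}: evaluating and logging')  # stand-in for log(...)

    def obtain_samples(self):
        return []  # stub sampler

    def optimize(self, samples):
        pass  # stub optimizer


Runner().train(Algorithm(), num_epochs=100, logging_frequency=5)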

avnishn avatar Jun 29 '20 20:06 avnishn