trpo Normalize advantage function

Open rarilurelo opened this issue 8 years ago • 2 comments

Hi, thanks for your implementation of TRPO.

In https://github.com/wojzaremba/trpo/blob/master/main.py#L128-L132 you normalize the advantage function. I couldn't find any description of this operation in the paper (https://arxiv.org/abs/1502.05477). Why did you do that?

Jan 05 '17 05:01 rarilurelo

I found it in John Schulman's code. The normalization is biased, but it's sensible.

Jan 08 '17 19:01 wojzaremba

Thanks! I found it here (https://github.com/joschu/modular_rl/blob/master/modular_rl/core.py#L59-L62).

Jan 09 '17 08:01 rarilurelo
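
For anyone landing on this thread, the operation being discussed is usually a per-batch standardization of the advantage estimates: subtract the batch mean and divide by the batch standard deviation. The snippet below is a minimal sketch of that idea, not a copy of the code in either linked repository. The function name `normalize_advantages` and the `eps` constant are assumptions made for illustration.

```python
from statistics import fmean, pstdev


def normalize_advantages(advantages, eps=1e-8):
    """Standardize a batch of advantage estimates.

    Hypothetical helper for illustration: `eps` is an assumed small
    constant that guards against division by zero when every advantage
    in the batch is identical.
    """
    mean = fmean(advantages)   # batch mean
    std = pstdev(advantages)   # population standard deviation of the batch
    return [(a - mean) / (std + eps) for a in advantages]


# Example batch of advantage estimates (made-up numbers).
advantages = [1.0, 2.0, 3.0, 4.0]
normalized = normalize_advantages(advantages)
```

One plausible reading of "biased" is that the mean and standard deviation are computed from the same sampled batch they are applied to, so the rescaled advantages are no longer an unbiased input to the policy gradient estimate. The ordering of the advantages within the batch is unchanged, and the scale becomes comparable across batches, which is likely why it is considered sensible in practice.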
