phasic-policy-gradient
Code for the paper "Phasic Policy Gradient"
What is RCALL_LOGDIR supposed to be? I cannot find any documentation anywhere stating what it relates to. It's only used in one place, and to print to the terminal,...
This commit adds periodic logging of evaluation scores for the policy being trained. It also adds `num_levels` and `start_level` to the arguments. Based on code from @rraileanu
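A minimal sketch of how the new `num_levels` and `start_level` arguments might be wired in, assuming argparse-style argument parsing (the argument names come from the commit description; the wiring shown here is illustrative, not the repo's actual code):

```python
import argparse

parser = argparse.ArgumentParser()
# Procgen-style level controls added by this change (defaults are assumptions)
parser.add_argument("--num_levels", type=int, default=0,
                    help="number of unique levels; 0 means unlimited")
parser.add_argument("--start_level", type=int, default=0,
                    help="seed of the first level in the set")
args = parser.parse_args(["--num_levels", "200", "--start_level", "50"])
print(args.num_levels, args.start_level)
```

With `num_levels=200` and `start_level=50`, training would see levels 50 through 249 only, which is the usual Procgen generalization setup.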
According to the code (https://github.com/openai/phasic-policy-gradient/blob/master/phasic_policy_gradient/train.py#L14), arch 'detach' seems to correspond to the single-network variant described in section 3.6 of the paper. According to the paper and the comment in the code, the...
Hi! I wonder whether the code computes the aux loss and its gradient for both arch == "detach" and arch == "dual". (If I missed something important, I'm sorry.)
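A hypothetical sketch of how an `arch` flag could gate the auxiliary phase, for discussion purposes only (the function name and the exact behavior per arch are assumptions based on the paper, not the repo's actual implementation):

```python
def uses_aux_phase(arch: str) -> bool:
    """Illustrative guess at which arch settings run the auxiliary phase.

    'dual'   - separate policy and value networks (the paper's main PPG setup)
    'detach' - single network with the value gradient detached from the policy
               trunk (the single-network variant of section 3.6)
    'shared' - plain shared network (PPO-style), presumably no aux phase
    """
    return arch in ("detach", "dual")

for arch in ("shared", "detach", "dual"):
    print(arch, uses_aux_phase(arch))
```

If this reading is right, the aux loss and its gradient would indeed be computed for both "detach" and "dual", differing only in where the gradient is allowed to flow.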
I was trying out this code with a custom gym environment (a non-gaming, time-series environment, tested to work with baselines) converted to a gym3 environment using [`gym3.interop.FromGymEnv()`](https://github.com/openai/gym3/blob/4c3824680eaf9dd04dce224ee3d4856429878226/gym3/interop.py#L118) and ended up getting...
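For context on what that conversion involves, here is a rough, self-contained sketch of the gym-to-gym3 interface change, assuming gym3's `(reward, ob, first)` convention for `observe()` with a leading batch axis of size 1 (the adapter and fake env below are illustrative stand-ins, not `FromGymEnv` itself):

```python
import numpy as np

class FakeGymEnv:
    """Minimal gym-style env: reset() -> ob, step(ac) -> (ob, rew, done, info)."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return np.zeros(3, dtype=np.float32)
    def step(self, ac):
        self.t += 1
        done = self.t >= 5
        return np.full(3, self.t, dtype=np.float32), 1.0, done, {}

class GymToGym3Adapter:
    """Rough sketch of what gym3.interop.FromGymEnv does: wrap a single gym
    env as a batched gym3-style env with observe()/act() (num envs = 1)."""
    def __init__(self, gym_env):
        self.env = gym_env
        self.ob = self.env.reset()
        self.rew = 0.0
        self.first = True  # gym3 convention: True at the start of an episode
    def observe(self):
        # gym3 returns (reward, ob, first), each with a leading batch axis
        return (np.array([self.rew]), self.ob[None], np.array([self.first]))
    def act(self, ac):
        ob, rew, done, _ = self.env.step(ac[0])
        self.first = done
        self.rew = rew
        # gym3 auto-resets: after a terminal step, the next ob starts a new episode
        self.ob = self.env.reset() if done else ob
```

One wrinkle worth checking with custom environments: gym3 auto-resets and reports episode boundaries via `first` rather than `done`, so observation shapes and dtypes must be consistent across resets, which is a common source of interop errors.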