higgsfield Error in 1.actor-critic.ipynb

Open denisergashbaev opened this issue 5 years ago • 1 comments

While debugging I found what looks like a bug in your code. Set the number of environments to 1 and the number of steps to 32. The environment can then terminate one or more times within those 32 steps. In that case, compute_returns() will return wrong values, because the masks zero out the initial estimate next_value and it is lost...

Jan 29 '20 21:01 denisergashbaev

When you use SubprocVecEnv as a wrapper to have multiple actors in parallel, the environments automatically call the reset() method after the end of an episode.

I think compute_returns() is correct because it iterates backward over the steps:

for step in reversed(range(len(rewards))):
    R = rewards[step] + gamma * R * masks[step]

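If the environment ends at a given step x in [0, 32 - 1), then masks[x] is 0 and R at that step equals rewards[x]. That is the desired behavior: the episode's roll-out is complete at x, so the initial estimate next_value is no longer needed for step x or any earlier step. Steps after x still bootstrap from next_value, because they belong to the new episode that started after the automatic reset.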
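To make this concrete, here is a small sketch with one environment and four steps. Only the two loop lines come from the snippet above; the surrounding function body, the toy rewards, the masks, and gamma = 0.5 are assumptions chosen so the arithmetic is easy to follow by hand.

```python
def compute_returns(next_value, rewards, masks, gamma=0.99):
    """Backward pass over a rollout; masks[step] is 0.0 where the episode ended."""
    R = next_value  # bootstrap estimate for the state after the last step
    returns = []
    for step in reversed(range(len(rewards))):
        # A zero mask drops everything accumulated from later steps,
        # so the return at a terminal step is just its own reward.
        R = rewards[step] + gamma * R * masks[step]
        returns.append(R)
    returns.reverse()  # restore chronological order
    return returns


# Toy rollout (assumed values): the episode terminates at step 1.
rewards = [1.0, 1.0, 1.0, 1.0]
masks = [1.0, 0.0, 1.0, 1.0]
gamma = 0.5

returns = compute_returns(10.0, rewards, masks, gamma)
print(returns)  # [1.5, 1.0, 4.0, 6.0]

# Changing the bootstrap value only affects the steps after the reset.
returns_zero_bootstrap = compute_returns(0.0, rewards, masks, gamma)
print(returns_zero_bootstrap)  # [1.5, 1.0, 1.5, 1.0]
```

Steps 2 and 3 belong to the new episode and still depend on next_value, while steps 0 and 1 are identical in both runs, which matches the argument that the estimate is dropped only where it should be.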

Jul 08 '20 09:07 ingambe
