async-rl
Stop actor gradient flowing through the critic
I think you should use tf.stop_gradient() in https://github.com/coreylynch/async-rl/blob/master/a3c.py#L164 so that the actor's loss does not backpropagate through the critic's value estimate. Otherwise, after some training, the policy tends to use one action exclusively. Took me a while to figure this out in my own code, too.
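To illustrate the effect, here is a minimal plain-Python sketch, not the code from a3c.py. It assumes the usual per-sample actor loss `-log_prob * (ret - value)`; the names `actor_loss`, `finite_diff`, `log_prob`, `ret`, and `value` are hypothetical and chosen only for this example. It uses finite differences to compare the gradient of the actor loss with respect to the critic's output when the advantage is left differentiable and when it is frozen, which is what wrapping the advantage in tf.stop_gradient() would do in the TensorFlow graph.

```python
import math

def actor_loss(log_prob, ret, value):
    """Per-sample policy-gradient loss: -log pi(a|s) * (R - V(s))."""
    return -log_prob * (ret - value)

def finite_diff(f, x, eps=1e-6):
    """Central finite-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Illustrative numbers only (assumptions, not taken from the repository).
log_prob = math.log(0.25)  # log-probability of the action that was taken
ret = 1.0                  # bootstrapped return R
value = 0.4                # critic estimate V(s)

# Without stop_gradient: the advantage depends on the critic output, so the
# actor loss has a nonzero gradient with respect to V(s).
grad_leaky = finite_diff(lambda v: actor_loss(log_prob, ret, v), value)

# With the advantage frozen at its current value (the stop_gradient case),
# the actor loss is constant with respect to V(s).
frozen_advantage = ret - value
grad_stopped = finite_diff(lambda v: -log_prob * frozen_advantage, value)

print("gradient wrt V, advantage differentiable:", grad_leaky)
print("gradient wrt V, advantage frozen:", grad_stopped)
```

In the leaky case the gradient equals `log_prob`, so the actor loss keeps pushing the critic's estimate in a direction unrelated to the return. With the advantage frozen, only the critic's own loss updates the value estimate.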
Oh interesting! I will definitely take a look at this. Thank you.
I noticed this as well and believe it's a significant cause of performance degradation. Additionally, you don't seem to be adding the entropy term to the objective, which the paper mentions as useful for improving exploration.
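For reference, a minimal plain-Python sketch of an entropy bonus, assuming a softmax policy over discrete actions. The helper names (`softmax`, `entropy`, `actor_loss_with_entropy`) and the weight `beta=0.01` are assumptions made for illustration, not values taken from this repository. The entropy of the action distribution is subtracted from the actor loss, so a policy that collapses onto a single action is penalized relative to one that keeps exploring.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs, eps=1e-12):
    """Shannon entropy of a discrete distribution, in nats."""
    return -sum(p * math.log(p + eps) for p in probs)

def actor_loss_with_entropy(probs, action, advantage, beta=0.01):
    """Policy-gradient loss minus an entropy bonus weighted by beta.

    `advantage` is treated as a constant here, matching the
    stop_gradient fix discussed above.
    """
    log_prob = math.log(probs[action] + 1e-12)
    return -log_prob * advantage - beta * entropy(probs)

uniform = softmax([0.0, 0.0, 0.0, 0.0])   # maximally exploratory policy
peaked = softmax([10.0, 0.0, 0.0, 0.0])   # nearly deterministic policy

print("entropy (uniform):", entropy(uniform))
print("entropy (peaked):", entropy(peaked))
```

With a zero advantage, the uniform policy yields the lower loss, which is the pressure toward exploration that the entropy term is meant to provide.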