Stop actor gradient flowing through the critic

Status: Open · danijar opened this issue 9 years ago · 2 comments

I think you should use tf.stop_gradient() in https://github.com/coreylynch/async-rl/blob/master/a3c.py#L164, so that the actor's loss does not backpropagate through the critic. Otherwise, after some training the policy tends to select one action exclusively. It took me a while to figure this out in my own code, too.

danijar · Jul 31 '16 10:07
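What the missing stop-gradient changes can be sketched in plain Python. This is a minimal, hypothetical sketch, not the code from a3c.py: it assumes a toy critic whose output `V` for a single state is one trainable number, a fixed sampled action with log-probability `log_pi_a`, and a combined loss `-log_pi_a * (R - V) + value_coef * (R - V)**2`. The names `train_critic`, `value_coef`, `lr` and `steps` are made up for illustration. Treating the advantage `R - V` as a constant inside the actor term is what wrapping it in `tf.stop_gradient()` does in TensorFlow.

```python
import math


def train_critic(ret, log_pi_a, value_coef=0.5, lr=0.1, steps=500, stop_gradient=True):
    """Gradient descent on a hypothetical one-parameter critic V for a single state.

    Combined loss: L = -log_pi_a * (R - V) + value_coef * (R - V) ** 2
    """
    v = 0.0
    for _ in range(steps):
        # Critic term: d/dV [value_coef * (R - V)^2] = -2 * value_coef * (R - V)
        grad = -2.0 * value_coef * (ret - v)
        if not stop_gradient:
            # Without a stop-gradient the actor term also reaches V:
            # d/dV [-log_pi_a * (R - V)] = log_pi_a
            grad += log_pi_a
        v -= lr * grad
    return v


log_pi_a = math.log(0.5)  # the sampled action had probability 0.5
print(round(train_critic(1.0, log_pi_a), 4))                       # prints 1.0
print(round(train_critic(1.0, log_pi_a, stop_gradient=False), 4))  # prints 1.6931
```

With the stop-gradient the critic settles at the return `R`. Without it, setting the gradient to zero gives the fixed point `V = R - log_pi_a / (2 * value_coef)`, which lies above `R` whenever `log_pi_a < 0`, so in this toy setup the actor term keeps pushing `V` upward regardless of the observed return.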

Oh interesting! I will definitely take a look at this. Thank you.

coreylynch · Aug 01 '16 18:08

I noticed this as well and believe it's a significant cause of the performance degradation. Additionally, you don't seem to be adding the entropy term to the objective, which the paper mentions as useful for improving exploration.

steveKapturowski · Sep 10 '16 02:09
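The entropy term mentioned above can be sketched the same way. This is a minimal, hypothetical sketch in plain Python, not taken from a3c.py: for one state it computes the actor loss `-log pi(a|s) * A - beta * H(pi(.|s))`, where the advantage `A` is a plain number (so it carries no gradient into the critic) and `beta = 0.01` is an assumed entropy weight. The function names are illustrative only. Because the entropy enters with a minus sign, minimizing the loss favors a more spread-out policy.

```python
import math


def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    # H(pi) = -sum_a pi(a) * log pi(a); zero-probability actions contribute 0.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def actor_loss(logits, action, advantage, beta=0.01):
    """Loss to minimize for one state: -log pi(a|s) * A - beta * H(pi(.|s)).

    `advantage` is a plain float, i.e. already treated as a constant.
    `beta` is an assumed entropy weight, not a value read from a3c.py.
    """
    probs = softmax(logits)
    return -math.log(probs[action]) * advantage - beta * entropy(probs)


# With a zero advantage only the entropy term remains, so a uniform policy
# gets a lower loss than one that has nearly collapsed onto action 0.
print(actor_loss([0.0, 0.0, 0.0], 0, 0.0) < actor_loss([5.0, 0.0, 0.0], 0, 0.0))  # prints True
```

In a full A3C-style objective this bonus would be added next to the policy and value losses; the A3C paper motivates it as a way to discourage premature convergence to a near-deterministic policy.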