async-rl Stop actor gradient flowing through the critic

Open danijar opened this issue 9 years ago • 2 comments

I think you should use tf.stop_gradient() in https://github.com/coreylynch/async-rl/blob/master/a3c.py#L164 so that the actor's gradient does not flow through the critic. Otherwise, after some training, the policy tends to use one action exclusively. It took me a while to figure this out in my own code, too.

Jul 31 '16 10:07 danijar
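
For readers following along, here is a minimal sketch of what the suggestion amounts to. It is not the code from a3c.py: it assumes the TensorFlow 2 eager API (the repository predates it), and the toy tensors `logits`, `value`, `action`, and `ret` are hypothetical stand-ins for one transition. The idea is to wrap the advantage in tf.stop_gradient() so the actor loss treats the critic's estimate as a constant.

```python
import tensorflow as tf

# Hypothetical toy values for a single transition (not taken from a3c.py).
logits = tf.Variable([[0.2, -0.1, 0.4]])  # actor head output
value = tf.Variable([0.5])                # critic head output V(s)
action = tf.constant([2])                 # index of the action taken
ret = tf.constant([1.0])                  # bootstrapped return R


def actor_loss(block_critic_gradient):
    """Policy-gradient loss: -log pi(a|s) * (R - V(s))."""
    log_probs = tf.nn.log_softmax(logits)
    log_prob_a = tf.reduce_sum(log_probs * tf.one_hot(action, 3), axis=1)
    advantage = ret - value
    if block_critic_gradient:
        # Treat the advantage as a constant in the actor loss.
        advantage = tf.stop_gradient(advantage)
    return -tf.reduce_sum(log_prob_a * advantage)


def actor_grad_wrt_value(block_critic_gradient):
    """Gradient of the actor loss with respect to the critic output."""
    with tf.GradientTape() as tape:
        loss = actor_loss(block_critic_gradient)
    return tape.gradient(loss, value)


leaky = actor_grad_wrt_value(False)   # a tensor: actor loss reaches the critic
blocked = actor_grad_wrt_value(True)  # None: no gradient path to the critic
```

Without the stop, the actor loss has a gradient path into the critic output; with it, that path is cut and the critic is trained only by its own value loss.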

Oh interesting! I will definitely take a look at this. Thank you.

Aug 01 '16 18:08 coreylynch

I noticed this as well and believe it's a significant cause of performance degradation. Additionally, you don't seem to be adding the entropy term to the objective, which the paper mentions as being useful for improving exploration.

Sep 10 '16 02:09 steveKapturowski
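
As a rough illustration of the entropy term mentioned above, here is a standard-library Python sketch. The function names and the weight `beta=0.01` are assumptions made for this example, not values read from a3c.py. It subtracts a scaled policy entropy from the actor loss, so minimizing the loss favors policies that stay spread across actions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy H(pi) = -sum_a pi(a) * log pi(a), in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def actor_loss_with_entropy(logits, action, advantage, beta=0.01):
    """Policy-gradient loss minus an entropy bonus.

    `advantage` is a plain float here, i.e. already treated as a constant.
    `beta` is a hypothetical entropy weight chosen for illustration.
    """
    probs = softmax(logits)
    policy_loss = -math.log(probs[action]) * advantage
    return policy_loss - beta * entropy(probs)


# A uniform policy has maximal entropy; a near-deterministic one has almost none.
uniform = entropy(softmax([0.0, 0.0, 0.0]))
peaked = entropy(softmax([10.0, 0.0, 0.0]))
```

A policy that has collapsed onto one action gets almost no bonus, so the term pushes against the collapse described earlier in this thread.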
