Asynchronous Methods for Deep Reinforcement Learning

Open flrngel opened this issue 6 years ago • 0 comments

https://arxiv.org/abs/1602.01783 (aka A3C, by Google DeepMind)

This paper introduces asynchronous variants of four algorithms: 1-step Q-Learning, n-step Q-Learning, 1-step Sarsa, and advantage actor-critic (A3C). A3C performs best of the four.

(image originally from openresearch.ai)

A3C is an on-policy method (in contrast, Q-Learning is off-policy)

Loss = Policy Loss + 0.5 * Value Loss
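
To make the formula above concrete, here is a minimal sketch in plain Python. It assumes the usual actor-critic definitions: policy loss is -log \pi(a|s) * advantage, value loss is the squared error between the n-step return and V(s), and the advantage is return minus value. The function names (`n_step_returns`, `a3c_loss`) and the default `gamma` are my own choices for illustration, not taken from the paper. The paper also adds an entropy bonus to the policy term, which is left out here to match the formula as written.

```python
import math


def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns R_t = r_t + gamma * R_{t+1}, seeded with V(s_T)."""
    returns = []
    running = bootstrap_value
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def a3c_loss(log_probs, values, returns, value_coef=0.5):
    """Return (total, policy_loss, value_loss) summed over one rollout.

    log_probs: log pi(a_t | s_t) for the actions actually taken
    values:    V(s_t) predicted by the value head
    returns:   n-step returns from n_step_returns
    """
    policy_loss = 0.0
    value_loss = 0.0
    for log_prob, value, ret in zip(log_probs, values, returns):
        # In a real implementation the advantage is treated as a constant
        # (no gradient flows through it) in the policy term.
        advantage = ret - value
        policy_loss += -log_prob * advantage
        value_loss += (ret - value) ** 2
    total = policy_loss + value_coef * value_loss
    return total, policy_loss, value_loss


if __name__ == "__main__":
    rets = n_step_returns([1.0, 0.0, 1.0], bootstrap_value=0.0, gamma=0.5)
    log_probs = [math.log(0.5)] * 3
    print(a3c_loss(log_probs, [1.0, 0.5, 0.0], rets))
```

With an autodiff framework, the same two terms would be computed on tensors and the total backpropagated through the shared network.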

The policy \pi(a|s) is (typically) a convolutional network with one softmax output

plus one linear output for the value function V(s), with all non-output layers shared between the two heads
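
The shared-body, two-head layout can be sketched as follows. This is a toy illustration in plain Python, not the paper's network: a single fully connected ReLU layer stands in for the convolutional layers, and the class name `TwoHeadNet`, the layer sizes, and the weight initialisation are all hypothetical.

```python
import math
import random


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TwoHeadNet:
    """Shared hidden layer feeding a softmax policy head and a linear value head."""

    def __init__(self, n_inputs, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        # Shared (non-output) layer; a stand-in for the conv layers.
        self.w_shared, self.b_shared = init(n_hidden, n_inputs), [0.0] * n_hidden
        # Policy head: one logit per action, passed through softmax.
        self.w_policy, self.b_policy = init(n_actions, n_hidden), [0.0] * n_actions
        # Value head: a single linear output.
        self.w_value, self.b_value = init(1, n_hidden), [0.0]

    def forward(self, x):
        hidden = [max(0.0, h) for h in linear(self.w_shared, self.b_shared, x)]
        policy = softmax(linear(self.w_policy, self.b_policy, hidden))
        value = linear(self.w_value, self.b_value, hidden)[0]
        return policy, value


if __name__ == "__main__":
    net = TwoHeadNet(n_inputs=4, n_hidden=8, n_actions=3)
    probs, value = net.forward([0.1, -0.2, 0.3, 0.4])
    print(probs, value)
```

Because both heads read the same `hidden` features, gradients from the policy loss and the value loss both update the shared layer.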

Mar 19 '18 11:03 flrngel