
Asynchronous Methods for Deep Reinforcement Learning

Open flrngel opened this issue 6 years ago • 0 comments

https://arxiv.org/abs/1602.01783 aka A3C, by Google DeepMind

This paper introduces asynchronous variants of four algorithms: one-step Q-Learning, n-step Q-Learning, one-step Sarsa, and advantage actor-critic (A3C). A3C performs best of the four.

image (image originally from openresearch.ai)

A3C is an on-policy method (in contrast, Q-Learning is off-policy) image
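The on-policy / off-policy distinction can be illustrated with the one-step targets of the two value-based methods the paper also covers. This is a minimal sketch, not code from the paper: the function names and the `gamma` default are assumptions made for illustration.

```python
def q_learning_target(reward, next_q_values, gamma=0.99):
    """Off-policy target: bootstraps from the greedy (max) next action,
    regardless of which action the behaviour policy actually takes."""
    return reward + gamma * max(next_q_values)


def sarsa_target(reward, next_q_values, next_action, gamma=0.99):
    """On-policy target: bootstraps from the action the current policy
    actually selected in the next state."""
    return reward + gamma * next_q_values[next_action]
```

The two targets differ only when the action actually taken in the next state is not the greedy one, for example during exploration.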

Loss = Policy Loss + 0.5 * Value Loss image image
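As a rough sketch of how the combined loss above could be computed for one rollout, assuming the policy loss is the negative log-probability of the taken action weighted by the advantage `R - V`, and the value loss is the squared error `(R - V)^2`. The function name, argument names, and the averaging over steps are assumptions for illustration only.

```python
import math


def a3c_loss(log_probs, values, returns, value_coef=0.5):
    """Loss = policy loss + value_coef * value loss, averaged over steps.

    log_probs: log pi(a_t | s_t) for the actions that were taken
    values:    critic estimates V(s_t)
    returns:   bootstrapped returns R_t
    """
    policy_loss = 0.0
    value_loss = 0.0
    for log_p, v, r in zip(log_probs, values, returns):
        advantage = r - v
        # In an autodiff framework the advantage would be treated as a
        # constant in this term so the actor gradient does not flow
        # into the critic.
        policy_loss += -log_p * advantage
        value_loss += advantage ** 2
    n = len(returns)
    return (policy_loss + value_coef * value_loss) / n
```

For example, `a3c_loss([math.log(0.5)], [1.0], [2.0])` combines a policy term of `-log(0.5) * 1` with a value term of `0.5 * 1`.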

The network (typically a convolutional network) has one softmax output for the policy \pi and one linear output for the value function V, with all non-output layers shared.
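The two-headed layout can be sketched in plain Python as below. To stay dependency-free, a single fully connected ReLU layer stands in for the shared convolutional trunk; that substitution, the class name `ActorCritic`, the layer sizes, and the initialisation are all assumptions for illustration, not the paper's architecture.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (hypothetical init)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class ActorCritic:
    def __init__(self, n_inputs, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        # Shared non-output layer (stand-in for the conv trunk).
        self.trunk = init_layer(n_inputs, n_hidden, rng)
        # Two output heads on top of the shared features.
        self.policy_head = init_layer(n_hidden, n_actions, rng)
        self.value_head = init_layer(n_hidden, 1, rng)

    def forward(self, x):
        hidden = [max(0.0, h) for h in linear(x, *self.trunk)]
        policy = softmax(linear(hidden, *self.policy_head))  # pi(a | x)
        value = linear(hidden, *self.value_head)[0]          # V(x)
        return policy, value
```

Calling `ActorCritic(4, 8, 3).forward([0.1, 0.2, 0.3, 0.4])` returns a probability vector over three actions and a single scalar value, both computed from the same hidden features.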

flrngel avatar Mar 19 '18 11:03 flrngel