Youtube-Code-Repository A2C with experience replay

Open aivanni opened this issue 3 years ago • 0 comments

Hello @philtabor ,

When you use experience replay in the actor-critic setting, it looks to me like only the critic is actually trained (gradients are propagated through it). The actor loss is built from log_probs stored in a NumPy array, so no gradients can backpropagate through it. IMHO the actual problem is more general, though: the policy is supposed to keep evolving, so it does not make sense to store the outputs of an older, worse policy. The log_probs need to be recomputed in the learning function from the stored states and actions, the same way the critic network's outputs are.
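To illustrate what I mean, here is a minimal, framework-free sketch. It is not the repository's code: `TabularActorCritic`, its attributes, and the transition layout `(state, action, reward, next_state, done)` are all hypothetical names I made up for this example. It assumes a tabular softmax policy so the gradient of the log-probability can be written by hand. The point is that the buffer holds only the action, and `learn` recomputes the policy from the current parameters.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TabularActorCritic:
    """Hypothetical tabular agent: one row of logits and one value per state."""

    def __init__(self, n_states, n_actions, actor_lr=0.1, critic_lr=0.1, gamma=0.99):
        self.logits = [[0.0] * n_actions for _ in range(n_states)]  # actor parameters
        self.values = [0.0] * n_states                              # critic parameters
        self.actor_lr = actor_lr
        self.critic_lr = critic_lr
        self.gamma = gamma

    def choose_action(self, state, rng=random):
        # Only the sampled action is returned (and later stored in the buffer);
        # the log-probability is NOT stored, because it would be a stale constant.
        probs = softmax(self.logits[state])
        return rng.choices(range(len(probs)), weights=probs)[0]

    def learn(self, batch):
        for state, action, reward, next_state, done in batch:
            # Critic: TD error computed from the *current* value estimates.
            bootstrap = 0.0 if done else self.gamma * self.values[next_state]
            delta = reward + bootstrap - self.values[state]
            self.values[state] += self.critic_lr * delta

            # Actor: recompute pi(.|state) from the *current* logits, so the
            # update depends on today's parameters, not on a stored number.
            probs = softmax(self.logits[state])
            for a in range(len(probs)):
                # d/d logit_a of log pi(action|state) for a softmax policy.
                grad_log_prob = (1.0 if a == action else 0.0) - probs[a]
                self.logits[state][a] += self.actor_lr * delta * grad_log_prob
```

In an autograd framework the same idea would mean running the actor network on the sampled batch of states inside the learning function and evaluating the log-probability of the stored actions there, rather than reading log_probs back from the buffer. This sketch only covers the gradient-flow part of the issue; it does not correct for the replayed transitions having been collected by an older policy.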

Aug 04 '20 09:08 aivanni
