hindsight-experience-replay
Do all processes update the network and then sync the grads?
Hi, I have a question. In this distributed RL setup, because of OS scheduling, will every process see a similar state and do roughly the same thing? So do all processes compute their updates and then sync the gradients together, like the synchronous algorithm A2C? Thanks very much.
Yes, it can be regarded as using a very large batch size for the training.
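For context, here is a minimal sketch of what "sync the grads, then update" can look like, using `torch.distributed` purely for illustration (the actual repo may sync gradients differently, e.g. via MPI); the names `sync_grads` and `network` are placeholders, not the repo's API:

```python
# Sketch: synchronous gradient averaging across processes.
# Averaging the gradients from every process before the optimizer step
# is equivalent to training on one large combined batch.
import torch
import torch.distributed as dist

def sync_grads(network):
    """Average the gradients of `network` across all processes."""
    world_size = dist.get_world_size()
    for param in network.parameters():
        if param.grad is not None:
            # Sum gradients from every process, then divide by the
            # number of processes to get the average.
            dist.all_reduce(param.grad.data, op=dist.ReduceOp.SUM)
            param.grad.data /= world_size

# Typical training step in each process (pseudocode):
#   loss = compute_loss(local_batch)   # batch sampled by this process
#   optimizer.zero_grad()
#   loss.backward()
#   sync_grads(network)                # all processes now hold identical grads
#   optimizer.step()                   # identical update keeps networks in sync
```

Because every process applies the same averaged gradient, the network copies stay identical across processes, which is what makes the whole setup behave like a single large-batch update.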