
RPC Communication in Distributed RL Training

Open · Sharad24 opened this issue on Aug 31 '20 · 2 comments

There are three ways I can think of to do distributed training:

  1. Use PyTorch's distributed training infrastructure. This would require establishing communication protocols specific to the Deep RL case, and it would most likely all be in Python unless we find a way around it.
  2. Use Reverb (a rough sketch follows this list):
    • Use TF-based Datasets (@threewisemonkeys-as)
    • PyTorch wrapper to convert the NumPy arrays, etc. that are received (short-term, up for grabs)
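
A minimal sketch of what option 2 could look like, assuming dm-reverb is installed; the table name and variables here are placeholders, not genrl's API. An actor inserts NumPy transitions into a Reverb table, and the learner samples them and converts the received arrays to torch tensors (the "PyTorch wrapper" part):

```python
# Sketch only: assumes `pip install dm-reverb`; names are placeholders.
import numpy as np
import reverb
import torch

# Reverb server holding a single uniform-replay table.
server = reverb.Server(tables=[
    reverb.Table(
        name="replay_buffer",
        sampler=reverb.selectors.Uniform(),
        remover=reverb.selectors.Fifo(),
        max_size=100_000,
        rate_limiter=reverb.rate_limiters.MinSize(1),
    )
])
client = reverb.Client(f"localhost:{server.port}")

# Actor side: push one transition as plain NumPy arrays.
obs, action, reward, next_obs = np.zeros(4), np.int64(1), np.float32(0.5), np.ones(4)
client.insert([obs, action, reward, next_obs], priorities={"replay_buffer": 1.0})

# Learner side: sample and convert the received NumPy arrays to torch tensors.
for (sample,) in client.sample("replay_buffer", num_samples=1):
    obs_t, act_t, rew_t, next_obs_t = (torch.as_tensor(np.asarray(x)) for x in sample.data)
```

The TF-based Dataset route would instead read the same table through Reverb's TensorFlow dataset classes rather than the plain client sampler.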

Sharad24 · Aug 31 '20 21:08

I agree that we should target 2 to begin with. We will still need Python multiprocessing here to run the actors and learners separately, right?

As for the structure and fitting it into the rest of the library, I was thinking of having DistributedOnPolicyTrainer and DistributedOffPolicyTrainer, which would act as the main process and spawn the multiple actors while maintaining and updating the central weights. In this case, the agent would only need to implement update_params (to be called in the main process) and select_action (to be called by each actor). The trajectories and weights would be transported through Reverb. A rough sketch of this structure is below.
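
Here is a minimal sketch of that trainer shape, assuming Python multiprocessing for the actors and a Reverb table for the trajectories. The names DistributedOffPolicyTrainer, update_params and select_action come from the description above; everything else (the factory functions, table name, loop structure) is hypothetical and not genrl's actual API. Pushing updated weights back to the actors is omitted for brevity.

```python
import multiprocessing as mp

import reverb


def actor_loop(port, make_env, make_agent, num_steps):
    """Child process: roll out the policy and push transitions to Reverb."""
    client = reverb.Client(f"localhost:{port}")
    env, agent = make_env(), make_agent()
    obs = env.reset()
    for _ in range(num_steps):
        action = agent.select_action(obs)            # only method an actor needs
        next_obs, reward, done, _ = env.step(action)
        client.insert([obs, action, reward, next_obs, done],
                      priorities={"replay_buffer": 1.0})
        obs = env.reset() if done else next_obs


class DistributedOffPolicyTrainer:
    """Main process: owns the Reverb server, spawns actors, runs updates."""

    def __init__(self, make_env, make_agent, num_actors=2, batch_size=32):
        self.server = reverb.Server(tables=[
            reverb.Table("replay_buffer",
                         sampler=reverb.selectors.Uniform(),
                         remover=reverb.selectors.Fifo(),
                         max_size=100_000,
                         rate_limiter=reverb.rate_limiters.MinSize(batch_size))
        ])
        self.client = reverb.Client(f"localhost:{self.server.port}")
        self.agent = make_agent()
        self.make_env, self.make_agent = make_env, make_agent
        self.num_actors, self.batch_size = num_actors, batch_size

    def train(self, actor_steps=10_000, num_updates=1_000):
        actors = [mp.Process(target=actor_loop,
                             args=(self.server.port, self.make_env,
                                   self.make_agent, actor_steps))
                  for _ in range(self.num_actors)]
        for p in actors:
            p.start()
        for _ in range(num_updates):
            # Sample a batch of transitions and hand it to the learner.
            batch = [sample.data for (sample,) in
                     self.client.sample("replay_buffer", num_samples=self.batch_size)]
            self.agent.update_params(batch)          # learner-side update
        for p in actors:
            p.join()
```

The same skeleton could presumably serve a DistributedOnPolicyTrainer, with the table acting more like a trajectory queue (e.g. a FIFO remover and items sampled at most once) instead of uniform replay.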

I am holding off on #233 since a reverb buffer wrapper would heavily depend on the structure we go with. Plus it is not really useful in the non-distributed case.

threewisemonkeys-as · Sep 01 '20 19:09

Stale issue message

github-actions[bot] · Nov 03 '20 00:11