
How to switch to distributed RPC

ychfan opened this issue 5 years ago · 1 comment

Thanks for providing this wonderful tool. The README suggests switching to the new distributed RPC framework, but the official documentation only shows examples with 2 workers, and it is still not clear how to implement other distributed primitives (for example, torch.distributed.all_reduce) on top of it. Could you provide any hints or examples?
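For reference, the RPC API itself looks like the following. This is a minimal single-process sketch, not from the thread: the worker name "worker0", the port, and the self-RPC setup are arbitrary choices made so the example runs without launching multiple processes.

```python
import os
import torch
import torch.distributed.rpc as rpc

# MASTER_ADDR/MASTER_PORT are required by the default rendezvous;
# the port number here is an arbitrary free port.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")

# Single-process demo: with world_size=1, worker0 issues an RPC to itself.
rpc.init_rpc("worker0", rank=0, world_size=1)

# rpc_sync blocks until the remote call returns its result.
result = rpc.rpc_sync("worker0", torch.add, args=(torch.ones(2), torch.ones(2)))

rpc.shutdown()
```

With more workers, the same `rpc_sync` call targets a remote worker by name; the RPC framework handles serialization and transport, but it does not by itself provide collectives such as all_reduce.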

ychfan · May 09 '20 17:05

Sorry for the late reply. I have not looked into RPC in detail, so to be honest I am not sure it can do everything diffdist can; I am afraid that at the moment I cannot help you. I have updated the README to better reflect that this tool can still be used if needed. I will look into whether I can implement torch.distributed.all_reduce in diffdist.
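For anyone landing here: one way such a differentiable all_reduce could be sketched (this is an illustration in the style of diffdist's autograd-wrapped primitives, not diffdist's actual implementation) is a custom torch.autograd.Function whose backward is itself an all_reduce, since the gradient of a cross-worker sum with respect to each worker's input is the sum of the output gradients across workers. The single-process setup (gloo backend, world_size=1) is only there so the sketch runs standalone.

```python
import os
import tempfile
import torch
import torch.distributed as dist

class AllReduceSum(torch.autograd.Function):
    """Differentiable all_reduce with SUM: forward sums the tensor across
    workers; backward all-reduces the incoming gradient, because each
    worker's input contributes to every worker's output."""

    @staticmethod
    def forward(ctx, tensor):
        out = tensor.clone()  # all_reduce is in-place; keep the input intact
        dist.all_reduce(out, op=dist.ReduceOp.SUM)
        return out

    @staticmethod
    def backward(ctx, grad_output):
        grad = grad_output.clone()
        dist.all_reduce(grad, op=dist.ReduceOp.SUM)
        return grad

# Single-process demo (world_size=1), where all_reduce is the identity.
# A fresh file path in a temp dir is used for the file:// rendezvous.
store_path = os.path.join(tempfile.mkdtemp(), "store")
dist.init_process_group(
    "gloo", init_method=f"file://{store_path}", rank=0, world_size=1
)

x = torch.ones(3, requires_grad=True)
y = AllReduceSum.apply(x)
y.sum().backward()  # with one worker, x.grad is all ones

dist.destroy_process_group()
```

In a real multi-worker job the same code runs on every rank, and gradients flowing back through `AllReduceSum` are summed across workers just as the forward activations were.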

ag14774 · Aug 12 '20 13:08