How to switch to distributed RPC
Thanks for providing this wonderful tool. You suggest switching to the new distributed RPC framework, but its documentation only provides examples with 2 workers, and it is still not clear how to implement other distributed primitives (for example, torch.distributed.all_reduce) with it. Could you provide any hints or examples?
Sorry for the late reply. I have not looked into RPC in detail, so to be honest I am not sure it can do everything diffdist can, and I am afraid I cannot help you with that at the moment. I have changed the README to better reflect that it is still possible to use this tool if needed. I will look into whether I can implement torch.distributed.all_reduce in diffdist.
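In the meantime, here is a rough sketch (not part of diffdist, and untested against its design) of how a differentiable all_reduce with a sum op could be written as a `torch.autograd.Function`. For a sum reduction the backward pass is itself an all_reduce of the incoming gradients. The helper name `all_reduce_sum` is just for illustration:

```python
# Minimal sketch of a differentiable all_reduce (sum), assuming
# torch.distributed has already been initialised on every rank.
import torch
import torch.distributed as dist


class AllReduceSum(torch.autograd.Function):
    @staticmethod
    def forward(ctx, tensor):
        # Work on a copy so the caller's tensor is left untouched.
        output = tensor.clone()
        dist.all_reduce(output, op=dist.ReduceOp.SUM)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        # The gradient of a sum all_reduce is the sum of the gradients
        # coming from every rank, i.e. another all_reduce.
        grad = grad_output.clone()
        dist.all_reduce(grad, op=dist.ReduceOp.SUM)
        return grad


def all_reduce_sum(tensor):
    return AllReduceSum.apply(tensor)
```

No guarantees this covers all reduce ops or edge cases, but something along these lines is probably what an all_reduce in diffdist would look like.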