examples
How to run rpc/pipeline/main.py on two physical machines?
I want to run the ResNet pipeline example on two different machines. How should I run main.py? I changed the code by adding the following calls.

On rank 0: `dist.init_process_group(backend="gloo", init_method="tcp://172.16.8.196:8864", rank=0, world_size=2)`

On rank 1: `dist.init_process_group(backend="gloo", init_method="tcp://172.16.8.196:8864", rank=1, world_size=2)`

On each of the two machines I then run `python main.py`. This fails with `RuntimeError: Socket Timeout`. How can I fix it?
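One thing that may be worth ruling out first: a socket timeout during rendezvous can happen when the rank 1 machine cannot reach the address and port that rank 0 is listening on (for example because of a firewall, or because rank 0 was not started yet). Below is a minimal sketch of a plain TCP reachability check using only the Python standard library. The helper name `can_connect` is hypothetical and not part of PyTorch, and a successful connection only shows that the port is reachable; it does not prove that the process group setup is otherwise correct.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    Hypothetical helper for illustration; it only tests basic network
    reachability and knows nothing about torch.distributed.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, calling `can_connect("172.16.8.196", 8864)` on the rank 1 machine while rank 0 is waiting inside `init_process_group` should return `True` if the network path is open. If it returns `False`, the problem is likely at the network level rather than in main.py.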