
How to run rpc/pipeline/main.py on two physical machines?

Open Unknown-Body opened this issue 2 years ago • 0 comments

I want to run the ResNet example on two different machines. How should I run main.py? I changed the code by adding the following.

On rank 0: `dist.init_process_group(backend="gloo", init_method="tcp://172.16.8.196:8864", rank=0, world_size=2)`

On rank 1: `dist.init_process_group(backend="gloo", init_method="tcp://172.16.8.196:8864", rank=1, world_size=2)`

On each of the two machines, the command is `python main.py`. Then an error occurs: `RuntimeError: Socket Timeout`. How can I fix it?
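One way to narrow down a timeout like this is to check, from the rank 1 machine, whether the address and port given in `init_method` are reachable at all while rank 0 is waiting. The following is a minimal sketch using only the Python standard library. The helper name `can_reach` is hypothetical and not part of PyTorch. It only tests TCP connectivity and does not confirm that this is the cause of the error.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    Hypothetical helper for illustration; it is not a PyTorch API.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


# Example call on the rank 1 machine, using the address and port from init_method,
# while rank 0 is already blocked in init_process_group:
#     can_reach("172.16.8.196", 8864)
```

If the call returns `False`, a firewall rule, a wrong IP address, or rank 0 not yet listening are plausible suspects. If it returns `True`, the network path is probably fine and the cause may lie elsewhere in how main.py sets up its processes.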

Unknown-Body, May 18 '23 10:05