torchpack
Multi-node training
Can you suggest how to implement multi-GPU, multi-node training with torchpack?
I have set `-H ip1:gpus,ip2:gpus` and launched the training from both nodes, but they don't seem to be getting a handle on one another. What am I missing here?
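For reference, this is roughly the command I am running on each node (the script name, config path, and GPU counts below are placeholders for my actual setup):

```bash
# 2 nodes x 8 GPUs = 16 processes; adjust -np and the per-host counts to match the hardware.
torchpack dist-run -np 16 -H ip1:8,ip2:8 \
    python train.py configs/default.yaml
```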
Could you try to SSH into `ip1` and `ip2`? You need to make sure that these two machines can be SSH-ed into without a password.
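For example, a minimal way to set up and verify passwordless SSH (assuming standard OpenSSH and the same username on both machines; adjust the key path and hostnames to your environment):

```bash
# On the node you launch from: generate a key (if you don't already have one)
# and copy it to the other host.
ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519
ssh-copy-id -i ~/.ssh/id_ed25519.pub ip2

# This should print the remote hostname without prompting for a password.
ssh ip2 hostname
```

Also, since the launcher reaches the other hosts over SSH, you typically start `torchpack dist-run` from a single node and let it spawn the workers on `ip1` and `ip2`, rather than running the command separately on both machines.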