pytorch-operator

Is a multi-GPU-per-pod setup supported in PyTorchJob?

Open tingweiwu opened this issue 4 years ago • 1 comment

If there are 2 GPUs per node, how should I set the Worker spec in the PyTorchJob: 1 replica with 2 GPUs per pod, or 2 replicas with only 1 GPU per pod?
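To make the two options concrete, here is a rough Python sketch that builds the Worker entry of `spec.pytorchReplicaSpecs` as a plain dict for each layout. It only illustrates what is being compared and does not say which one the operator supports. The helper names `worker_replica_spec` and `total_gpus` and the image `example/train:latest` are made up for illustration; `nvidia.com/gpu` assumes the NVIDIA device plugin is installed on the cluster.

```python
import json


def worker_replica_spec(replicas, gpus_per_pod, image="example/train:latest"):
    """Build the Worker entry of spec.pytorchReplicaSpecs as a plain dict.

    The image name is a placeholder, not a real training image.
    """
    return {
        "replicas": replicas,
        "restartPolicy": "OnFailure",
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "pytorch",
                        "image": image,
                        # Assumes GPUs are exposed as the nvidia.com/gpu resource.
                        "resources": {"limits": {"nvidia.com/gpu": gpus_per_pod}},
                    }
                ]
            }
        },
    }


def total_gpus(spec):
    """Total GPUs requested across all replicas of one replica spec."""
    containers = spec["template"]["spec"]["containers"]
    per_pod = sum(c["resources"]["limits"]["nvidia.com/gpu"] for c in containers)
    return spec["replicas"] * per_pod


# Option A: 1 replica with 2 GPUs per pod.
option_a = worker_replica_spec(replicas=1, gpus_per_pod=2)
# Option B: 2 replicas with 1 GPU per pod.
option_b = worker_replica_spec(replicas=2, gpus_per_pod=1)

print(json.dumps(option_a, indent=2))
print(json.dumps(option_b, indent=2))
```

Both layouts request the same 2 GPUs in total; they differ only in how many pods (and therefore how many processes launched by the operator) share them.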

I've seen similar issues, such as #219, but there are no clear instructions on whether a multi-GPU-per-pod setup is supported in PyTorchJob.

Is pytorch-operator designed for a 1-GPU-per-pod setup even though there are multiple GPUs on the same node?

Will a multi-GPU-per-pod setup be supported?

tingweiwu, Apr 25 '21 08:04

Hey @tingweiwu,

Did you ever get this sorted? I am struggling with the same issue.

wallarug, Nov 19 '21 11:11