About Multi-process Distributed Training
Following your previous suggestions, I set up distributed training and ran the command shown in the figure below.
The corresponding settings in config.yaml were left unchanged. However, the following happens: GPU 0 is always occupied by several extra processes. I tried debugging but found no clues. Can anyone offer suggestions? The extra memory usage on GPU 0 can trigger a CUDA out of memory error.
It is also evident that this phenomenon is related to the number of GPUs used.
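
For context, below is a minimal sketch of the per-process device setup I would expect in a standard PyTorch DDP launch (e.g. via torchrun). This is only an illustration of the usual cause of extra GPU 0 processes, not the repository's actual code; the helper name and checkpoint path are hypothetical.

```python
# Sketch: a common cause of extra processes piling up on GPU 0 in PyTorch DDP
# is that each worker initializes a CUDA context on cuda:0 before its own
# device is set (e.g. an early tensor allocation, or torch.load without
# map_location). Pinning the device to the local rank first usually avoids it.
import os

import torch
import torch.distributed as dist


def setup_worker():
    # torchrun / torch.distributed.launch export LOCAL_RANK for each process.
    local_rank = int(os.environ.get("LOCAL_RANK", 0))

    # Pin this process to its own GPU *before* any CUDA work happens,
    # so no stray context is created on GPU 0.
    torch.cuda.set_device(local_rank)

    # NCCL process group; MASTER_ADDR/MASTER_PORT are set by the launcher.
    dist.init_process_group(backend="nccl")

    # When loading checkpoints, map tensors onto the local GPU explicitly;
    # without map_location they are restored onto the GPU they were saved
    # from (often cuda:0), which also inflates GPU 0 memory. For example:
    # state = torch.load("checkpoint.pth", map_location=f"cuda:{local_rank}")
    return local_rank
```

If the extra processes on GPU 0 each hold only a few hundred MB, that usually points to such stray CUDA contexts rather than the model itself.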