
ignoring second gpu

Open AlexiaJM opened this issue 2 years ago • 8 comments

Whether I'm using MPI or not, if I have 2 GPUs on one node, it always ignores the second GPU. I am unable to use multiple GPUs. How do I make it use multiple GPUs?

AlexiaJM avatar Jul 22 '22 21:07 AlexiaJM

Even after you tried mpiexec -n 2?

taoisu avatar Jul 23 '22 16:07 taoisu

I was able to make it work on 2 nodes with one GPU each by using mpiexec -n 2 on a bash file that does all the preprocessing and then runs the Python script. If I run mpiexec on the Python file directly, I get problems because both nodes need the folder changes, the cifar10 processing, and so on.

I will do this for now, but do you know how to do it with a single node and multiple GPUs in SLURM? If I don't use mpiexec, it doesn't use the second GPU. If I use 'mpiexec -n 2', it says 'There are not enough slots available in the system to satisfy the 2 slots that were requested by the application' because it tries to parallelize over nodes, but there is only one. I would like to parallelize over the CPUs/GPUs of the single node.

AlexiaJM avatar Jul 25 '22 14:07 AlexiaJM

For me, I deleted all MPI-related code and used torchrun because I found it generally easier to work with (no slots, ports, or whatnot).

taoisu avatar Jul 25 '22 18:07 taoisu
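
For anyone looking to do the same swap, here is a minimal sketch of what a torchrun-friendly setup might look like. The function names mirror the repo's dist_util.setup_dist() / dist_util.dev(), but the bodies below are illustrative, not the maintainers' code; torchrun sets RANK, WORLD_SIZE, LOCAL_RANK, and the master address/port for each process.

```python
# Illustrative torchrun-based replacement for the MPI setup (not the repo's actual code).
import os

import torch
import torch.distributed as dist


def setup_dist():
    """Initialize the default process group from the env vars that torchrun sets."""
    if dist.is_initialized():
        return
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend=backend)  # reads RANK/WORLD_SIZE/MASTER_* from the env
    if torch.cuda.is_available():
        torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))


def dev():
    """Return the device assigned to the current process."""
    if torch.cuda.is_available():
        return torch.device(f"cuda:{int(os.environ.get('LOCAL_RANK', 0))}")
    return torch.device("cpu")
```

Launched with something like torchrun --nproc_per_node=2 scripts/image_train.py ..., both GPUs on a single node each get their own process, with no MPI slot or hostfile configuration needed.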

thanks, I might give it a try if I run into more issues!

AlexiaJM avatar Jul 26 '22 17:07 AlexiaJM

For me, I deleted all MPI-related code and used torchrun because I found it generally easier to work with (no slots, ports, or whatnot).

Hello. Could you share your modified code?

No360201 avatar Aug 17 '22 11:08 No360201

Hi, I use multiple GPUs to train my model with mpiexec -n 8 python scripts/image_train.py. However, I couldn't load the checkpoint onto all GPUs, and I ran out of memory (OOM) when fine-tuning the 512x512 model. I didn't encounter this problem when training on one GPU. Could I ask what I should do? I also tried printing MPI.COMM_WORLD.size when using mpiexec -n 8, but it printed 1.

YUHANG-Ma avatar Aug 26 '22 16:08 YUHANG-Ma
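
If MPI.COMM_WORLD.size prints 1 under mpiexec -n 8, a common cause is that mpi4py was built against a different MPI implementation than the mpiexec being used, so each of the eight processes thinks it is running alone; that could also explain the OOM, since every process would then target the same device. A quick standalone sanity check (the file name check_mpi.py is just a placeholder):

```python
# check_mpi.py -- placeholder name; verifies that mpi4py sees every launched process.
from mpi4py import MPI

comm = MPI.COMM_WORLD
print(f"rank {comm.Get_rank()} of {comm.Get_size()}")
```

Running mpiexec -n 8 python check_mpi.py should print ranks 0 through 7 with a size of 8; if every line reads "rank 0 of 1", rebuild or reinstall mpi4py against the same MPI that provides that mpiexec.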

thanks, I might give it a try if I run into more issues!

Could you run it successfully on SLURM with multiple GPUs? May I ask how?

BaohaoLiao avatar Nov 10 '22 19:11 BaohaoLiao

For me, I deleted all MPI-related code and used torchrun because I found it generally easier to work with (no slots, ports, or whatnot).

Hello, if I use torchrun from your second branch (xiaguo/fsdp), I find it does not distribute train_data to each GPU; every GPU uses the same train_data during training. The relevant code is below (train_utils.py).

```python
def forward_backward(self, batch, cond):
    for i in range(0, batch.shape[0], self.microbatch):
        micro = batch[i : i + self.microbatch].to(dist_util.dev())
        micro_cond = {
            k: v[i : i + self.microbatch].to(dist_util.dev())
            for k, v in cond.items()
        }
        last_batch = (i + self.microbatch) >= batch.shape[0]
        t, weights = self.schedule_sampler.sample(micro.shape[0], dist_util.dev())
```

Does every GPU (process) end up using the same train_data? If so, should a distributed sampler be added? Thanks!

99-WSJ avatar Oct 26 '23 13:10 99-WSJ
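
The forward_backward snippet above only slices whatever batch it is handed, so whether ranks see different data depends on how the batches are produced. If the loader is a plain DataLoader with no per-rank sharding, every GPU will indeed iterate over the same data, and a DistributedSampler is the standard fix. A minimal sketch, assuming torch.distributed is already initialized; dataset, batch_size, and num_epochs are placeholders rather than the repo's variables:

```python
import torch.distributed as dist
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

# Give each rank a disjoint shard of the dataset.
sampler = DistributedSampler(
    dataset,
    num_replicas=dist.get_world_size(),
    rank=dist.get_rank(),
    shuffle=True,
)
loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)

for epoch in range(num_epochs):
    # Re-seed the shuffle so each epoch sees a different ordering across ranks.
    sampler.set_epoch(epoch)
    for batch, cond in loader:
        ...
```

If I remember correctly, the original MPI code path handles this sharding inside load_data via the MPI rank, so once the MPI code is removed, something equivalent to the above has to take its place.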