ConSinGAN
Running On Multiple GPUs
Hi, I am running the image harmonization part of the model with --train_stages 6, --max_size 350, and --lr_scale 0.5 to increase the quality of the images.
However, once I get to the second stage of training, it crashes because it runs out of CUDA memory. I changed the torch device so the model can use more than one GPU (say, GPUs 0 and 1) and wrapped the model in a DataParallel module so it can run in parallel on multiple GPUs. However, it still only runs on one GPU.
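For reference, here is a minimal sketch of the kind of nn.DataParallel wrapping described above; the module and variable names are placeholders, not the actual ConSinGAN code.

import torch
import torch.nn as nn

# Hypothetical stand-in for one generator stage; the real ConSinGAN modules differ.
class StageGenerator(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, x):
        return self.body(x)

netG = StageGenerator().cuda()
# Replicate the module on GPUs 0 and 1; inputs are split along the batch dimension.
netG = nn.DataParallel(netG, device_ids=[0, 1])

# Caveat: DataParallel distributes work by splitting the *batch*. With a single
# training image (batch size 1), there is nothing to split, so computation and
# memory stay on the first device -- consistent with only one GPU being used.
noise = torch.randn(1, 3, 350, 350, device="cuda:0")
out = netG(noise)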
Do you have any suggestions to fix this issue?
Without seeing the code it's difficult to say.
Have you changed how the --gpu parameter is handled (in the main_train.py file)?
By default it's set to 0, and later in the code we do (see here):
if torch.cuda.is_available(): torch.cuda.set_device(opt.gpu)
You might have to change that to get it to work.
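For context, the handling described above looks roughly like this. This is a sketch based only on the quoted default and the set_device call, not the full main_train.py.

import argparse
import torch

parser = argparse.ArgumentParser()
# --gpu defaults to 0, so training runs on a single device unless changed.
parser.add_argument("--gpu", type=int, default=0, help="which GPU to use")
opt = parser.parse_args()

if torch.cuda.is_available():
    # set_device takes a single device index; it pins subsequent CUDA
    # allocations to that one GPU, it does not distribute work across GPUs.
    torch.cuda.set_device(opt.gpu)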
Hi, I'm still having the problem. I changed that line to pass a torch device covering two GPUs to set_device, but it still only runs on one GPU.
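A note on this: torch.cuda.set_device accepts a single device index, so by itself it cannot spread memory across two GPUs. Memory only moves to a second GPU when modules or tensors are explicitly placed there (or when DataParallel has a batch to split). Below is a minimal, hypothetical sketch of such manual placement, not taken from the ConSinGAN code.

import torch
import torch.nn as nn

# Hypothetical two-stage split: each stage's weights and activations live on a
# different GPU, so they occupy separate memory pools. The real ConSinGAN
# stage modules would need the same kind of explicit placement.
stage_a = nn.Conv2d(3, 64, 3, padding=1).to("cuda:0")
stage_b = nn.Conv2d(64, 3, 3, padding=1).to("cuda:1")

x = torch.randn(1, 3, 350, 350, device="cuda:0")
h = stage_a(x)        # runs on GPU 0
h = h.to("cuda:1")    # move activations to GPU 1 between stages
y = stage_b(h)        # runs on GPU 1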
Sorry for the late response. What kind of GPU are you running this on and how much VRAM does it have? I run all of my experiments on a single GPU with ~12GB VRAM without problems.
A single GPU (GPU 0) with about 30000 MiB of VRAM.
I have the same problem. Is there any solution?