deep-blind-watermark-removal

Model unable to run on multiple GPUs

Open shashankvasisht opened this issue 3 years ago • 2 comments

I have multiple GPUs and I would like to train the model on them for faster training. I see that you have already implemented multi-GPU training using nn.DataParallel. There were some bugs in VX.py, which were resolved after I changed "self.model" to "self.model.module" (a minimal illustration of why is below).
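For context, this is the kind of change I mean. It is a minimal sketch, not your actual code: the `Net` class and `freeze_conv` method are placeholders I made up to show that attributes of the original module are only reachable through `.module` once the model is wrapped in nn.DataParallel.

```python
import torch.nn as nn

class Net(nn.Module):
    """Placeholder network with a custom helper method, for illustration only."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3, padding=1)

    def forward(self, x):
        return self.conv(x)

    def freeze_conv(self):
        # a custom method defined on the original (unwrapped) module
        for p in self.conv.parameters():
            p.requires_grad = False

model = nn.DataParallel(Net()).cuda()

# model.freeze_conv()        # AttributeError: 'DataParallel' object has no attribute 'freeze_conv'
model.module.freeze_conv()   # works: .module is the underlying Net
```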

However, even after setting "CUDA_VISIBLE_DEVICES=0,1", I still see only GPU 0's memory being filled and not GPU 1's.
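In case it helps, this is roughly how I am checking that both devices are actually visible to PyTorch (a quick generic check, not specific to this repo):

```python
import os
import torch

# Run as: CUDA_VISIBLE_DEVICES=0,1 python check_gpus.py
print(os.environ.get("CUDA_VISIBLE_DEVICES"))    # expect "0,1"
print(torch.cuda.device_count())                 # expect 2
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```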

The model runs out of CUDA memory if I try to use an input size >= 512 with a batch size of 12, or even 8.

Any idea why it is only using 1 of the 2 GPUs?

Thanks

shashankvasisht · Jan 12, 2022

@vinthony Any updates on this?

shashankvasisht · Mar 18, 2022

Currently, our method should work on multiple GPUs. I am sorry I cannot test this myself because I do not have a multi-GPU environment at the moment.

Basically, we have implemented some code here to support multiple GPUs: https://github.com/vinthony/deep-blind-watermark-removal/blob/d238edfd931abe2ddfbef5ca1fbef3c551969f47/scripts/machines/BasicMachine.py#L68

You may refer to this thread for debugging details: https://stackoverflow.com/questions/54216920/how-to-use-multiple-gpus-in-pytorch
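For reference, the usual nn.DataParallel training loop looks roughly like the sketch below. This is a generic PyTorch example with a dummy model and random tensors, not the exact code from BasicMachine.py:

```python
import torch
import torch.nn as nn

# Dummy model and random data stand in for the real network and dataloader.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 3, 3, padding=1))
model = nn.DataParallel(model).cuda()        # replicate the model on all visible GPUs
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

images = torch.randn(8, 3, 256, 256).cuda()  # inputs start on GPU 0 ...
targets = torch.randn(8, 3, 256, 256).cuda()

outputs = model(images)                      # ... DataParallel scatters the batch along dim 0
loss = nn.functional.l1_loss(outputs, targets)
optimizer.zero_grad()
loss.backward()                              # gradients are reduced back onto GPU 0
optimizer.step()
```

Note that DataParallel only splits the batch during the forward pass and gathers the outputs (and the loss computation) back on GPU 0, so GPU 0 will always use more memory than GPU 1; with a batch size of 2 or more you should still see some memory allocated on GPU 1 during the forward pass.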

vinthony · Mar 19, 2022