
Questions about multi-gpu training

Open RyanHTR opened this issue 6 years ago • 11 comments

This is great work. Does this code support multi-GPU training? I've tried altering NUM_GPUS and GPU_ID, but it seems that the code still selects only one GPU for training. Is there any clue about this? Thanks.

RyanHTR avatar Apr 16 '18 13:04 RyanHTR

To enable multi-GPU training, you will need to change this line to MultiGPUTrainer. Expect some adventures when using multi-GPU for this project. I am not sure about the behavior.

JiahuiYu avatar Apr 16 '18 17:04 JiahuiYu
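The one-line change being suggested can be sketched with stand-in classes. The real `Trainer` and `MultiGPUTrainer` live in the neuralgym library (`ng.train`); the signatures below are assumptions for illustration, not the verified API:

```python
class Trainer:
    """Stand-in for ng.train.Trainer (assumed signature): the single-GPU
    trainer builds its training op from an already-computed loss tensor."""
    def __init__(self, optimizer=None, loss=None, **context):
        self.loss = loss
        self.context = context

class MultiGPUTrainer(Trainer):
    """Stand-in for ng.train.MultiGPUTrainer (assumed signature): it must
    rebuild the graph once per GPU, so it takes a graph-building callable
    (graph_def) instead of a precomputed loss."""
    def __init__(self, num_gpus=1, graph_def=None, **context):
        super().__init__(**context)
        self.num_gpus = num_gpus
        self.graph_def = graph_def

# Original single-GPU call in train.py (roughly):
#   trainer = ng.train.Trainer(optimizer=opt, loss=loss, ...)
# Multi-GPU variant (the one-line change discussed above):
#   trainer = ng.train.MultiGPUTrainer(num_gpus=4, optimizer=opt,
#                                      graph_def=build_graph, ...)
```

Note the key difference in the sketch: the multi-GPU trainer receives a callable, not a loss, which is what the `NoneType` error later in this thread is about.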

@RyanHTR Hello, were you able to train the network successfully on multiple GPUs?

zhiweige avatar Apr 25 '18 07:04 zhiweige

@RyanHTR I changed this line to MultiGPUTrainer, but I got the error "TypeError: 'NoneType' object is not callable", which I can't figure out. Did you have this problem?

lipanpeng avatar May 07 '18 03:05 lipanpeng

@JiahuiYu There is a bug behind "TypeError: 'NoneType' object is not callable": `None` ends up being called as a function.

zengyh1900 avatar Jun 28 '18 13:06 zengyh1900

@1900zyh This is not a bug. Loss should be None for multi-GPU training.

JiahuiYu avatar Jun 28 '18 15:06 JiahuiYu

@JiahuiYu I think it should be `assert loss is None, 'For multi-GPU training, graph_def should be provided instead of loss.'` Otherwise it reports a TypeError.

zengyh1900 avatar Jun 28 '18 15:06 zengyh1900
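A minimal sketch of why the unguarded code raises the opaque TypeError and how the proposed assert turns it into a readable message (the helper name and call shape here are hypothetical, not the repo's actual code):

```python
def multigpu_graph_def(graph_def, gpu_id):
    # Hypothetical helper: the multi-GPU path calls graph_def once per GPU.
    # If the caller passed loss instead of graph_def, graph_def is None here
    # and graph_def(gpu_id) fails with:
    #   TypeError: 'NoneType' object is not callable
    assert graph_def is not None, (
        'For multi-GPU training, graph_def should be provided instead of loss.')
    return graph_def(gpu_id)
```

With the assert in place, forgetting `graph_def` fails with the explanatory message instead of the bare TypeError.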

@1900zyh Ohhhh I see. Thank you!

JiahuiYu avatar Jun 28 '18 18:06 JiahuiYu

I have 4 GTX 1080Ti GPUs, and each GPU can handle a batch size of 16, which means that if I use all the GPUs I should be able to raise the batch size to 64. But when I do that, my GPUs run out of memory. I am assuming here that ng.train.MultiGPUTrainer uses data parallelism to split the input data (batch size 64) across the 4 GPUs, so that each GPU gets a batch of 16 images.

Because of that issue I can only train with a batch size of 16, whether I use 4 GPUs or 1 GPU. What are your thoughts on this?

bis-carbon avatar Apr 04 '19 18:04 bis-carbon

@bis-carbon The batch size here is the per-GPU batch size.

JiahuiYu avatar Apr 04 '19 19:04 JiahuiYu
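In other words, under data parallelism the config's batch size stays at the per-GPU value, and it is the effective (global) batch size that scales with the GPU count. A quick sanity check (the function here is just illustrative arithmetic, not part of the repo):

```python
def effective_batch_size(per_gpu_batch_size, num_gpus):
    # Each GPU processes its own mini-batch of per_gpu_batch_size images,
    # so the global batch consumed per training step is the product.
    return per_gpu_batch_size * num_gpus

# Keep BATCH_SIZE = 16; with 4 GPUs the effective batch is already 64,
# which is why setting BATCH_SIZE = 64 runs out of memory (it would be
# 64 images per GPU, i.e. an effective batch of 256).
print(effective_batch_size(16, 4))  # -> 64
```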

Thank you for your quick response and great work.

bis-carbon avatar Apr 04 '19 20:04 bis-carbon

@1900zyh @bis-carbon @lipanpeng Hi. Have you figured out how to use multiple GPUs for training? If so, kindly let me know; I am struggling. Thanks in advance.

Adhiyaman-Manickam avatar Nov 11 '19 04:11 Adhiyaman-Manickam