generative_inpainting
Questions about multi-gpu training
It's great work. Does this code support multi-GPU training? I've tried altering NUM_GPUS and GPU_ID, but it seems that the code still selects only one GPU for training. Is there any clue about this? Thanks.
To enable multi-GPU training, you will need to change this line to MultiGPUTrainer. Expect some adventures when using multi-GPU for this project. I am not sure about the behavior.
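For readers unsure what switching to MultiGPUTrainer actually does: a multi-GPU trainer typically implements data parallelism. Below is a toy, self-contained sketch of that pattern (none of this is neuralgym's actual code; the function names and the 1-D linear model are made up for illustration): the batch is split across replicas, each replica computes its own gradient, and the averaged gradient drives a single weight update.

```python
# Illustrative sketch only -- NOT neuralgym's MultiGPUTrainer.
# It mimics the general data-parallel pattern such a trainer implements:
# split the batch across devices, compute gradients per replica, average them.

def grad_mse(w, xs, ys):
    """Gradient of mean squared error for a 1-D linear model y = w * x."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def data_parallel_step(w, xs, ys, num_gpus):
    """One step with the batch split evenly across `num_gpus` replicas."""
    per_gpu = len(xs) // num_gpus
    grads = []
    for g in range(num_gpus):          # each slice would run on its own GPU
        lo, hi = g * per_gpu, (g + 1) * per_gpu
        grads.append(grad_mse(w, xs[lo:hi], ys[lo:hi]))
    avg_grad = sum(grads) / num_gpus   # gradients averaged across replicas
    return w - 0.1 * avg_grad          # single synchronized weight update

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]              # ground truth: w = 2
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, xs, ys, num_gpus=2)
print(round(w, 3))                     # converges to 2.0
```

The key property, and the reason the averaged update is equivalent to a single large-batch step, is that the mean gradient over equal-sized shards equals the gradient over the full batch.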
@RyanHTR Hello, RyanHTR, can you train the network successfully on multi-GPU?
@RyanHTR I changed this line to MultiGPUTrainer, but I got the error "TypeError: 'NoneType' object is not callable", which I can't figure out. Do you have this problem?
@JiahuiYu There is a bug behind "'NoneType' object is not callable": None is being called here.
@1900zyh This is not a bug. Loss should be None for multi-GPU training.
@JiahuiYu I think it should be
assert loss is None, 'For multigpu training, graph_def should be provided, instead of loss.'
Otherwise it will report a TypeError.
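To make the suggestion concrete: the cryptic TypeError comes from something eventually calling a None value, and an explicit assert fails earlier with a readable message. A minimal sketch of that pattern (the TrainerSketch class below is a hypothetical stub, not neuralgym's actual trainer):

```python
# Hypothetical stub illustrating why the assert helps -- not neuralgym code.
class TrainerSketch:
    def __init__(self, loss=None, graph_def=None, multigpu=False):
        if multigpu:
            # The suggested guard: fail fast with a clear explanation
            # instead of a later "'NoneType' object is not callable".
            assert loss is None, ('For multigpu training, graph_def should '
                                  'be provided, instead of loss.')
            assert graph_def is not None, 'graph_def is required for multigpu.'
            self.loss = graph_def()   # build the loss from graph_def
        else:
            self.loss = loss()        # single-GPU path expects a loss callable

# Without a guard, a None callable surfaces only as the cryptic TypeError:
try:
    TrainerSketch(loss=None, multigpu=False)
except TypeError as e:
    print(e)   # 'NoneType' object is not callable
```

With the assert in place, misusing the multi-GPU path raises an AssertionError whose message says exactly what to change.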
@1900zyh Ohhhh I see. Thank you!
I have 4 GTX 1080Ti GPUs, and each GPU can handle a batch size of 16, which means that if I use all the GPUs I should be able to set the batch size to 64. But when I do that, my GPUs run out of memory.
I'm assuming here that ng.train.MultiGPUTrainer uses data parallelism to split the input data (batch size 64) across 4 GPUs, so that each GPU gets a batch of 16 images. Because of that issue I can only train with a batch size of 16, whether I use 4 GPUs or 1 GPU. What are your thoughts on this?
@bis-carbon The batch size here is the per-GPU batch size.
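The arithmetic behind that answer, as a quick sketch (numbers are taken from the thread; the variable names are made up, not the repo's config keys): since the setting is per-GPU, you leave it at 16 and the effective batch grows with the GPU count.

```python
# Batch-size arithmetic for a per-GPU batch setting (illustrative names).
PER_GPU_BATCH_SIZE = 16   # what one GTX 1080 Ti can hold for this model
NUM_GPUS = 4

# Keep the setting at 16; the trainer scales it by the number of GPUs.
effective_batch = PER_GPU_BATCH_SIZE * NUM_GPUS
print(effective_batch)    # 64 samples per optimization step

# Setting the value to 64 instead asks EACH GPU to hold 64 samples,
# 256 in total, which is why memory runs out.
mistaken_total = 64 * NUM_GPUS
print(mistaken_total)
```

So with 4 GPUs and the setting left at 16, you already get the 64-sample effective batch without any GPU exceeding its 16-sample memory budget.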
Thank you for your quick response and great work.
@1900zyh @bis-carbon @lipanpeng Hi. Have you figured out how to use multiple GPUs for training? If so, kindly let me know; I am struggling. Thanks in advance.