How do I use 2 gpus to train the same yolov3 model?

Open thimabru1010 opened this issue 4 years ago • 7 comments

I'm having trouble training YOLOv3 on my PC, which has two 8 GB GPUs. I can only train with batch=64 and subdivisions=64. Any other value for subdivisions (32, 16, 8) fails with the error: CUDA out of memory. Although I have two GPUs, the program does not allocate the excess on the second GPU, even when I pass the flag -gpus 0,1 and compile with CUDA. I want to lower subdivisions to get better accuracy. How do I use both GPUs to train the same YOLOv3 model?
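
For context, here is a minimal sketch of how these two cfg values relate, assuming the usual Darknet behavior where batch is divided by subdivisions to get the number of images processed per forward/backward pass. The function name images_per_pass is hypothetical and used only for illustration.

```python
def images_per_pass(batch: int, subdivisions: int) -> int:
    """Images that must fit in GPU memory at once (hypothetical helper).

    Assumption: the framework splits `batch` into `subdivisions` equal
    chunks and accumulates gradients before one weight update.
    """
    if batch % subdivisions != 0:
        # This check is this sketch's own choice, not a documented Darknet rule.
        raise ValueError("batch should be divisible by subdivisions")
    return batch // subdivisions


# The settings mentioned in the question, all with batch=64.
chunks = {s: images_per_pass(64, s) for s in (64, 32, 16, 8)}
for subdivisions, images in chunks.items():
    print(f"subdivisions={subdivisions}: {images} image(s) per pass")
```

Under that assumption, subdivisions=64 means a single image per pass, and each halving of subdivisions roughly doubles the activation memory needed on one GPU, which is consistent with the out-of-memory error at 32 and below.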

thimabru1010 • Sep 30 '19 16:09

What I mean: https://stackoverflow.com/questions/36313934/is-it-possible-to-split-a-network-across-multiple-gpus-in-tensorflow

Here is an example using TensorFlow in Python.

thimabru1010 • Sep 30 '19 17:09

You can use 2-4 GPUs to train 2x-4x faster; the GPUs are synchronized after each iteration.
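
To illustrate why this kind of multi-GPU training adds speed but not memory, here is a generic sketch of synchronous data parallelism on a toy 1-D model. It is not Darknet's implementation; the names grad_mse and data_parallel_step are hypothetical. The point is that every worker holds a full copy of the weight w and only the data is split.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the toy model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def data_parallel_step(w, xs, ys, n_workers=2, lr=0.01):
    """One synchronized update: each worker sees its own equal-sized shard."""
    shard = len(xs) // n_workers
    grads = []
    for k in range(n_workers):
        lo, hi = k * shard, (k + 1) * shard
        # Every worker uses the same full copy of w (the model is replicated).
        grads.append(grad_mse(w, xs[lo:hi], ys[lo:hi]))
    # Synchronization point: average the per-worker gradients.
    return w - lr * sum(grads) / n_workers


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w_single = 0.5 - 0.01 * grad_mse(0.5, xs, ys)  # one device, whole batch
w_multi = data_parallel_step(0.5, xs, ys)      # two workers, half each
print(w_single, w_multi)
```

With equal-sized shards the two updates agree, so the work is shared, but each worker still needs room for the whole model plus its own shard's activations.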

No, you can't use lower subdivisions. The GPU<->GPU PCI Express interconnect (~16 GB/sec) is much slower than GPU VRAM (~500 GB/sec), so training would be 10x-100x slower.
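
As a rough back-of-the-envelope check of the bandwidth gap, using the two figures above: the 0.5 GB tensor size is a made-up value for illustration, and transfer_ms is a hypothetical helper that ignores latency and overlap.

```python
PCIE_GB_PER_S = 16.0   # GPU<->GPU over PCI Express (figure quoted above)
VRAM_GB_PER_S = 500.0  # on-board GPU memory (figure quoted above)


def transfer_ms(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Idealized time in milliseconds to move size_gb at the given bandwidth."""
    return size_gb / bandwidth_gb_per_s * 1000.0


activations_gb = 0.5  # hypothetical amount of data moved per layer boundary

pcie_ms = transfer_ms(activations_gb, PCIE_GB_PER_S)
vram_ms = transfer_ms(activations_gb, VRAM_GB_PER_S)
slowdown = pcie_ms / vram_ms

print(f"PCIe: {pcie_ms:.2f} ms, VRAM: {vram_ms:.2f} ms, ratio: {slowdown:.1f}x")
```

The raw bandwidth ratio alone is about 31x, which sits inside the 10x-100x range mentioned; the real slowdown would depend on how often data has to cross between GPUs.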

AlexeyAB • Sep 30 '19 17:09

But I don't want to train faster. I just want to allocate more memory.

thimabru1010 • Sep 30 '19 18:09

Do you want to increase mini_batch size 2x and decrease performance 100x?

AlexeyAB • Sep 30 '19 18:09

I don't want to decrease performance; I just want more memory so I can train with fewer subdivisions. But CUDA always runs out of memory with subdivisions < 64.

thimabru1010 • Oct 01 '19 19:10

So is it possible to split the network between two GPUs, without worrying about the speed, if I just want a larger minibatch? Currently I can fit only one image in a minibatch, and only at a very low resolution. The objects I want to identify are small, so they lose features at lower resolution.

yogesh-bansal • Jun 08 '20 07:06

How did you resolve this issue, please? I am currently facing the same challenge.

kaypositive • May 04 '23 15:05