
Training with multiple GPUs is not faster than 1 GPU???

Open aidevmin opened this issue 1 year ago • 2 comments

I followed the guide to train on my dataset with multiple GPUs, but the training speed is the same as with a single GPU. I use the same config in both cases:

batch=64
subdivisions=32     # 16 OOM
width=512
height=512
...
max_batches=10000
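For context, Darknet divides each batch into `subdivisions` chunks, so the number of images sent through a GPU at once is batch / subdivisions. A minimal Python sketch of that arithmetic for the values above (the function name `mini_batch` is hypothetical and only for illustration):

```python
def mini_batch(batch: int, subdivisions: int) -> int:
    """Images processed per forward/backward pass: batch / subdivisions."""
    if subdivisions <= 0 or batch % subdivisions != 0:
        raise ValueError("batch should be a positive multiple of subdivisions")
    return batch // subdivisions


# Values from the config in this post.
print(mini_batch(64, 32))  # 2 images at a time
print(mini_batch(64, 16))  # 4 images at a time (the setting that ran out of memory)
```

With subdivisions=32, only 2 images are processed at a time, which may be worth keeping in mind when comparing throughput.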

I checked GPU usage, and almost all of the GPUs are being used.

@AlexeyAB Could you help me? I use the same batch, max_batches, and subdivisions for 1 GPU and for multiple GPUs, but the training time is the same.

I read issue https://github.com/AlexeyAB/darknet/issues/1165 and saw that you, @AlexeyAB, also commented on it.

As I understand it, if we use 4 GPUs, we need to reduce max_batches by a factor of 4 (compared to the 1-GPU case) to get a speedup, because with more GPUs more images are processed per iteration. We would also adjust the learning rate and burn_in if needed, as described in https://github.com/AlexeyAB/darknet/tree/64efa721ede91cd8ccc18257f98eeba43b73a6af#how-to-train-with-multi-gpu. Is that right?
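The reasoning in that question can be written out as a small Python sketch. It encodes only the assumption stated above (each iteration processes batch × number-of-GPUs images), which this thread does not confirm; the function names are hypothetical.

```python
def images_per_iteration(batch: int, gpus: int) -> int:
    """Assumption from the question: every GPU handles one full batch per iteration."""
    return batch * gpus


def equivalent_max_batches(max_batches: int, gpus: int) -> int:
    """max_batches that keeps the total number of images seen roughly constant,
    under the same (unconfirmed) assumption."""
    return max_batches // gpus


batch, max_batches = 64, 10000  # values from the config in this post
for gpus in (1, 2, 4):
    iters = equivalent_max_batches(max_batches, gpus)
    total = iters * images_per_iteration(batch, gpus)
    print(f"{gpus} GPU(s): max_batches={iters}, total images={total}")
```

If the assumption holds, leaving max_batches=10000 unchanged on several GPUs would process more images in total rather than finish sooner, which would be consistent with the identical training times reported here.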

aidevmin avatar Aug 15 '23 05:08 aidevmin

AlexeyAB is no longer working on Darknet/YOLO.

You should see the FAQ. It has some information on speeding up training: https://www.ccoderun.ca/programming/darknet_faq/#time_to_train

Are you sure you are using the correct command? Post the command you are using to train on multiple GPUs.

stephanecharette avatar Aug 15 '23 05:08 stephanecharette

@stephanecharette Thanks for the response.

Here is the command I used for multi-GPU training (after training with 1 GPU for some iterations to get the weights):

./darknet detector train data/obj.data yolov4-custom.cfg backup/yolov4-custom_last.weights -gpus 0,1 -dont_show

aidevmin avatar Aug 15 '23 05:08 aidevmin