keras-yolo3

Low GPU utilization while training

Open HusainKapadia opened this issue 5 years ago • 3 comments

I trained the Tiny YOLO model on my own dataset and things worked out pretty well for me. Thank you for your implementation. Now I am trying to optimize training performance. I noticed that the model loads onto my GPU, but GPU utilization is not constant: it shows a spiking behaviour where it sits at 0% most of the time and peaks at around 20%, so the GPU is not being used very efficiently. I suspect that the data augmentation performed on every batch is the bottleneck, since that part runs on the CPU. I have also read issues on other repositories where people complained that fit_generator slows training down.
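To check this, I am thinking of timing the generator on its own, roughly like the sketch below (it assumes the data_generator_wrapper from train.py; the arguments are just placeholders for whatever gets passed to fit_generator):

```python
import time

# Rough profiling sketch: measure the CPU-side cost of producing one batch.
# data_generator_wrapper is the generator used by train.py in this repo; the
# arguments below are placeholders for whatever is passed to fit_generator.
gen = data_generator_wrapper(train_lines, batch_size, input_shape, anchors, num_classes)

n_batches = 20
t0 = time.time()
for _ in range(n_batches):
    next(gen)  # runs the get_random_data() augmentation for a full batch
print('average generator time per batch: %.3f s' % ((time.time() - t0) / n_batches))
```

If that per-batch time is larger than the GPU step time, the CPU-side augmentation would explain the spikes.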

My system specifications are:

  1. GPU: RTX 2070
  2. OS: Windows 10
  3. Python Version: 3.6.8
  4. TensorFlow: tensorflow-gpu 1.15.0

@qqwweee did you notice a similar behaviour while training your system? Could you please share some details of your machine's performance during training? I would also love to hear your thoughts on what the bottleneck could be.

HusainKapadia avatar Feb 10 '20 10:02 HusainKapadia

Hi Husain,

Did you find a solution? I have the same problem with this repository.

Kind regards

NorbertDorbert avatar Jul 07 '20 10:07 NorbertDorbert

Hi @HusainKapadia , @NorbertDorbert

Did you guys come up with a solution? I have the same problem: I'm using a 2080 Ti and my CUDA usage mostly stays between 0% and 20%. I also think it could be related to fit_generator. I suspect the spikes in CUDA usage happen because fit_generator has to load and augment the images for every batch, so the GPU only gets work once a batch is ready.

I tried setting workers=2, but Keras raises an error when I do that: the generator is not thread safe.
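If I understand the Keras docs correctly, that error should go away when the generator is a keras.utils.Sequence instead of a plain Python generator, because a Sequence can be indexed safely from multiple workers. Something roughly like this might work (untested; get_random_data and preprocess_true_boxes are the repo's helpers, and the exact calls here are my guesses):

```python
import numpy as np
from keras.utils import Sequence

# Rough, untested sketch: a Sequence version of the repo's data_generator so
# that fit_generator(workers=N) does not hit the "not thread safe" error.
# get_random_data (yolo3/utils.py) and preprocess_true_boxes (yolo3/model.py)
# are assumed to have the signatures used below.
class YoloSequence(Sequence):
    def __init__(self, annotation_lines, batch_size, input_shape, anchors, num_classes):
        self.lines = list(annotation_lines)
        self.batch_size = batch_size
        self.input_shape = input_shape
        self.anchors = anchors
        self.num_classes = num_classes

    def __len__(self):
        # number of batches per epoch
        return max(1, len(self.lines) // self.batch_size)

    def __getitem__(self, idx):
        batch = self.lines[idx * self.batch_size:(idx + 1) * self.batch_size]
        image_data, box_data = [], []
        for line in batch:
            image, box = get_random_data(line, self.input_shape, random=True)
            image_data.append(image)
            box_data.append(box)
        image_data = np.array(image_data)
        box_data = np.array(box_data)
        y_true = preprocess_true_boxes(box_data, self.input_shape, self.anchors, self.num_classes)
        # same output format as data_generator in train.py
        return [image_data, *y_true], np.zeros(len(batch))

    def on_epoch_end(self):
        np.random.shuffle(self.lines)
```

Passing an instance of this to model.fit_generator with workers=4 and use_multiprocessing=False might at least keep a few batches queued while the GPU is busy. I haven't verified that it fixes the low utilization, though.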

mazatov avatar Jul 27 '20 16:07 mazatov

Hi @HusainKapadia, @NorbertDorbert, @mazatov, did you find a solution to this problem? I ran into the same issue when using this repo. I'm using a 1080 Ti and training is really slow. I noticed that CUDA usage is almost 0% and only about 150 MB of GPU memory is used. Even after setting workers=8 and use_multiprocessing=True, the speed is still slow.
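One thing I still want to rule out: ~150 MB of GPU memory could mean TensorFlow is not running the graph on the GPU at all. A quick sanity check under TF 1.x (this is generic, not specific to this repo):

```python
import tensorflow as tf
from tensorflow.python.client import device_lib

# List the devices TensorFlow can actually see and confirm a GPU is among them
# before blaming the input pipeline.
print([d.name for d in device_lib.list_local_devices()])
print('GPU available:', tf.test.is_gpu_available())
```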

wqrray avatar Jun 25 '21 08:06 wqrray