
Training is slow on Google Cloud Nvidia Tesla P100

[Open] Muthu2093 opened this issue 6 years ago · 7 comments

I am running my model on Google Cloud with 8 vCPUs, 52 GB RAM, and one GPU (an Nvidia Tesla P100), with a batch size of 16. But it is taking around 8 hours per epoch, from what I calculated.

I am running the training on the COCO dataset.

Note: torch.cuda.is_available() prints True when I check it in the console.
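For reference, the check is roughly this (a minimal sketch; device index 0 assumed):

```python
import torch

# Confirm CUDA is visible and which GPU PyTorch will use
print(torch.cuda.is_available())      # prints True on this instance
print(torch.cuda.get_device_name(0))  # should report the Tesla P100
```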

Could someone tell me where I might be going wrong?

Muthu2093 avatar Oct 20 '18 20:10 Muthu2093

Your GCP hard drive is IO-constrained. Switch to an SSD and/or increase its size.
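One way to confirm that the data pipeline, not the GPU, is the bottleneck is to time the dataloader with no model in the loop. A rough sketch, assuming the repo's dataset object is already constructed as in train.py:

```python
import time
from torch.utils.data import DataLoader

# Iterate the loader alone; if this is nearly as slow as a full training
# step, the bottleneck is disk IO / preprocessing rather than compute.
# (collate_fn omitted for brevity; reuse the repo's if it defines one.)
loader = DataLoader(dataset, batch_size=16, num_workers=8, pin_memory=True)

start = time.time()
for i, batch in enumerate(loader):
    if i == 50:
        break
print(f"{(time.time() - start) / 50:.3f} s per batch (data only)")
```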

glenn-jocher avatar Oct 21 '18 12:10 glenn-jocher

My instance already has a 50 GB SSD persistent disk. Do I need more?

I am mounting the resources (code + dataset) from a Google Cloud bucket. Will that affect training speed?
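A crude way to measure what the bucket mount costs is to compare per-image read times on the mount against the local SSD (both paths below are placeholders):

```python
import glob
import time

# Read a few hundred images from each location and compare latency.
# Point both placeholder paths at copies of the same files.
for root in ["/mnt/gcs-bucket/coco/images", "/home/user/data/coco/images"]:
    files = glob.glob(root + "/*.jpg")[:200]
    start = time.time()
    for path in files:
        with open(path, "rb") as f:
            f.read()
    print(root, f"{(time.time() - start) / len(files) * 1000:.1f} ms/image")
```

If the mounted reads are much slower, copying the dataset onto the local disk once before training (e.g. with gsutil -m cp -r) usually helps.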

Muthu2093 avatar Oct 22 '18 03:10 Muthu2093

Same issue!
Machine: 8 × 2080 Ti, 96 GB DDR4, 1 TB SSD, Xeon 6134 (3.2 GHz, 4 cores × 2)
Training dataset: VOC, 10k images
Batch size: 12 × 8
One epoch takes 14 minutes! It seems to create threads and load data every step; the GPUs work for 0.5 s and then wait for 3 s.

Please share speed-up tips if you have any. Thanks.
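One way to quantify the "GPUs work 0.5 s, wait 3 s" pattern is to time the data wait and the compute separately inside the training loop. A sketch; loader, model, and optimizer are assumed to be set up as in the repo's train.py, and the loop/loss signature is an assumption:

```python
import time
import torch

# Split each step into time waiting on data vs. time spent computing.
# 'loader', 'model', and 'optimizer' are assumed from the repo's train.py.
data_time = compute_time = 0.0
end = time.time()
for imgs, targets in loader:
    t_data = time.time()
    data_time += t_data - end             # time spent waiting on the loader
    imgs, targets = imgs.cuda(), targets.cuda()
    loss, outputs = model(imgs, targets)  # signature assumed from train.py
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    torch.cuda.synchronize()              # flush queued GPU work for timing
    end = time.time()
    compute_time += end - t_data
print(f"data wait: {data_time:.1f} s, compute: {compute_time:.1f} s")
```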

tensmyo avatar Jan 28 '19 01:01 tensmyo

Hi, there have been improvements to the dataloader during the past week or so, and I have measured a significant speedup. One epoch now takes me approximately one hour (on a 2080 Ti) with an 8-sample batch size.

eriklindernoren avatar Apr 27 '19 20:04 eriklindernoren

> Hi, there have been improvements to the dataloader during the past week or so, and I have measured a significant speedup. One epoch now takes me approximately one hour (on a 2080 Ti) with an 8-sample batch size.

It still seems to create threads and load data every step; the GPUs work for 0.5 s and then wait for 3 s. Volatile GPU-Util fluctuates constantly, and with n_cpu set to 16 all the CPUs sit at 100% or above.

CuiHaoran98 avatar Dec 19 '19 11:12 CuiHaoran98

> Hi, there have been improvements to the dataloader during the past week or so, and I have measured a significant speedup. One epoch now takes me approximately one hour (on a 2080 Ti) with an 8-sample batch size.

> It still seems to create threads and load data every step; the GPUs work for 0.5 s and then wait for 3 s. Volatile GPU-Util fluctuates constantly, and with n_cpu set to 16 all the CPUs sit at 100% or above.

Hi, has your problem been solved? My training speed is also too slow. My GPU is a Tesla T4, but it appears to be about as fast as a GTX 1050.
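On newer PyTorch releases (1.7+), one mitigation for the per-epoch worker respawn described above is persistent_workers. A sketch, with the dataset construction assumed from the repo:

```python
from torch.utils.data import DataLoader

# Keep worker processes alive across epochs instead of re-spawning them.
# persistent_workers requires PyTorch >= 1.7 and num_workers > 0.
loader = DataLoader(
    dataset,                  # the repo's dataset object (assumed)
    batch_size=16,
    num_workers=8,
    pin_memory=True,          # faster host-to-GPU copies
    persistent_workers=True,  # avoid worker startup cost every epoch
)
```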

948024326 avatar Jan 28 '21 13:01 948024326

Is this issue still relevant/occurring?

Flova avatar Sep 14 '21 09:09 Flova