
Question about training CornerNet-squeeze on Tesla v-100

Open calvindu95 opened this issue 5 years ago • 0 comments

When using 4 or more GPUs (Tesla V100) to train the model, training seems slower than with only one or two:

1. Using 2 2080Ti with batch-size 24 and chunk-size [12,12] is the fastest: 1.22 it/s.
2. Using 1 V100 with batch-size 16 roughly doubles the training time, and GPU memory usage is quite low: 2.41 s/it.
3. So I tried batch-size 128 with chunk-size [32,32,32,32]; it turned out to be even slower than using only 1 V100, and GPU utilization is very low: 6.75 s/it.
4. Batch-size 320 with chunk-size [40,40,40,40,40,40,40,40] turned out to be the slowest; even though GPU memory usage is high, GPU utilization is the lowest.

It seems that the problem happens during the periods when the CPU loads the data (correct me if I am wrong). So I wonder what the suggested config for Tesla V100 is. Also, I found that using a DataLoader might help. Is it possible for me to use this method with CornerNet-Lite?
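The symptoms described above (low GPU utilization and rising s/it as batch size grows) are consistent with single-process data loading becoming the bottleneck: the GPUs sit idle while the CPU decodes and augments the next batch. In PyTorch this is usually addressed with `torch.utils.data.DataLoader` and `num_workers > 0`, which prepares batches in background worker processes. The underlying idea can be sketched in plain Python with a background prefetch thread; all timings and batch counts below are made up for illustration, not measured from CornerNet-Lite:

```python
import queue
import threading
import time

def batches(n_batches, load_time=0.01):
    """Simulate CPU-side batch preparation (decode, augment, collate)."""
    for i in range(n_batches):
        time.sleep(load_time)  # stand-in for image loading/augmentation
        yield i

def prefetch(gen, buffer_size=4):
    """Run `gen` in a background thread, keeping up to `buffer_size` batches ready."""
    q = queue.Queue(maxsize=buffer_size)
    sentinel = object()

    def producer():
        for item in gen:
            q.put(item)
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is sentinel:
            return
        yield item

def train(gen, step_time=0.01):
    """Consume batches, pretending each one takes `step_time` of GPU work."""
    count = 0
    for _ in gen:
        time.sleep(step_time)  # stand-in for a GPU training step
        count += 1
    return count

n = 20

# Serial: load a batch, then train on it, back to back.
t0 = time.perf_counter()
serial = train(batches(n))
t_serial = time.perf_counter() - t0

# Overlapped: the next batch is being prepared while the current one trains.
t0 = time.perf_counter()
overlapped = train(prefetch(batches(n)))
t_overlap = time.perf_counter() - t0

print(serial, overlapped, t_overlap < t_serial)
```

With real training, the equivalent knobs are `num_workers` and `pin_memory` on the `DataLoader`; whether CornerNet-Lite's custom sampling can be wrapped in one directly depends on its data pipeline, so treat this as a sketch of the mechanism rather than a drop-in fix.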

calvindu95 · Nov 19 '19 02:11