
How to improve validation mAP using multi-GPUs

Open zyms5244 opened this issue 4 years ago • 11 comments

I trained the model with 8 GPUs using your default hyper-parameters; however, the mAP is lower than with a single GPU.

Car AP@0.70, 0.70, 0.70:
bbox AP: 95.64, 82.39, 77.33
bev  AP: 92.46, 79.22, 72.20
3d   AP: 79.55, 63.29, 56.62
aos  AP: 93.19, 79.68, 74.29
Car AP@0.70, 0.50, 0.50:
bbox AP: 95.64, 82.39, 77.33
bev  AP: 96.21, 87.74, 82.84
3d   AP: 96.09, 87.34, 80.60
aos  AP: 93.19, 79.68, 74.29

Could you give some suggestions for improving mAP using multi-GPUs? Thanks.

zyms5244 avatar Apr 23 '20 09:04 zyms5244

@zyms5244 any progress on multigpu case?

Yamin05114 avatar Jun 05 '20 08:06 Yamin05114

@zyms5244 any progress on multigpu case?

Nothing yet. But I found that the more GPUs are used, the more obvious the performance degradation becomes, given the same number of iterations. @skyhehe123 could you check multi-GPU training?

zyms5244 avatar Jun 05 '20 09:06 zyms5244

I also tried SGD and the cosine LR policy from the previous commit; no improvement.

zyms5244 avatar Jun 05 '20 09:06 zyms5244

Have you tried increasing the LR by the square root of the batch-size increase? Say, with LR = 0.001 and BS = 2 (1 GPU), I would set the LR for 6 GPUs (total BS = 12) to 0.001 * (12/2)**0.5 ≈ 0.0024.

stalkermustang avatar Jun 05 '20 09:06 stalkermustang

Also, in recent papers I see linear scaling rather than square root, i.e. 0.006 for the example above. Please try both methods with your parameters and tell us the results.

stalkermustang avatar Jun 05 '20 09:06 stalkermustang
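
A minimal sketch of the two scaling rules suggested above, assuming the base setting from the example (LR = 0.001 at total batch size 2, i.e. 1 GPU x BS 2); the numbers are only illustrative:

```python
# Square-root vs. linear LR scaling for data-parallel training.
# Assumed baseline: LR = 0.001 at a total batch size of 2.
BASE_LR = 0.001
BASE_BATCH = 2

def sqrt_scaled_lr(total_batch: int) -> float:
    """Square-root rule: LR grows with sqrt of the batch-size ratio."""
    return BASE_LR * (total_batch / BASE_BATCH) ** 0.5

def linear_scaled_lr(total_batch: int) -> float:
    """Linear rule: LR grows proportionally to the batch-size ratio."""
    return BASE_LR * (total_batch / BASE_BATCH)

total_batch = 12  # 6 GPUs x BS 2
print(f"sqrt:   {sqrt_scaled_lr(total_batch):.4f}")    # 0.0024
print(f"linear: {linear_scaled_lr(total_batch):.4f}")  # 0.0060
```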

But I found the more GPUs are used, the more obvious the performance degradation is, at the same iterations.

say "same iterations" you mean same number of epoch? i.e. 50 epochs with 1 gpu better than 50 epochs on 4 gpu?

stalkermustang avatar Jun 05 '20 09:06 stalkermustang

But I found the more GPUs are used, the more obvious the performance degradation is, at the same iterations.

say "same iterations" you mean same number of epoch? i.e. 50 epochs with 1 gpu better than 50 epochs on 4 gpu? I tried different epochs and GPUs: 1 GPU 80 epochs > 4 GPUs 80 epochs > 4s GPU 20 epochs > 8GPUs 10 epochs. with LR linear increase. I'll try squared root later and share yours.

zyms5244 avatar Jun 08 '20 04:06 zyms5244

I have tried different LRs for the one-cycle case. Increasing the LR makes the problem even worse. Obviously, the training process has not converged yet (the validation results are still improving). My current suggestions would be:

  1. adjust the LR by hand with a step scheduler instead of one-cycle (see the sketch below the list)
  2. increase the number of epochs
  3. adjust the ratio of the 1st and 2nd stages from 0.4:0.6 to 0.3:0.7
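
A minimal sketch of suggestion 1 in plain PyTorch; the milestones and gamma are illustrative assumptions, not values from this repo:

```python
# Hand-tuned step schedule as an alternative to one-cycle.
import torch

model = torch.nn.Linear(10, 1)  # stand-in for the detector
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# Drop the LR by 10x at epochs 35 and 45 of a 50-epoch run (assumed milestones).
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[35, 45], gamma=0.1)

for epoch in range(50):
    # ... run one training epoch here ...
    scheduler.step()  # advance the schedule once per epoch
```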

Yamin05114 avatar Jun 08 '20 10:06 Yamin05114

All of the suggestions above turned out to be wrong. But the convergence issue is obvious.

Yamin05114 avatar Jun 10 '20 06:06 Yamin05114

With 2-GPU training, I got almost zero accuracy. It is very weird; are there any suggestions?

Car AP@0.70, 0.70, 0.70:
bbox AP: 0.03, 0.04, 0.07
bev  AP: 0.00, 0.00, 0.00
3d   AP: 0.00, 0.00, 0.00
aos  AP: 0.01, 0.02, 0.03
Car AP@0.70, 0.50, 0.50:
bbox AP: 0.03, 0.04, 0.07
bev  AP: 0.02, 0.02, 0.03
3d   AP: 0.00, 0.00, 0.01
aos  AP: 0.01, 0.02, 0.03

lbsswu avatar Sep 10 '20 07:09 lbsswu

Hi @Yamin05114, thank you for your tests and for posting here. Did you manage to reproduce the results of the paper? If so, could you post your library versions and the other details needed to reproduce them? I use 4x 2080Ti and, after a lot of testing, get only 79 on moderate. Thank you very much!

vehxianfish avatar Apr 28 '21 01:04 vehxianfish