SA-SSD
How to improve validation mAP using multi-GPUs
I trained the model with 8 GPUs using your default hyper-parameters; however, the mAP is lower than with a single GPU.

Car AP@0.70, 0.70, 0.70:
bbox AP: 95.64, 82.39, 77.33
bev AP: 92.46, 79.22, 72.20
3d AP: 79.55, 63.29, 56.62
aos AP: 93.19, 79.68, 74.29
Car AP@0.70, 0.50, 0.50:
bbox AP: 95.64, 82.39, 77.33
bev AP: 96.21, 87.74, 82.84
3d AP: 96.09, 87.34, 80.60
aos AP: 93.19, 79.68, 74.29
Could you give some suggestions for improving mAP using multi-GPUs? Thanks.
@zyms5244 any progress on multigpu case?
Nothing. But I found that the more GPUs are used, the more obvious the performance degradation is, at the same number of iterations. @skyhehe123 Could you check multi-GPU training?
I also tried SGD and a cosine LR policy as in a previous commit; no improvement.
Did you try increasing the LR with the square root of the batch-size increase? Say, with LR 0.001 and BS=2 (1 GPU), I would set the LR for 6 GPUs (total BS=12) to 0.001*(12/2)**0.5 ≈ 0.0025.
Also, in recent papers I see linear scaling rather than square root, i.e. 0.006 for the example above. Please try both methods with the params you use and tell us the result.
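The two scaling rules above can be sketched as small helper functions; the function names are illustrative, not from the SA-SSD codebase, and the base values (LR 0.001, BS=2 on 1 GPU) are the ones quoted in the thread:

```python
# Sketch of the two batch-size-to-LR scaling rules discussed above.

def sqrt_scaled_lr(base_lr, base_bs, total_bs):
    """Square-root scaling: LR grows with the sqrt of the batch-size ratio."""
    return base_lr * (total_bs / base_bs) ** 0.5

def linear_scaled_lr(base_lr, base_bs, total_bs):
    """Linear scaling: LR grows proportionally with the batch size."""
    return base_lr * (total_bs / base_bs)

# 6 GPUs x BS 2 each = total batch size 12
print(sqrt_scaled_lr(0.001, 2, 12))    # ≈ 0.00245, rounded to 0.0025 above
print(linear_scaled_lr(0.001, 2, 12))  # ≈ 0.006
```

Linear scaling is often paired with a warm-up phase for the first few epochs, which may matter here given the convergence problems reported above.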
> But I found the more GPUs are used, the more obvious the performance degradation is, at the same iterations.

By "same iterations" do you mean the same number of epochs? I.e., 50 epochs on 1 GPU is better than 50 epochs on 4 GPUs?
> By "same iterations" do you mean the same number of epochs? I.e., 50 epochs on 1 GPU is better than 50 epochs on 4 GPUs?

I tried different epoch counts and GPU counts, with linear LR scaling: 1 GPU 80 epochs > 4 GPUs 80 epochs > 4 GPUs 20 epochs > 8 GPUs 10 epochs. I'll try square-root scaling later and share the results.
I have tried different LRs for the one-cycle case. Increasing the LR makes the problem even worse. Obviously, the training process had not converged yet (validation results were still improving). My current suggestions are:
- adjust the LR by hand with a step scheduler instead of using one-cycle
- increase the number of epochs
- adjust the loss ratio of the 1st and 2nd stages from 0.4:0.6 to 0.3:0.7
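The first suggestion can be sketched as a hand-tuned step schedule (this mirrors what PyTorch's `MultiStepLR` does; the milestones and decay factor below are illustrative values, not the actual SA-SSD config):

```python
# Minimal sketch of a step LR schedule: hold the base LR, then drop it
# by `gamma` at each milestone epoch. Milestones (40, 60) and gamma=0.1
# are assumed example values for an 80-epoch run.

def step_lr(base_lr, epoch, milestones=(40, 60), gamma=0.1):
    """Return the LR for a given epoch under a step schedule."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops

for epoch in (0, 39, 40, 59, 60, 79):
    print(epoch, step_lr(0.001, epoch))
# LR stays at 0.001 until epoch 40, drops to ~1e-4, then to ~1e-5 at epoch 60
```

In a real training loop this would be applied by setting `param_group["lr"]` on the optimizer at the start of each epoch, or by using `torch.optim.lr_scheduler.MultiStepLR` directly.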
None of the adjustments above helped, but the convergence issue is obvious.
With 2-GPU training, I got almost zero accuracy. It is very weird; are there any suggestions?

Car AP@0.70, 0.70, 0.70:
bbox AP: 0.03, 0.04, 0.07
bev AP: 0.00, 0.00, 0.00
3d AP: 0.00, 0.00, 0.00
aos AP: 0.01, 0.02, 0.03
Car AP@0.70, 0.50, 0.50:
bbox AP: 0.03, 0.04, 0.07
bev AP: 0.02, 0.02, 0.03
3d AP: 0.00, 0.00, 0.01
aos AP: 0.01, 0.02, 0.03
Hi @Yamin05114, thank you for testing and posting here. Did you reproduce the results of the paper? If so, could you post your library versions and other details needed to reproduce them? I use 4x 2080Ti, get only 79 on moderate, and have tested a lot. Thank you very much!