visualDet3D

multi-gpu training

Open slinghe0321 opened this issue 5 years ago • 5 comments

Hi, thanks for your great work! I have trained the GroundAwareYolo3D model and got the results below:

Car AP(Average Precision)@0.70, 0.70, 0.70:
bbox AP:97.29, 84.55, 64.65
bev  AP:29.53, 20.15, 15.53
3d   AP:22.90, 15.26, 11.33
aos  AP:96.52, 82.52, 63.05

This seems comparable with the results reported in the paper (23.63, 16.16, 12.06) for Car AP@0.70 on the validation set.

However, when training with multiple GPUs (e.g. 4 GPUs), we get worse results, as below:

Car AP(Average Precision)@0.70, 0.70, 0.70:
bbox AP:97.08, 86.41, 66.67
bev  AP:20.56, 15.16, 11.22
3d   AP:15.17, 10.81, 8.22
aos  AP:95.50, 83.36, 64.24

Training commands:

bash ./launchers/train.sh config/$CONFIG_FILE.py 0,1,2,3 multi-gpu-train
bash ./launchers/train.sh config/$CONFIG_FILE.py 0 single-gpu-train

I trained twice with multi-GPU, and both results are similar to each other and lower than the single-GPU result. Do you have any suggestions for this case? What performance do you get with multi-GPU training?

slinghe0321, Mar 22 '21 08:03

I have also noticed this, and I consider it a bug.

My guess is that multi-GPU training changes the relative weights between batches: batches on different GPUs are simply averaged, whereas within a single GPU the weighting depends on num_gt, and some batches are skipped.

I have not tried to debug this yet, because I am not that familiar with the multi-GPU training APIs.
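
A small numeric sketch of this effect: each GPU normalizes its regression loss by its own weight sum, and data-parallel training then averages across GPUs, which is not the same as normalizing once over the whole batch. The weights and loss values below are made up and only illustrate the arithmetic; they are not taken from the repository.

```python
# Hypothetical per-sample (weight, regression loss) pairs on two GPUs.
gpu0 = [(1.0, 0.2), (1.0, 0.4), (1.0, 0.6), (1.0, 0.8)]  # four weighted samples
gpu1 = [(1.0, 2.0)]                                       # a single weighted sample


def normalized_loss(samples):
    """Weighted mean of the losses: sum(w * l) / sum(w)."""
    total_weight = sum(w for w, _ in samples)
    return sum(w * l for w, l in samples) / (total_weight + 1e-6)


# Single GPU: all samples share one normalizer.
single_gpu = normalized_loss(gpu0 + gpu1)

# Multi GPU: each rank normalizes locally, then the results are averaged.
multi_gpu = (normalized_loss(gpu0) + normalized_loss(gpu1)) / 2

print(f"single-GPU loss: {single_gpu:.3f}")
print(f"multi-GPU loss:  {multi_gpu:.3f}")
```

In this toy case the lone sample on the second GPU ends up counting four times as much as each sample on the first GPU, whereas on a single GPU all five samples count equally.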

Owen-Liuyuxuan, Mar 22 '21 10:03

I changed

weighted_regression_losses = torch.sum(weights * reg_loss / (torch.sum(weights) + 1e-6), dim=0)

into

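# Sum of the weights on this rank.
weight_sum = torch.sum(weights)
if torch.distributed.is_initialized():
    N = torch.distributed.get_world_size()
    # Sum the weights over all ranks (in place) to get a global normalizer.
    torch.distributed.all_reduce(weight_sum)
    # Scale by N, presumably to offset gradient averaging across ranks.
    reg_loss = reg_loss * N
weighted_regression_losses = torch.sum(weights * reg_loss / (weight_sum + 1e-6), dim=0)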

and halved the batch size. Empirically, the gap gets smaller, but it still exists.
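
To illustrate what this change is meant to achieve, the following sketch simulates it in plain Python without a real process group. It assumes that the data-parallel wrapper averages gradients (and therefore, in effect, the losses) over the N ranks. `all_reduce_sum` is a hypothetical stand-in for `torch.distributed.all_reduce`, and the numbers are made up.

```python
# Hypothetical per-rank (weight, regression loss) pairs.
ranks = [
    [(1.0, 0.2), (1.0, 0.4), (1.0, 0.6), (1.0, 0.8)],
    [(1.0, 2.0)],
]
N = len(ranks)  # world size


def all_reduce_sum(values):
    """Stand-in for torch.distributed.all_reduce: every rank gets the sum."""
    return sum(values)


# Each rank computes its local weight sum, then all ranks share the total.
local_weight_sums = [sum(w for w, _ in samples) for samples in ranks]
global_weight_sum = all_reduce_sum(local_weight_sums)

# Patched loss on each rank: scale by N, normalize by the global weight sum.
per_rank_loss = [
    sum(w * (l * N) for w, l in samples) / (global_weight_sum + 1e-6)
    for samples in ranks
]

# Gradient averaging across ranks acts like averaging the per-rank losses.
effective_loss = sum(per_rank_loss) / N

# Reference: one process that sees every sample at once.
all_samples = [pair for samples in ranks for pair in samples]
reference = sum(w * l for w, l in all_samples) / (global_weight_sum + 1e-6)

print(f"effective multi-GPU loss: {effective_loss:.6f}")
print(f"single-process reference: {reference:.6f}")
```

Under these assumptions the two values agree, so the regression term is normalized consistently across ranks. Any remaining gap presumably comes from other factors, such as the skipped batches mentioned earlier, which this sketch does not model.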

Owen-Liuyuxuan, Mar 25 '21 03:03

Does multi-GPU training affect the training of mono_depth?

cnexah, May 17 '21 16:05

Does multi-GPU training affect the training of mono_depth?

In my tests, depth prediction works fine with multi-GPU training.

Owen-Liuyuxuan, Jun 02 '21 02:06

As of the latest update, which uses the distributed sampler from detectron2, we are able to train on multiple GPUs and obtain reasonable performance.

Without tuning the learning rate or batch size, the results are as follows:

Car AP(Average Precision)@0.70, 0.70, 0.70:
bbox AP:97.24, 86.90, 67.03
bev  AP:29.68, 20.48, 15.73
3d   AP:21.56, 15.00, 11.16
aos  AP:96.23, 84.25, 64.92
Car AP(Average Precision)@0.70, 0.50, 0.50:
bbox AP:97.24, 86.90, 67.03
bev  AP:65.20, 46.35, 35.98
3d   AP:58.84, 41.06, 32.49
aos  AP:96.23, 84.25, 64.92
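
For readers unfamiliar with the idea, here is a minimal pure-Python sketch of how a distributed training sampler of this kind commonly works. It assumes the usual design in which every rank generates the same shuffled index stream from a shared seed and keeps every world_size-th index, so that ranks see disjoint samples. This is an illustration only, not the detectron2 or visualDet3D code; the names `shuffled_stream` and `rank_indices` and their parameters are hypothetical.

```python
import itertools
import random


def shuffled_stream(dataset_size, seed):
    """Endless stream of indices, reshuffled after every pass over the data."""
    rng = random.Random(seed)  # same seed on every rank -> same stream
    while True:
        order = list(range(dataset_size))
        rng.shuffle(order)
        yield from order


def rank_indices(dataset_size, rank, world_size, seed=0):
    """Indices for one rank: every world_size-th item, starting at `rank`."""
    stream = shuffled_stream(dataset_size, seed)
    return itertools.islice(stream, rank, None, world_size)


# Example: 8 samples split across 4 ranks, first two indices per rank.
world_size = 4
first_pass = [
    list(itertools.islice(rank_indices(8, r, world_size), 2))
    for r in range(world_size)
]
for r, idx in enumerate(first_pass):
    print(f"rank {r}: {idx}")
```

Taken together, the four ranks cover each of the 8 indices exactly once in the first pass, with no overlap between ranks.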

Owen-Liuyuxuan, Jul 19 '22 02:07