
Unexpected results when training with multi-gpus

beautyremain opened this issue 2 years ago • 2 comments

The code runs properly on a single GPU. But when I set the option multi-gpus=True, the DataParallel used in the code failed: specifically, the hook interface is not thread-safe, and DataParallel runs module replicas on multiple threads. I then removed the forward_pre_hook and called the preprocessing function in the dynamic layers manually. With that change the training runs without errors, but the final results (i.e., AP and mAP) are terrible (0.015), which indicates that something still goes wrong during training. I suspect the author trained on one GPU and never tested the multi-gpus option, but one GPU with 16GB of memory is not enough for batch_size = 32 (or even 16). It would be of great help if anyone could solve this problem.
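For context, the workaround described above follows a general pattern: logic attached via a forward_pre_hook (which may mutate shared state and is not safe when DataParallel replicas run on multiple threads) is instead called explicitly inside forward(). A minimal sketch of that pattern, with illustrative names (DynamicLayer, preprocess) that are not the actual RobustDet code:

```python
# Hypothetical sketch: inline a pre-hook's preprocessing into forward()
# instead of registering it via register_forward_pre_hook. Plain Python
# stand-in for a PyTorch module, for illustration only.

class DynamicLayer:
    def __init__(self):
        # State that a forward_pre_hook might otherwise have set up;
        # mutating such shared state from hooks is what breaks under
        # DataParallel's multi-threaded replica execution.
        self.scale = 2.0

    def preprocess(self, xs):
        # Formerly hook-driven preprocessing, now an ordinary method.
        return [v * self.scale for v in xs]

    def forward(self, xs):
        xs = self.preprocess(xs)  # called manually, no hook involved
        return [v + 1.0 for v in xs]

layer = DynamicLayer()
print(layer.forward([1.0, 2.0]))  # → [3.0, 5.0]
```

This keeps the data flow explicit per call, which avoids relying on hook ordering or shared hook state across device replicas.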

beautyremain avatar Aug 31 '22 02:08 beautyremain

The multi-GPU training bug has been fixed in the latest version; you can update the code to get the fix.

IrisRainbowNeko avatar Aug 31 '22 02:08 IrisRainbowNeko

> The multi-GPU training bug has been fixed in the latest version; you can update the code to get the fix.

Thanks for your response. Training without CFR works fine now, but errors still occur when training with both CFR and multiple GPUs. I would be really grateful if this bug could be fixed too.

beautyremain avatar Sep 02 '22 03:09 beautyremain