RobustDet
Unexpected results when training with multi-gpus
The code runs properly on a single GPU, but when I set the option multi-gpus=True, the DataParallel used in the code failed. Specifically, the hook interface is not thread-safe. I then removed the forward_pre_hook and called the preprocess function in the dynamic layers manually; with that change, training runs without errors. The final results (i.e., AP and mAP), however, are terrible (0.015), which indicates that something still goes wrong during training. I suspect the author trained on one GPU and hasn't tested the multi-gpus option, but one GPU with 16 GB of memory is not enough for batch_size = 32 (or even 16). It would be of great help if anyone could solve this problem.
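For anyone hitting the same issue, here is a minimal sketch of the workaround I described: instead of registering a forward_pre_hook (which mutates shared module state across DataParallel's worker threads), fold the preprocessing into forward() so each replica applies it to its own inputs. The class and method names below are hypothetical placeholders, not RobustDet's actual layer:

```python
import torch
import torch.nn as nn

class DynamicLayer(nn.Module):
    """Hypothetical stand-in for a dynamic layer that previously
    relied on a forward_pre_hook to transform its inputs."""

    def __init__(self, dim=8):
        super().__init__()
        self.linear = nn.Linear(dim, dim)
        self.scale = 2.0  # example state a pre-hook might have read

    def preprocess(self, x):
        # Formerly done in a forward_pre_hook; calling it here means
        # each DataParallel replica runs it on its own thread-local
        # inputs instead of racing on shared hook state.
        return x * self.scale

    def forward(self, x):
        return self.linear(self.preprocess(x))

model = DynamicLayer()
# On a multi-GPU box this would be wrapped as nn.DataParallel(model);
# the forward path is the same either way.
x = torch.randn(4, 8)
out = model(x)
print(out.shape)
```

This keeps the per-input transformation on the replica's own call stack, which is what the hook removal achieves; it does not by itself explain the degraded mAP, which suggests a separate state-synchronization issue.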
The bug of multi-gpus training has been fixed in the latest version. You can update the code to the latest version.
Thanks for your response. Training without CFR works fine now, but errors still occur when training with both CFR and multi-GPU enabled. I would be really grateful if this bug could be fixed too.