UniverseNet
Training duration of RelationNet++
Hello, what's the training duration of RelationNet++? Why does training on the COCO dataset take so much time on a single GPU?
2021-05-08 14:40:36,476 - mmdet - INFO - workflow: [('train', 1)], max: 20 epochs
2021-05-08 14:43:38,226 - mmdet - INFO - Epoch [1][50/58633] lr: 9.890e-04, eta: 49 days, 7:57:52, time: 3.635, data_time: 0.054, memory: 9028, kpt_loss_point_cls: 1.1461, kpt_loss_point_offset: 0.0875, bbox_loss_cls: 1.2137, bbox_loss_bbox: 0.7130, loss: 3.1603
2021-05-08 14:47:02,436 - mmdet - INFO - Epoch [1][100/58633] lr: 1.988e-03, eta: 52 days, 9:05:35, time: 4.084, data_time: 0.006, memory: 9566, kpt_loss_point_cls: 1.1375, kpt_loss_point_offset: 0.0869, bbox_loss_cls: 1.2221, bbox_loss_bbox: 0.7509, loss: 3.1974
2021-05-08 14:50:23,529 - mmdet - INFO - Epoch [1][150/58633] lr: 2.987e-03, eta: 53 days, 2:39:41, time: 4.022, data_time: 0.005, memory: 9566, kpt_loss_point_cls: 1.1513, kpt_loss_point_offset: 0.0866, bbox_loss_cls: 1.2318, bbox_loss_bbox: 0.7351, loss: 3.2049
2021-05-08 14:53:27,445 - mmdet - INFO - Epoch [1][200/58633] lr: 3.986e-03, eta: 52 days, 7:26:42, time: 3.678, data_time: 0.005, memory: 9566, kpt_loss_point_cls: 1.1245, kpt_loss_point_offset: 0.0859, bbox_loss_cls: 1.1802, bbox_loss_bbox: 0.6615, loss: 3.0520
2021-05-08 14:56:36,176 - mmdet - INFO - Epoch [1][250/58633] lr: 4.985e-03, eta: 52 days, 2:10:09, time: 3.775, data_time: 0.005, memory: 9566, kpt_loss_point_cls: 1.0461, kpt_loss_point_offset: 0.0853, bbox_loss_cls: 1.2102, bbox_loss_bbox: 0.7289, loss: 3.0704
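For what it's worth, the ~52-day eta in the log is consistent with simple arithmetic (a sketch; the iteration count and per-iteration time are read off the log above):

```python
# Rough sanity check on the eta reported by mmdet (values from the log above).
iters_per_epoch = 58633      # from "Epoch [1][50/58633]"
epochs = 20                  # from "max: 20 epochs"
sec_per_iter = 3.8           # roughly the logged "time:" values (3.6-4.1 s)

total_days = iters_per_epoch * epochs * sec_per_iter / 86400
print(f"estimated total training time: {total_days:.1f} days")  # ~51.6 days
```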
Which config did you use?
I used bvr_retinanet_x101_fpn_dcn_mstrain_400_1200_20e_coco.py:
python tools/train.py configs/bvr/bvr_retinanet_x101_fpn_dcn_mstrain_400_1200_20e_coco.py
The config has many heavy settings.
Please try the following:
- Res2Net-50 or Res2Net-101
- stage_with_dcn=(False, False, False, True),
- ../_base_/datasets/coco_detection_mstrain_480_960.py
- with_cp=False
- fp16 = dict(loss_scale='dynamic') or fp16 = dict(loss_scale=512.)
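Put together, a lighter config might look roughly like this mmdetection-style sketch (the base-config list and exact field placement are assumptions, not an official file from the repo):

```python
# Hypothetical lighter BVR config sketch, NOT an official file in the repo.
_base_ = [
    '../_base_/datasets/coco_detection_mstrain_480_960.py',
    # plus the model/schedule base configs of the original bvr config
]

model = dict(
    backbone=dict(
        # apply DCN only in the last stage instead of several stages
        stage_with_dcn=(False, False, False, True),
        # gradient checkpointing off; True saves memory at some speed cost
        with_cp=False,
    ),
)

# mixed-precision training; dynamic loss scaling is the safer default
fp16 = dict(loss_scale='dynamic')
# fp16 = dict(loss_scale=512.)  # fixed-loss-scale alternative
```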
RelationNet++ is slow on a T4 even for inference only. I may verify the paper's FPS with benchmarks on a V100.
Thank you for your reply. I used ResNet-50 for training, and the speed improved noticeably, but the accuracy is not as high as that reported in the paper. Is the reason that epochs = 12?
If you use bvr_retinanet_r50_fpn_gn_1x_coco.py, an AP of around 38.5 (the authors' result) is expected. The settings I recommended and training for 20 epochs will boost accuracy.
Please don't forget to change the learning rate according to the Linear Scaling Rule.
lr=0.01 for total batch size 16 (8 GPUs * 2 samples_per_gpu)
lr=0.00125 for total batch size 2 (1 GPU * 2 samples_per_gpu)
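The two learning rates above follow from a simple proportion; as a quick sketch (the helper name is my own, not part of mmdetection):

```python
def scaled_lr(base_lr=0.01, base_batch=16, total_batch=16):
    """Linear Scaling Rule: scale the learning rate proportionally to
    the total batch size. base_lr=0.01 corresponds to the default total
    batch size of 16 (8 GPUs * 2 samples_per_gpu)."""
    return base_lr * total_batch / base_batch

print(scaled_lr(total_batch=16))  # 0.01    (8 GPUs * 2 samples_per_gpu)
print(scaled_lr(total_batch=2))   # 0.00125 (1 GPU * 2 samples_per_gpu)
```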
Thank you very much for your reply. I will try it.
Hello, will the accuracy be affected by the above modifications?
- Res2Net-50 or Res2Net-101: affects accuracy.
- stage_with_dcn=(False, False, False, True): affects accuracy.
- ../_base_/datasets/coco_detection_mstrain_480_960.py: affects accuracy.
- with_cp=False: should not affect accuracy.
- fp16 = dict(loss_scale='dynamic') or fp16 = dict(loss_scale=512.): both are expected not to affect accuracy.