mAP of s2anet under different batch sizes
I trained s2anet (fp16) with batch size 2 and batch size 8, and got a 3.7% difference in mAP. It's a little weird.
| batchsize | lr | mAP |
|---|---|---|
| 8 | 0.01 | 70.03 |
| 2 | 0.025 | 73.74 |
Full config for bs=8:

```python
dataset_type = 'DOTADataset'
data_root = '/datasets/Dota_mmrotate/dota/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True),
dict(type='RResize', img_scale=(1024, 1024)),
dict(
type='RRandomFlip',
flip_ratio=[0.25, 0.25, 0.25],
direction=['horizontal', 'vertical', 'diagonal'],
version='le135'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1024, 1024),
flip=False,
transforms=[
dict(type='RResize'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img'])
])
]
data = dict(
samples_per_gpu=8,
workers_per_gpu=8,
train=dict(
type='DOTADataset',
ann_file='/datasets/Dota_mmrotate/dota/trainval/annfiles/',
img_prefix='/datasets/Dota_mmrotate/dota/trainval/images/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True),
dict(type='RResize', img_scale=(1024, 1024)),
dict(
type='RRandomFlip',
flip_ratio=[0.25, 0.25, 0.25],
direction=['horizontal', 'vertical', 'diagonal'],
version='le135'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
],
version='le135'),
val=dict(
type='DOTADataset',
ann_file='/datasets/Dota_mmrotate/dota/trainval/annfiles/',
img_prefix='/datasets/Dota_mmrotate/dota/trainval/images/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1024, 1024),
flip=False,
transforms=[
dict(type='RResize'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img'])
])
],
version='le135'),
test=dict(
type='DOTADataset',
ann_file='/datasets/Dota_mmrotate/dota/test/images/',
img_prefix='/datasets/Dota_mmrotate/dota/test/images/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1024, 1024),
flip=False,
transforms=[
dict(type='RResize'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img'])
])
],
version='le135'))
evaluation = dict(interval=12, metric='mAP', nproc=1)
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.3333333333333333,
step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=4)
log_config = dict(
interval=50,
hooks=[dict(type='TextLoggerHook'),
dict(type='TensorboardLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
fp16 = dict(loss_scale=dict(init_scale=512))
angle_version = 'le135'
model = dict(
type='S2ANet',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
zero_init_residual=False,
norm_cfg=dict(type='BN', requires_grad=True),
norm_eval=True,
style='pytorch',
init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
start_level=1,
add_extra_convs='on_input',
num_outs=5),
fam_head=dict(
type='RotatedRetinaHead',
num_classes=15,
in_channels=256,
stacked_convs=2,
feat_channels=256,
assign_by_circumhbbox=None,
anchor_generator=dict(
type='RotatedAnchorGenerator',
scales=[4],
ratios=[1.0],
strides=[8, 16, 32, 64, 128]),
bbox_coder=dict(
type='DeltaXYWHAOBBoxCoder',
angle_range='le135',
norm_factor=1,
edge_swap=False,
proj_xy=True,
target_means=(0.0, 0.0, 0.0, 0.0, 0.0),
target_stds=(1.0, 1.0, 1.0, 1.0, 1.0)),
loss_cls=dict(
type='FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
loss_bbox=dict(type='SmoothL1Loss', beta=0.11, loss_weight=1.0)),
align_cfgs=dict(
type='AlignConv',
kernel_size=3,
channels=256,
featmap_strides=[8, 16, 32, 64, 128]),
odm_head=dict(
type='ODMRefineHead',
num_classes=15,
in_channels=256,
stacked_convs=2,
feat_channels=256,
assign_by_circumhbbox=None,
anchor_generator=dict(
type='PseudoAnchorGenerator', strides=[8, 16, 32, 64, 128]),
bbox_coder=dict(
type='DeltaXYWHAOBBoxCoder',
angle_range='le135',
norm_factor=1,
edge_swap=False,
proj_xy=True,
target_means=(0.0, 0.0, 0.0, 0.0, 0.0),
target_stds=(1.0, 1.0, 1.0, 1.0, 1.0)),
loss_cls=dict(
type='FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
loss_bbox=dict(type='SmoothL1Loss', beta=0.11, loss_weight=1.0)),
train_cfg=dict(
fam_cfg=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.4,
min_pos_iou=0,
ignore_iof_thr=-1,
iou_calculator=dict(type='RBboxOverlaps2D')),
allowed_border=-1,
pos_weight=-1,
debug=False),
odm_cfg=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.4,
min_pos_iou=0,
ignore_iof_thr=-1,
iou_calculator=dict(type='RBboxOverlaps2D')),
allowed_border=-1,
pos_weight=-1,
debug=False)),
test_cfg=dict(
nms_pre=2000,
min_bbox_size=0,
score_thr=0.05,
nms=dict(iou_thr=0.1),
max_per_img=2000))
work_dir = './work_dirs/s2a_bs8_fp16'
auto_resume = False
gpu_ids = range(0, 1)
```
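
For comparison, the bs=2 run in the table at the top only changes the data and optimizer settings. A minimal sketch as a derived config; the `_base_` file name and the `workers_per_gpu` value are illustrative assumptions:

```python
# Hypothetical derived config for the bs=2 / lr=0.025 row of the table above.
_base_ = './s2a_bs8_fp16.py'  # the bs=8 config above; path is illustrative

data = dict(samples_per_gpu=2, workers_per_gpu=2)  # workers_per_gpu assumed
optimizer = dict(type='SGD', lr=0.025, momentum=0.9, weight_decay=0.0001)
work_dir = './work_dirs/s2a_bs2_fp16'
```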
Actually, I also only get 70.7% using multi-GPU (4 GPUs, bs=2, lr=0.01).
Thanks for the reply. My own implementation of s2anet reaches around 74 mAP with bs=8. Maybe something in the code goes wrong; I will keep debugging it.
In RotationDetection, multi-GPU training often needs twice as much training to match single-GPU performance.
But the official s2anet does not seem to have this problem, and its author suggests simply adjusting the lr.

Thanks for your feedback, looking forward to your PR!
Hi @liuyanyi, we can share some experiments on ReDet, which may bring some inspiration.
| GPUs | samples_per_gpu | lr | offline mAP | online mAP |
|---|---|---|---|---|
| 1 | 2 | 0.005 | 0.8925 | 76.68 |
| 8 | 1 | 0.02 | 0.777 | - |
| 8 | 2 | 0.04 | 0.886 | 75.97 |
Thanks for your experimental data. I'll try ReDet when I can access a better GPU; it's too slow to train even with fp16 on a Tesla T4. I compared the two s2anet codebases, and the only difference is in AlignConv: csuhan/s2anet uses a single AlignConv shared across all strides, while mmrotate uses a separate AlignConv per stride. But the online mAP (~70%) and offline mAP (81.79%) are the same for both implementations. The 0.777 offline mAP is very strange. Maybe the learning rate and some optimizer parameters affect the mAP. I'll try Adam or AdamW with different batch sizes; an adaptive optimizer may reduce the difference.
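
To make that structural difference concrete, here is a minimal PyTorch sketch of shared vs. per-stride alignment convolutions. A plain `Conv2d` stands in for the actual deformable AlignConv, and both class names are illustrative, not the real csuhan/s2anet or mmrotate modules:

```python
import torch.nn as nn

class SharedAlignHead(nn.Module):
    """One conv reused on every FPN level (csuhan/s2anet style)."""

    def __init__(self, channels=256):
        super().__init__()
        self.align_conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, feats):
        # feats: list of per-stride feature maps, all with `channels` channels
        return [self.align_conv(f) for f in feats]

class PerStrideAlignHead(nn.Module):
    """A separate conv for each stride (mmrotate style)."""

    def __init__(self, channels=256, num_levels=5):
        super().__init__()
        self.align_convs = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1)
            for _ in range(num_levels))

    def forward(self, feats):
        return [conv(f) for conv, f in zip(self.align_convs, feats)]
```

The shared variant has one set of weights receiving gradients from every level, so its effective per-weight batch size is larger, which could plausibly interact with the batch-size sensitivity discussed here.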
Hello. It seems you used fp16 to train the bs=8 s2anet. Have you compared the results between fp32 and fp16 models?
Hi, I didn't test fp32 due to the training speed, and I think fp16 won't affect the mAP too much. I tested s2anet with AdamW and fp16 (the optimizer override is sketched after the table); there is still a ~1% gap.
| lr | bs | gpu | offline mAP | online mAP |
|---|---|---|---|---|
| 0.000025 | 2 | 1 | 85.62% | 75.13% |
| 0.0001 | 8 | 1 | 85.13% | 74.34% |
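For reference, the AdamW runs above only swap the optimizer block of the bs=8 config; a minimal sketch, where `betas` and `weight_decay` are assumptions and the lrs come from the table:

```python
# AdamW override for the bs=8 run above; mmcv builds any torch.optim
# optimizer by its type name. betas/weight_decay values are assumptions.
optimizer = dict(type='AdamW', lr=0.0001, betas=(0.9, 0.999), weight_decay=0.0001)
```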
For the same number of epochs, batch size 8 needs twice as many iterations as batch size 16.
Can this gap be bridged directly by a linear increase or decrease of the lr?
`batchsize = samples_per_gpu * gpus`
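
For concreteness, this is the linear scaling rule in question, anchored at the official setting from the table below (bs=16, lr=0.02); a sketch, not code from either repo:

```python
def scaled_lr(base_lr, base_bs, samples_per_gpu, gpus):
    """Scale the lr linearly with the effective batch size."""
    batchsize = samples_per_gpu * gpus  # effective batch size, as above
    return base_lr * batchsize / base_bs

# Anchored at the official s2anet setting (bs=16, lr=0.02):
print(round(scaled_lr(0.02, 16, 8, 1), 6))  # bs=8 x 1 GPU  -> 0.01 (config above)
print(round(scaled_lr(0.02, 16, 2, 4), 6))  # bs=2 x 4 GPUs -> 0.01 (the 70.7% run)
print(round(scaled_lr(0.02, 16, 2, 1), 6))  # bs=2 x 1 GPU  -> 0.0025, vs. the 0.025 actually used
```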
@liuyanyi I also noticed the performance gap in my re-implementation on mmdet_v2 (s2anet was first implemented with mmdet_v1). To align with detectron2, mmdet_v2 changed some lr & optimizer params. One possible solution is to reduce the learning rate and increase the training time. Here I give a reference with mmrotate (a config sketch follows the table).
| model | version | lr | bs | schedule | mAP |
|---|---|---|---|---|---|
| s2anet | official | 0.02 | 16 | 1x | - |
| s2anet | mmrotate | 0.01 | 16 | 2x | 76.44 |
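A minimal sketch of the mmrotate row as overrides on the config above. The 2x step epochs follow the standard mmdet convention (24 epochs, steps at 16 and 22), which is an assumption since the row only says "2x":

```python
# "Lower lr + longer schedule" recipe from the table above (mmrotate row).
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[16, 22])  # standard 2x steps (assumption)
runner = dict(type='EpochBasedRunner', max_epochs=24)  # 2x = 24 epochs
```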
nice job!