mAP of s2anet under different batch sizes
I trained s2anet (fp16) with batch size 2 and batch size 8, and got a 3.7% difference in mAP. It's a little weird.
| batchsize | lr | mAP |
|---|---|---|
| 8 | 0.01 | 70.03 |
| 2 | 0.025 | 73.74 |
Full config for bs=8:

```python
dataset_type = 'DOTADataset'
data_root = '/datasets/Dota_mmrotate/dota/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True),
dict(type='RResize', img_scale=(1024, 1024)),
dict(
type='RRandomFlip',
flip_ratio=[0.25, 0.25, 0.25],
direction=['horizontal', 'vertical', 'diagonal'],
version='le135'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1024, 1024),
flip=False,
transforms=[
dict(type='RResize'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img'])
])
]
data = dict(
samples_per_gpu=8,
workers_per_gpu=8,
train=dict(
type='DOTADataset',
ann_file='/datasets/Dota_mmrotate/dota/trainval/annfiles/',
img_prefix='/datasets/Dota_mmrotate/dota/trainval/images/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True),
dict(type='RResize', img_scale=(1024, 1024)),
dict(
type='RRandomFlip',
flip_ratio=[0.25, 0.25, 0.25],
direction=['horizontal', 'vertical', 'diagonal'],
version='le135'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
],
version='le135'),
val=dict(
type='DOTADataset',
ann_file='/datasets/Dota_mmrotate/dota/trainval/annfiles/',
img_prefix='/datasets/Dota_mmrotate/dota/trainval/images/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1024, 1024),
flip=False,
transforms=[
dict(type='RResize'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img'])
])
],
version='le135'),
test=dict(
type='DOTADataset',
ann_file='/datasets/Dota_mmrotate/dota/test/images/',
img_prefix='/datasets/Dota_mmrotate/dota/test/images/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1024, 1024),
flip=False,
transforms=[
dict(type='RResize'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img'])
])
],
version='le135'))
evaluation = dict(interval=12, metric='mAP', nproc=1)
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.3333333333333333,
step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=4)
log_config = dict(
interval=50,
hooks=[dict(type='TextLoggerHook'),
dict(type='TensorboardLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
fp16 = dict(loss_scale=dict(init_scale=512))
angle_version = 'le135'
model = dict(
type='S2ANet',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
zero_init_residual=False,
norm_cfg=dict(type='BN', requires_grad=True),
norm_eval=True,
style='pytorch',
init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
start_level=1,
add_extra_convs='on_input',
num_outs=5),
fam_head=dict(
type='RotatedRetinaHead',
num_classes=15,
in_channels=256,
stacked_convs=2,
feat_channels=256,
assign_by_circumhbbox=None,
anchor_generator=dict(
type='RotatedAnchorGenerator',
scales=[4],
ratios=[1.0],
strides=[8, 16, 32, 64, 128]),
bbox_coder=dict(
type='DeltaXYWHAOBBoxCoder',
angle_range='le135',
norm_factor=1,
edge_swap=False,
proj_xy=True,
target_means=(0.0, 0.0, 0.0, 0.0, 0.0),
target_stds=(1.0, 1.0, 1.0, 1.0, 1.0)),
loss_cls=dict(
type='FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
loss_bbox=dict(type='SmoothL1Loss', beta=0.11, loss_weight=1.0)),
align_cfgs=dict(
type='AlignConv',
kernel_size=3,
channels=256,
featmap_strides=[8, 16, 32, 64, 128]),
odm_head=dict(
type='ODMRefineHead',
num_classes=15,
in_channels=256,
stacked_convs=2,
feat_channels=256,
assign_by_circumhbbox=None,
anchor_generator=dict(
type='PseudoAnchorGenerator', strides=[8, 16, 32, 64, 128]),
bbox_coder=dict(
type='DeltaXYWHAOBBoxCoder',
angle_range='le135',
norm_factor=1,
edge_swap=False,
proj_xy=True,
target_means=(0.0, 0.0, 0.0, 0.0, 0.0),
target_stds=(1.0, 1.0, 1.0, 1.0, 1.0)),
loss_cls=dict(
type='FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
loss_bbox=dict(type='SmoothL1Loss', beta=0.11, loss_weight=1.0)),
train_cfg=dict(
fam_cfg=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.4,
min_pos_iou=0,
ignore_iof_thr=-1,
iou_calculator=dict(type='RBboxOverlaps2D')),
allowed_border=-1,
pos_weight=-1,
debug=False),
odm_cfg=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.4,
min_pos_iou=0,
ignore_iof_thr=-1,
iou_calculator=dict(type='RBboxOverlaps2D')),
allowed_border=-1,
pos_weight=-1,
debug=False)),
test_cfg=dict(
nms_pre=2000,
min_bbox_size=0,
score_thr=0.05,
nms=dict(iou_thr=0.1),
max_per_img=2000))
work_dir = './work_dirs/s2a_bs8_fp16'
auto_resume = False
gpu_ids = range(0, 1)
```
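
For comparison, the bs=2 run in the table at the top only changes the data and optimizer settings. A minimal sketch as a derived config; the `_base_` file name and the `workers_per_gpu` value are illustrative assumptions:

```python
# Hypothetical derived config for the bs=2 / lr=0.025 row of the table above.
_base_ = './s2a_bs8_fp16.py'  # the bs=8 config above; path is illustrative

data = dict(samples_per_gpu=2, workers_per_gpu=2)  # workers_per_gpu assumed
optimizer = dict(type='SGD', lr=0.025, momentum=0.9, weight_decay=0.0001)
work_dir = './work_dirs/s2a_bs2_fp16'
```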
Actually, I also only get 70.7% using multi-GPU (4 GPUs, bs=2, lr=0.01).
Thanks for the reply. My own implementation of s2anet reaches around 74 mAP with bs=8. Maybe something in the code goes wrong; I will keep debugging it.
In RotationDetection, multi-GPU training often needs twice as much training to match single-GPU performance.
But the official s2anet does not seem to have this problem, and its author suggests simply adjusting the lr.

Thanks for your feedback, looking forward to your PR!
Hi @liuyanyi, we can share some experiments on ReDet, which may bring some inspiration.
| GPUs | samples_per_gpu | lr | offline mAP | online mAP |
|---|---|---|---|---|
| 1 | 2 | 0.005 | 0.8925 | 76.68 |
| 8 | 1 | 0.02 | 0.777 | - |
| 8 | 2 | 0.04 | 0.886 | 75.97 |
Thanks for your experimental data. I'll try ReDet when I can access a better GPU; it's too slow to train even with fp16 on a Tesla T4. I compared the two s2anet codebases, and the only difference is in AlignConv: csuhan/s2anet uses a single AlignConv shared across all strides, while mmrotate uses a separate AlignConv per stride. But the online mAP (~70%) and offline mAP (81.79%) are the same for both implementations. The 0.777 offline mAP is very strange. Maybe the learning rate and some optimizer parameters affect the mAP. I'll try Adam or AdamW with different batch sizes; an adaptive optimizer may reduce the difference.
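
To make that structural difference concrete, here is a minimal PyTorch sketch of shared vs. per-stride alignment convolutions. A plain `Conv2d` stands in for the actual deformable AlignConv, and both class names are illustrative, not the real csuhan/s2anet or mmrotate modules:

```python
import torch.nn as nn

class SharedAlignHead(nn.Module):
    """One conv reused on every FPN level (csuhan/s2anet style)."""

    def __init__(self, channels=256):
        super().__init__()
        self.align_conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, feats):
        # feats: list of per-stride feature maps, all with `channels` channels
        return [self.align_conv(f) for f in feats]

class PerStrideAlignHead(nn.Module):
    """A separate conv for each stride (mmrotate style)."""

    def __init__(self, channels=256, num_levels=5):
        super().__init__()
        self.align_convs = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1)
            for _ in range(num_levels))

    def forward(self, feats):
        return [conv(f) for conv, f in zip(self.align_convs, feats)]
```

The shared variant has one set of weights receiving gradients from every level, so its effective per-weight batch size is larger, which could plausibly interact with the batch-size sensitivity discussed here.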
Hello. It seems you used fp16 to train the bs=8 s2anet. Have you compared the results between fp32 and fp16 models?
Hi, I didn't test fp32 due to the training speed, and I think fp16 won't affect the mAP too much. I tested s2anet with AdamW and fp16 (the optimizer override is sketched after the table); there is still a ~1% gap.
| lr | bs | gpu | offline mAP | online mAP |
|---|---|---|---|---|
| 0.000025 | 2 | 1 | 85.62% | 75.13% |
| 0.0001 | 8 | 1 | 85.13% | 74.34% |
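For reference, the AdamW runs above only swap the optimizer block of the bs=8 config; a minimal sketch, where `betas` and `weight_decay` are assumptions and the lrs come from the table:

```python
# AdamW override for the bs=8 run above; mmcv builds any torch.optim
# optimizer by its type name. betas/weight_decay values are assumptions.
optimizer = dict(type='AdamW', lr=0.0001, betas=(0.9, 0.999), weight_decay=0.0001)
```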
For the same number of epochs, batch size 8 needs twice as many iterations as batch size 16.
Can this gap be bridged directly by a linear increase or decrease of the lr?
`batchsize = samples_per_gpu * gpus`
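
For concreteness, this is the linear scaling rule in question, anchored at the official setting from the table below (bs=16, lr=0.02); a sketch, not code from either repo:

```python
def scaled_lr(base_lr, base_bs, samples_per_gpu, gpus):
    """Scale the lr linearly with the effective batch size."""
    batchsize = samples_per_gpu * gpus  # effective batch size, as above
    return base_lr * batchsize / base_bs

# Anchored at the official s2anet setting (bs=16, lr=0.02):
print(round(scaled_lr(0.02, 16, 8, 1), 6))  # bs=8 x 1 GPU  -> 0.01 (config above)
print(round(scaled_lr(0.02, 16, 2, 4), 6))  # bs=2 x 4 GPUs -> 0.01 (the 70.7% run)
print(round(scaled_lr(0.02, 16, 2, 1), 6))  # bs=2 x 1 GPU  -> 0.0025, vs. the 0.025 actually used
```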
@liuyanyi I also noticed the performance gap in my re-implementation on mmdet_v2 (s2anet was first implemented with mmdet_v1). To align with detectron2, mmdet_v2 changed some lr & optimizer params. One possible solution is to reduce the learning rate and increase the training time. Here I give a reference with mmrotate (a config sketch follows the table).
| model | version | lr | bs | schedule | mAP |
|---|---|---|---|---|---|
| s2anet | official | 0.02 | 16 | 1x | - |
| s2anet | mmrotate | 0.01 | 16 | 2x | 76.44 |
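A minimal sketch of the mmrotate row as overrides on the config above. The 2x step epochs follow the standard mmdet convention (24 epochs, steps at 16 and 22), which is an assumption since the row only says "2x":

```python
# "Lower lr + longer schedule" recipe from the table above (mmrotate row).
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[16, 22])  # standard 2x steps (assumption)
runner = dict(type='EpochBasedRunner', max_epochs=24)  # 2x = 24 epochs
```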
nice job!