
mAP of s2anet under different batchsizes

Open liuyanyi opened this issue 3 years ago • 11 comments

I trained s2anet (fp16) with batch size 2 and batch size 8 and got a 3.7-point difference in mAP. It's a little weird.

| batch size | lr | mAP |
|---|---|---|
| 8 | 0.01 | 70.03 |
| 2 | 0.025 | 73.74 |

full config for bs8:

dataset_type = 'DOTADataset'
data_root = '/datasets/Dota_mmrotate/dota/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RResize', img_scale=(1024, 1024)),
    dict(
        type='RRandomFlip',
        flip_ratio=[0.25, 0.25, 0.25],
        direction=['horizontal', 'vertical', 'diagonal'],
        version='le135'),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1024, 1024),
        flip=False,
        transforms=[
            dict(type='RResize'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=8,
    train=dict(
        type='DOTADataset',
        ann_file='/datasets/Dota_mmrotate/dota/trainval/annfiles/',
        img_prefix='/datasets/Dota_mmrotate/dota/trainval/images/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(type='RResize', img_scale=(1024, 1024)),
            dict(
                type='RRandomFlip',
                flip_ratio=[0.25, 0.25, 0.25],
                direction=['horizontal', 'vertical', 'diagonal'],
                version='le135'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
        ],
        version='le135'),
    val=dict(
        type='DOTADataset',
        ann_file='/datasets/Dota_mmrotate/dota/trainval/annfiles/',
        img_prefix='/datasets/Dota_mmrotate/dota/trainval/images/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1024, 1024),
                flip=False,
                transforms=[
                    dict(type='RResize'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        version='le135'),
    test=dict(
        type='DOTADataset',
        ann_file='/datasets/Dota_mmrotate/dota/test/images/',
        img_prefix='/datasets/Dota_mmrotate/dota/test/images/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1024, 1024),
                flip=False,
                transforms=[
                    dict(type='RResize'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        version='le135'))
evaluation = dict(interval=12, metric='mAP', nproc=1)
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.3333333333333333,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=4)
log_config = dict(
    interval=50,
    hooks=[dict(type='TextLoggerHook'),
           dict(type='TensorboardLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
fp16 = dict(loss_scale=dict(init_scale=512))
angle_version = 'le135'
model = dict(
    type='S2ANet',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        zero_init_residual=False,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs='on_input',
        num_outs=5),
    fam_head=dict(
        type='RotatedRetinaHead',
        num_classes=15,
        in_channels=256,
        stacked_convs=2,
        feat_channels=256,
        assign_by_circumhbbox=None,
        anchor_generator=dict(
            type='RotatedAnchorGenerator',
            scales=[4],
            ratios=[1.0],
            strides=[8, 16, 32, 64, 128]),
        bbox_coder=dict(
            type='DeltaXYWHAOBBoxCoder',
            angle_range='le135',
            norm_factor=1,
            edge_swap=False,
            proj_xy=True,
            target_means=(0.0, 0.0, 0.0, 0.0, 0.0),
            target_stds=(1.0, 1.0, 1.0, 1.0, 1.0)),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=0.11, loss_weight=1.0)),
    align_cfgs=dict(
        type='AlignConv',
        kernel_size=3,
        channels=256,
        featmap_strides=[8, 16, 32, 64, 128]),
    odm_head=dict(
        type='ODMRefineHead',
        num_classes=15,
        in_channels=256,
        stacked_convs=2,
        feat_channels=256,
        assign_by_circumhbbox=None,
        anchor_generator=dict(
            type='PseudoAnchorGenerator', strides=[8, 16, 32, 64, 128]),
        bbox_coder=dict(
            type='DeltaXYWHAOBBoxCoder',
            angle_range='le135',
            norm_factor=1,
            edge_swap=False,
            proj_xy=True,
            target_means=(0.0, 0.0, 0.0, 0.0, 0.0),
            target_stds=(1.0, 1.0, 1.0, 1.0, 1.0)),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=0.11, loss_weight=1.0)),
    train_cfg=dict(
        fam_cfg=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.4,
                min_pos_iou=0,
                ignore_iof_thr=-1,
                iou_calculator=dict(type='RBboxOverlaps2D')),
            allowed_border=-1,
            pos_weight=-1,
            debug=False),
        odm_cfg=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.4,
                min_pos_iou=0,
                ignore_iof_thr=-1,
                iou_calculator=dict(type='RBboxOverlaps2D')),
            allowed_border=-1,
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        nms_pre=2000,
        min_bbox_size=0,
        score_thr=0.05,
        nms=dict(iou_thr=0.1),
        max_per_img=2000))
work_dir = './work_dirs/s2a_bs8_fp16'
auto_resume = False
gpu_ids = range(0, 1)

liuyanyi avatar Mar 03 '22 01:03 liuyanyi

Actually, I also only get 70.7% using multi-gpu (4gpus, bs=2, lr=0.01)

yangxue0827 avatar Mar 03 '22 02:03 yangxue0827

> Actually, I also only get 70.7% using multi-gpu (4gpus, bs=2, lr=0.01)

Thanks for the reply. My own s2anet implementation gets around 74 mAP with bs8, so maybe some of the code is wrong. I'll keep debugging it.

liuyanyi avatar Mar 03 '22 02:03 liuyanyi

In RotationDetection, multi-gpu training often requires twice as much training to match single-gpu performance. The official s2anet does not seem to have this problem, though, and its author suggests just modifying the lr. [image]

yangxue0827 avatar Mar 03 '22 02:03 yangxue0827

Thanks for your feedback, looking forward to your PR!

yangxue0827 avatar Mar 03 '22 02:03 yangxue0827

Hi @liuyanyi We can share some experiments on ReDet, which may bring some inspiration.

| GPUs | samples_per_gpu | lr | offline mAP | online mAP |
|---|---|---|---|---|
| 1 | 2 | 0.005 | 0.8925 | 76.68 |
| 8 | 1 | 0.02 | 0.777 | - |
| 8 | 2 | 0.04 | 0.886 | 75.97 |
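
The learning rates in this table are consistent with the linear scaling rule, where lr grows in proportion to the total batch size (GPUs × samples_per_gpu). A minimal sketch, assuming the single-GPU row (batch size 2, lr 0.005) is the reference point; `scale_lr` is a hypothetical helper written for illustration, not an mmrotate function:

```python
def scale_lr(base_lr, base_batch, new_batch):
    """Linear scaling rule: lr is proportional to the total batch size."""
    return base_lr * new_batch / base_batch


# Reference row: 1 GPU x 2 samples_per_gpu, lr = 0.005
BASE_LR, BASE_BATCH = 0.005, 2

# (GPUs, samples_per_gpu) for each row of the table
for gpus, samples_per_gpu in [(1, 2), (8, 1), (8, 2)]:
    batch = gpus * samples_per_gpu
    lr = scale_lr(BASE_LR, BASE_BATCH, batch)
    print(f"gpus={gpus} samples_per_gpu={samples_per_gpu} "
          f"total_batch={batch} scaled_lr={lr:.4f}")
```

The scaled values match the lr column (0.005, 0.02, 0.04), so the drop to 0.777 in the second row does not look like a simple lr mis-scaling.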

zytx121 avatar Mar 06 '22 07:03 zytx121

> Hi @liuyanyi We can share some experiments on ReDet, which may bring some inspiration.
>
> | GPUs | samples_per_gpu | lr | offline mAP | online mAP |
> |---|---|---|---|---|
> | 1 | 2 | 0.005 | 0.8925 | 76.68 |
> | 8 | 1 | 0.02 | 0.777 | - |
> | 8 | 2 | 0.04 | 0.886 | 75.97 |

Thanks for the experimental data. I'll try ReDet when I have access to a better GPU; it's too slow to train on a Tesla T4, even with fp16. I compared the s2anet code, and the only difference is in AlignConv: csuhan/s2anet uses a single AlignConv for all strides, while mmrotate uses a separate AlignConv per stride. But the ~70% online mAP and 81.79% offline mAP are the same for the two implementations. The 0.77 offline mAP is very strange. Maybe the learning rate and some optimizer parameters affect the mAP. I'll try Adam or AdamW at different batch sizes; an adaptive optimizer may reduce the difference.

liuyanyi avatar Mar 06 '22 09:03 liuyanyi

Hello. It seems you used fp16 to train the bs8 s2anet. Have you compared the results between fp32 and fp16 models?

jbwang1997 avatar Mar 07 '22 06:03 jbwang1997

> Hello. It seems you used fp16 to train the bs8 s2anet. Have you compared the results between fp32 and fp16 models?

Hi, I didn't test fp32 because of the training speed, and I don't think fp16 affects the mAP much. I tested s2anet with AdamW and fp16, and there is still a gap of about 1%.

| lr | bs | gpu | offline mAP | online mAP |
|---|---|---|---|---|
| 0.000025 | 2 | 1 | 85.62% | 75.13% |
| 0.0001 | 8 | 1 | 85.13% | 74.34% |
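
For anyone reproducing this, switching the posted config from SGD to AdamW should come down to replacing the `optimizer` dict. A sketch for the bs8 run, assuming the lr from the table above; the `weight_decay` value is a placeholder assumption, since the thread does not state it, and the gradient clipping line is simply carried over from the posted config:

```python
# Sketch only: replaces the SGD optimizer in the bs8 config posted above.
optimizer = dict(
    type='AdamW',
    lr=0.0001,          # bs8 row of the table; the bs2 run used 0.000025
    weight_decay=0.05,  # ASSUMPTION: not given anywhere in the thread
)

# Unchanged from the posted config.
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))

# The two lr values in the table differ by the batch-size ratio 8 / 2.
bs2_lr = optimizer['lr'] * 2 / 8
print(f"bs2 lr under linear scaling: {bs2_lr:.6f}")
```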

liuyanyi avatar Mar 07 '22 15:03 liuyanyi

For the same number of epochs, batch size 8 needs twice as many iterations as batch size 16. Can this gap be bridged directly by linearly increasing or decreasing the lr? (batchsize = samples_per_gpu * gpus)
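
The relation in this comment can be made concrete. A small sketch, assuming iterations per epoch are roughly the dataset size divided by the total batch size (sampler padding is ignored); `num_images` is a made-up figure for illustration, and `warmup_iters=500` is taken from the config posted at the top of the thread:

```python
import math


def total_batch_size(gpus, samples_per_gpu):
    """batchsize = samples_per_gpu * gpus"""
    return samples_per_gpu * gpus


def iters_per_epoch(num_images, batch_size):
    """Approximate optimizer steps per epoch (last partial batch is kept)."""
    return math.ceil(num_images / batch_size)


num_images = 16000   # HYPOTHETICAL dataset size, for illustration only
warmup_iters = 500   # from the posted lr_config

for batch_size in (8, 16):
    steps = iters_per_epoch(num_images, batch_size)
    warmup_images = warmup_iters * batch_size
    print(f"bs={batch_size}: {steps} iters/epoch, "
          f"warmup covers {warmup_images} images")
```

Since the warmup is counted in iterations, a fixed `warmup_iters` spans a different number of images at each batch size, which may be one more thing to account for besides scaling the lr.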

heiyuxiaokai avatar Mar 08 '22 02:03 heiyuxiaokai

@liuyanyi I also noticed the performance gap in my mmdet_v2 re-implementation (s2anet was first implemented with mmdet_v1). To align with detectron2, mmdet_v2 changed some lr and optimizer params. One possible solution is to reduce the learning rate and increase the training time. Here is a reference result with mmrotate.

| model | version | lr | bs | schedule | mAP |
|---|---|---|---|---|---|
| s2anet | official | 0.02 | 16 | 1x | - |
| s2anet | mmrotate | 0.01 | 16 | 2x | 76.44 |
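
In config terms, the mmrotate row would look roughly like the sketch below. It assumes the 2x schedule simply doubles the 1x settings from the config posted at the top of the thread (12 epochs, lr steps at 8 and 11); the exact step epochs used for the 76.44 result are not stated here, and how the total batch size of 16 was split across GPUs is not given either:

```python
# 1x settings from the config posted in this issue.
BASE_EPOCHS = 12
BASE_STEPS = [8, 11]
SCHEDULE_FACTOR = 2  # "2x" schedule

# Halved lr relative to the official 0.02, as in the table row.
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)

lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    # ASSUMPTION: decay epochs scale with the schedule length.
    step=[s * SCHEDULE_FACTOR for s in BASE_STEPS],
)
runner = dict(type='EpochBasedRunner', max_epochs=BASE_EPOCHS * SCHEDULE_FACTOR)

print(lr_config['step'], runner['max_epochs'])
```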

csuhan avatar Mar 13 '22 04:03 csuhan

nice job!

yangxue0827 avatar Mar 13 '22 04:03 yangxue0827