
Performance

Open: MihaiDavid05 opened this issue 2 years ago • 17 comments

Hello,

I used the provided config (PAIS.py) with a KNet backbone, and the performance is much lower than reported in the paper for 10% labeled data: only 23 mAP (student model, segmentation mAP). Can someone confirm this? Thank you @ccdatas, @hujiecpp :)

MihaiDavid05, Oct 23 '23 07:10

Hello, @MihaiDavid05

I have also been trying to reimplement this work recently, but I am still running into many problems. I could help you confirm the performance if you are willing to share some of your experience or solutions.

  1. I installed mmdetection=2.23.0 and mmcv=1.3.17, but there are still some environment problems. I get "ImportError: cannot import name 'Config' from 'mmcv'" in tools/train.py. I searched for that error online, and the suggested fix is to change from mmcv import Config, DictAction to from mmengine import Config, DictAction. This modification seems odd to me. Did you also have to make it after downloading the code?
  2. I tried to follow the instructions in Readme.md, but I get "Failed to import ssod.knet_withiou_weight_lm.det.knet". I looked for knet_withiou_weight_lm under ssod, but it doesn't exist. Did you encounter the same issue, or did you train this work some other way?
  3. The authors mentioned that they used 4 GPUs to train this work. Did you also use 4 GPUs, or did you modify some code?
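On point 1: MMCV 2.0 moved Config and DictAction into MMEngine, so that ImportError usually means an MMCV 2.x package is the one actually being imported, even if mmcv-full 1.3.17 was intended. A minimal sketch for checking this, assuming the 1.x full build is installed under the distribution name mmcv-full and 2.x under mmcv (installed_mmcv_version and config_provider are hypothetical helpers):

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_mmcv_version() -> Optional[str]:
    """Return the installed MMCV version, or None if neither distribution is found."""
    # Assumption: 1.x full builds are published as "mmcv-full", 2.x as "mmcv".
    for dist in ("mmcv-full", "mmcv"):
        try:
            return version(dist)
        except PackageNotFoundError:
            continue
    return None


def config_provider(mmcv_version: str) -> str:
    """Name the package that exports Config/DictAction for a given MMCV version."""
    major = int(mmcv_version.split(".")[0])
    # MMCV 2.0 moved Config and DictAction into MMEngine.
    return "mmcv" if major < 2 else "mmengine"


if __name__ == "__main__":
    v = installed_mmcv_version()
    if v is None:
        print("MMCV is not installed")
    else:
        print(f"mmcv {v}: import Config from {config_provider(v)}")
```

If this reports a 2.x version, reinstalling mmcv-full==1.3.17 is likely a cleaner fix than patching the imports, since mmdetection 2.23.0 targets MMCV 1.x.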

Thanks for your attention to this work. I'm new to the mmlab framework, so solving these problems has taken me a lot of time. I hope you can give me some advice. Thank you!

HRliao1109, Oct 24 '23 14:10

Hello @HRliao1109 :)

After a couple of weeks I made it work. Here is a small markdown-formatted list of instructions for installation, data preparation, training, and evaluation that worked for me (sorry that it is not packaged as a nice Makefile or installation script; I wrote the steps down in case I need them later). I did not use their Makefile because I had trouble with mmdetection and mmcv; instead, I followed the installation documentation for the specific versions of mmcv and mmdet. I did not modify anything in either mmcv or mmdet, because with the installation given below I finally had no errors.

Indeed, the "ssod.knet_withiou_weight_lm" directory does not exist in the project. Again, I did not use their training scripts as-is (I changed the config files used for training), because, as you said, the provided setup was broken.

For the MaskRCNN version, I tried another config, which you can find below the installation steps (it is the full resolved config, merged from multiple smaller configs as usual in the mmlab framework). I came up with it after reading the paper and the implementation details and checking their other MaskRCNN-based configs. However, it still does not reproduce the results from the paper, and it would be nice to know why :)

Lastly, I trained the network on only 1 GPU; for that I changed only the bash command, nothing in the configs or the code.

Hope this helps. I'm looking forward to your training results, or to any insights from @ccdatas or @hujiecpp regarding the performance :)

Instructions:

1. Requirements and installation

  • Anaconda3 with python=3.8
  • Pytorch=1.10.0 with CUDA=11.3 :
    • conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
  • Other libraries:
    • pip install opencv-python==4.7.0.72 prettytable scikit-image wandb==0.14.0 yapf==0.40.1
  • mmcv=1.3.17:
    • pip install openmim and then mim install mmcv-full==1.3.17
  • mmdetection=2.23.0:
    • create a thirdparty dir under the PROJECT_ROOT dir
    • then cd thirdparty and download the code from here, so that it ends up in the thirdparty/mmdetection directory
    • cd thirdparty/mmdetection and run pip install -r requirements/build.txt and python -m pip install -e .
  • export PYTHONPATH to the PROJECT_ROOT directory (and the tools and ssod dirs). You can use:
    • export PYTHONPATH=<path_to_PROJECT_ROOT>:$PYTHONPATH (for running scripts from the command line and in VSCode debug)
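A small sanity check for the pinned versions above, as a sketch: it reads installed distribution versions with importlib.metadata and reports mismatches. The distribution names (torch, torchvision, mmcv-full, mmdet) are assumptions about how the packages were installed, and installed_versions and mismatches are hypothetical helpers:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, List, Optional

# Versions taken from the installation steps above.
EXPECTED: Dict[str, str] = {
    "torch": "1.10.0",
    "torchvision": "0.11.0",
    "mmcv-full": "1.3.17",
    "mmdet": "2.23.0",
}


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version (None if absent)."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def mismatches(expected: Dict[str, str],
               installed: Dict[str, Optional[str]]) -> List[str]:
    """Describe every package that is missing or pinned to a different version."""
    problems: List[str] = []
    for name, want in expected.items():
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: not installed (want {want})")
        # Ignore local build tags such as "+cu113" when comparing.
        elif have.split("+")[0] != want:
            problems.append(f"{name}: {have} (want {want})")
    return problems


if __name__ == "__main__":
    found = installed_versions(EXPECTED)
    for line in mismatches(EXPECTED, found) or ["all pinned versions match"]:
        print(line)
```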

2. Data Preparation

  • Download the COCO dataset from here. You should download both the labeled and the unlabeled data.

YOUR_DATA_PATH should be an external directory containing the COCO dataset, laid out like this:

# YOUR_DATA_PATH/
#  coco/
#     train2017/
#     val2017/
#     unlabeled2017/
#     annotations/

Under PROJECT_ROOT/data dir, create symlinks to your dataset directories:

# PROJECT_ROOT/
#   data/
#     coco/           ---> symlink to YOUR_DATA_PATH/coco
#       train2017/
#       val2017/
#       unlabeled2017/
#       annotations/

For creating symlinks you can use:

cd ssod/data
ln -s <YOUR_DATA_PATH>/coco coco
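Before generating the splits, a quick way to confirm that the coco link resolves and exposes the four expected subdirectories. missing_coco_dirs is a hypothetical helper; pass it the path of the linked coco directory (the default below is an assumption matching data_root in the config):

```python
import sys
from pathlib import Path
from typing import List

EXPECTED_SUBDIRS = ("train2017", "val2017", "unlabeled2017", "annotations")


def missing_coco_dirs(coco_root: str) -> List[str]:
    """Return the expected COCO subdirectories that are not reachable under coco_root."""
    root = Path(coco_root)
    # is_dir() follows symlinks, so a dangling link reports every subdir as missing.
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_coco_dirs(sys.argv[1] if len(sys.argv) > 1 else "data/coco")
    print("OK" if not missing else f"missing: {', '.join(missing)}")
```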
  • Execute the following command to generate dataset splits:
bash tools/dataset/prepare_coco_data.sh -r ssod/data conduct
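The internals of the repository's split script are not shown in this thread. As an illustration only, a fold/percent split is commonly built by seeding a random generator with the fold and sampling percent% of the train2017 images as labeled, keeping the rest as unlabeled. A minimal sketch of that idea on a COCO-style dict; split_coco is hypothetical and may not match what prepare_coco_data.sh actually does:

```python
import random
from typing import Callable, Dict, Tuple


def split_coco(coco: Dict, percent: float, fold: int) -> Tuple[Dict, Dict]:
    """Split a COCO-style annotation dict into labeled and unlabeled parts."""
    rng = random.Random(fold)  # the fold acts as the random seed
    image_ids = sorted(img["id"] for img in coco["images"])
    n_labeled = int(len(image_ids) * percent / 100)
    labeled_ids = set(rng.sample(image_ids, n_labeled))

    def subset(keep: Callable[[int], bool]) -> Dict:
        # Keep every other top-level key (e.g. "categories") unchanged.
        return {
            **coco,
            "images": [im for im in coco["images"] if keep(im["id"])],
            "annotations": [a for a in coco["annotations"] if keep(a["image_id"])],
        }

    labeled = subset(lambda i: i in labeled_ids)
    unlabeled = subset(lambda i: i not in labeled_ids)
    return labeled, unlabeled
```

Because the fold seeds the generator, the same fold and percent always select the same labeled images, which is what makes runs with different folds comparable.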

3. Training

CUDA_VISIBLE_DEVICES=<GPU_IDS> nohup python -m torch.distributed.launch --nproc_per_node=<NR_GPUS> --master_port=<PORT> <PATH_TO>/train.py <CONFIG_PATH> --launcher pytorch --cfg-options fold=<FOLD_OR_SEED> percent=<LABELED_DATA_PERCENT> >> <OUTPUT_LOG_PATH> &

Example:

CUDA_VISIBLE_DEVICES=0 nohup python -m torch.distributed.launch --nproc_per_node=1 --master_port=<PORT> <FULL_PATH_TO>/train.py <CONFIG_FULL_PATH> --launcher pytorch --cfg-options fold=1 percent=10 >> <OUTPUT_LOG_FULL_PATH> &
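Since --nproc_per_node should equal the number of GPUs listed in CUDA_VISIBLE_DEVICES, here is a small sketch that derives one from the other. build_launch and the example paths (tools/train.py, configs/PAIS.py) are hypothetical; it only assembles the command and does not start training:

```python
import os
import shlex
from typing import Dict, List, Sequence, Tuple


def build_launch(gpu_ids: Sequence[int], port: int, train_py: str, config: str,
                 fold: int, percent: int) -> Tuple[Dict[str, str], List[str]]:
    """Return (env, argv) for a torch.distributed.launch training run."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=",".join(map(str, gpu_ids)))
    argv = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={len(gpu_ids)}",  # one process per visible GPU
        f"--master_port={port}",
        train_py, config,
        "--launcher", "pytorch",
        "--cfg-options", f"fold={fold}", f"percent={percent}",
    ]
    return env, argv


env, argv = build_launch([0], 29500, "tools/train.py", "configs/PAIS.py", 1, 10)
print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']}", shlex.join(argv))
# CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1
#   --master_port=29500 tools/train.py configs/PAIS.py --launcher pytorch
#   --cfg-options fold=1 percent=10
```

To actually launch, the two values can be passed to subprocess.run(argv, env=env).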

4. Evaluation

python -m torch.distributed.launch --nproc_per_node=<NR_GPUS> --master_port=<PORT> <FULL_PATH_TO>/test.py <FULL_CONFIG_FILE_FULL_PATH> <CHECKPOINT_FULL_PATH> --eval segm --show-dir <VIZ_OUT_DIR_FULL_PATH> --work-dir <OUT_DIR_FULL_PATH> --cfg-options model.test_cfg.rcnn.score_thr=<THR>
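--cfg-options takes dotted key=value pairs (such as model.test_cfg.rcnn.score_thr=<THR> above) and merges them into the loaded config. A simplified sketch of that merge on plain nested dicts; apply_cfg_options and parse_value are hypothetical helpers, and MMCV's own DictAction and Config.merge_from_dict handle more value types than this:

```python
import ast
from typing import Any, Dict


def parse_value(text: str) -> Any:
    """Interpret Python literals ('0.3', '10', 'True'); otherwise keep the string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_cfg_options(cfg: Dict, options: Dict[str, str]) -> Dict:
    """Merge dotted overrides into cfg in place, creating missing levels."""
    for dotted_key, raw in options.items():
        *parents, leaf = dotted_key.split(".")
        node = cfg
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = parse_value(raw)
    return cfg


# Same nesting as the MaskRCNN config: detector settings live under model.model.
cfg = {"model": {"model": {"test_cfg": {"rcnn": {"score_thr": 0.05}}},
                 "test_cfg": {"inference_on": "student"}}}
apply_cfg_options(cfg, {"model.model.test_cfg.rcnn.score_thr": "0.3"})
print(cfg["model"]["model"]["test_cfg"]["rcnn"]["score_thr"])  # 0.3
```

In the MaskRCNN config in this thread, the detector's test_cfg sits at model.model.test_cfg, while model.test_cfg is the wrapper's dict(inference_on='student'). Under a merge like this one, model.test_cfg.rcnn.score_thr=<THR> would add a new key to the wrapper rather than change the detector's threshold. It may therefore be worth checking which path the evaluation command needs for this particular config.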

MaskRCNN based config:

model = dict(
    type='PiexlTeacher',
    model=dict(
        type='MaskRCNN',
        backbone=dict(
            type='ResNet',
            depth=50,
            num_stages=4,
            out_indices=(0, 1, 2, 3),
            frozen_stages=1,
            norm_cfg=dict(type='BN', requires_grad=True),
            norm_eval=True,
            style='pytorch',
            init_cfg=dict(
                type='Pretrained', checkpoint='torchvision://resnet50')),
        neck=dict(
            type='FPN',
            in_channels=[256, 512, 1024, 2048],
            out_channels=256,
            num_outs=5),
        rpn_head=dict(
            type='RPNHead',
            in_channels=256,
            feat_channels=256,
            anchor_generator=dict(
                type='AnchorGenerator',
                scales=[8],
                ratios=[0.5, 1.0, 2.0],
                strides=[4, 8, 16, 32, 64]),
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0.0, 0.0, 0.0, 0.0],
                target_stds=[1.0, 1.0, 1.0, 1.0]),
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
        roi_head=dict(
            type='StandardRoIHead_iou',
            bbox_roi_extractor=dict(
                type='SingleRoIExtractor',
                roi_layer=dict(
                    type='RoIAlign', output_size=7, sampling_ratio=0),
                out_channels=256,
                featmap_strides=[4, 8, 16, 32]),
            bbox_head=dict(
                type='Shared2FCBBoxHead_iou',
                in_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=80,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0.0, 0.0, 0.0, 0.0],
                    target_stds=[0.1, 0.1, 0.2, 0.2]),
                reg_class_agnostic=False,
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
            mask_roi_extractor=dict(
                type='SingleRoIExtractor',
                roi_layer=dict(
                    type='RoIAlign', output_size=14, sampling_ratio=0),
                out_channels=256,
                featmap_strides=[4, 8, 16, 32]),
            mask_head=dict(
                type='FCNMaskHead_iou',
                num_convs=4,
                in_channels=256,
                conv_out_channels=256,
                num_classes=80,
                loss_mask=dict(
                    type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),
        train_cfg=dict(
            rpn=dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.7,
                    neg_iou_thr=0.3,
                    min_pos_iou=0.3,
                    match_low_quality=True,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=256,
                    pos_fraction=0.5,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=False),
                allowed_border=-1,
                pos_weight=-1,
                debug=False),
            rpn_proposal=dict(
                nms_pre=2000,
                max_per_img=1000,
                nms=dict(type='nms', iou_threshold=0.7),
                min_bbox_size=0),
            rcnn=dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.5,
                    neg_iou_thr=0.5,
                    min_pos_iou=0.5,
                    match_low_quality=True,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                mask_size=28,
                pos_weight=-1,
                debug=False)),
        test_cfg=dict(
            rpn=dict(
                nms_pre=1000,
                max_per_img=1000,
                nms=dict(type='nms', iou_threshold=0.7),
                min_bbox_size=0),
            rcnn=dict(
                score_thr=0.05,
                nms=dict(type='nms', iou_threshold=0.5),
                max_per_img=100,
                mask_thr_binary=0.5))),
    train_cfg=dict(
        use_teacher_proposal=False,
        pseudo_label_initial_score_thr=0.5,
        rpn_pseudo_threshold=0.5,
        cls_pseudo_threshold=0.5,
        reg_pseudo_threshold=0.02,
        jitter_times=10,
        jitter_scale=0.06,
        min_pseduo_box_size=0,
        unsup_weight=1.5),
    test_cfg=dict(inference_on='student'))
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(
        type='Sequential',
        transforms=[
            dict(
                type='RandResize',
                img_scale=[(1333, 400), (1333, 1200)],
                multiscale_mode='range',
                keep_ratio=True),
            dict(type='RandFlip', flip_ratio=0.5),
            dict(
                type='OneOf',
                transforms=[
                    dict(type='Identity'),
                    dict(type='AutoContrast'),
                    dict(type='RandEqualize'),
                    dict(type='RandSolarize'),
                    dict(type='RandColor'),
                    dict(type='RandContrast'),
                    dict(type='RandBrightness'),
                    dict(type='RandSharpness'),
                    dict(type='RandPosterize')
                ])
        ],
        record=True),
    dict(type='Pad', size_divisor=32),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='ExtraAttrs', tag='sup'),
    dict(type='DefaultFormatBundle'),
    dict(
        type='Collect',
        keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'],
        meta_keys=('filename', 'ori_shape', 'img_shape', 'img_norm_cfg',
                   'pad_shape', 'scale_factor', 'tag'))
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=4,
    workers_per_gpu=4,
    train=dict(
        type='SemiDataset',
        sup=dict(
            type='CocoDataset',
            ann_file=
            'data/coco/annotations/semi_supervised/[email protected]',
            img_prefix='data/coco/train2017/',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
                dict(
                    type='Sequential',
                    transforms=[
                        dict(
                            type='RandResize',
                            img_scale=[(1333, 400), (1333, 1200)],
                            multiscale_mode='range',
                            keep_ratio=True),
                        dict(type='RandFlip', flip_ratio=0.5),
                        dict(
                            type='OneOf',
                            transforms=[
                                dict(type='Identity'),
                                dict(type='AutoContrast'),
                                dict(type='RandEqualize'),
                                dict(type='RandSolarize'),
                                dict(type='RandColor'),
                                dict(type='RandContrast'),
                                dict(type='RandBrightness'),
                                dict(type='RandSharpness'),
                                dict(type='RandPosterize')
                            ])
                    ],
                    record=True),
                dict(type='Pad', size_divisor=32),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='ExtraAttrs', tag='sup'),
                dict(type='DefaultFormatBundle'),
                dict(
                    type='Collect',
                    keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'],
                    meta_keys=('filename', 'ori_shape', 'img_shape',
                               'img_norm_cfg', 'pad_shape', 'scale_factor',
                               'tag'))
            ]),
        unsup=dict(
            type='CocoDataset',
            ann_file=
            'data/coco/annotations/semi_supervised/[email protected]',
            img_prefix='data/coco/train2017/',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(type='PseudoSamples', with_bbox=True, with_mask=True),
                dict(
                    type='MultiBranch',
                    unsup_student=[
                        dict(
                            type='Sequential',
                            transforms=[
                                dict(
                                    type='RandResize',
                                    img_scale=[(1333, 400), (1333, 1200)],
                                    multiscale_mode='range',
                                    keep_ratio=True),
                                dict(type='RandFlip', flip_ratio=0.5),
                                dict(
                                    type='ShuffledSequential',
                                    transforms=[
                                        dict(
                                            type='OneOf',
                                            transforms=[
                                                dict(type='Identity'),
                                                dict(type='AutoContrast'),
                                                dict(type='RandEqualize'),
                                                dict(type='RandSolarize'),
                                                dict(type='RandColor'),
                                                dict(type='RandContrast'),
                                                dict(type='RandBrightness'),
                                                dict(type='RandSharpness'),
                                                dict(type='RandPosterize')
                                            ]),
                                        dict(
                                            type='OneOf',
                                            transforms=[{
                                                'type': 'RandTranslate',
                                                'x': (-0.1, 0.1)
                                            }, {
                                                'type': 'RandTranslate',
                                                'y': (-0.1, 0.1)
                                            }, {
                                                'type': 'RandRotate',
                                                'angle': (-30, 30)
                                            },
                                                        [{
                                                            'type':
                                                            'RandShear',
                                                            'x': (-30, 30)
                                                        }, {
                                                            'type':
                                                            'RandShear',
                                                            'y': (-30, 30)
                                                        }]])
                                    ]),
                                dict(
                                    type='RandErase',
                                    n_iterations=(1, 5),
                                    size=[0, 0.2],
                                    squared=True)
                            ],
                            record=True),
                        dict(type='Pad', size_divisor=32),
                        dict(
                            type='Normalize',
                            mean=[123.675, 116.28, 103.53],
                            std=[58.395, 57.12, 57.375],
                            to_rgb=True),
                        dict(type='ExtraAttrs', tag='unsup_student'),
                        dict(type='DefaultFormatBundle'),
                        dict(
                            type='Collect',
                            keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'],
                            meta_keys=('filename', 'ori_shape', 'img_shape',
                                       'img_norm_cfg', 'pad_shape',
                                       'scale_factor', 'tag',
                                       'transform_matrix'))
                    ],
                    unsup_teacher=[
                        dict(
                            type='Sequential',
                            transforms=[
                                dict(
                                    type='RandResize',
                                    img_scale=[(1333, 400), (1333, 1200)],
                                    multiscale_mode='range',
                                    keep_ratio=True),
                                dict(type='RandFlip', flip_ratio=0.5)
                            ],
                            record=True),
                        dict(type='Pad', size_divisor=32),
                        dict(
                            type='Normalize',
                            mean=[123.675, 116.28, 103.53],
                            std=[58.395, 57.12, 57.375],
                            to_rgb=True),
                        dict(type='ExtraAttrs', tag='unsup_teacher'),
                        dict(type='DefaultFormatBundle'),
                        dict(
                            type='Collect',
                            keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'],
                            meta_keys=('filename', 'ori_shape', 'img_shape',
                                       'img_norm_cfg', 'pad_shape',
                                       'scale_factor', 'tag',
                                       'transform_matrix'))
                    ])
            ],
            filter_empty_gt=False)),
    val=dict(
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_val2017.json',
        img_prefix='data/coco/val2017/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_val2017.json',
        img_prefix='data/coco/val2017/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    sampler=dict(
        train=dict(
            type='SemiBalanceSampler',
            sample_ratio=[1, 3],
            by_prob=True,
            epoch_length=7330)))
evaluation = dict(
    metric=['bbox', 'segm'], type='SubModulesDistEvalHook', interval=7500)
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[120000, 160000],
    by_epoch=False)
runner = dict(type='IterBasedRunner', max_iters=220000)
checkpoint_config = dict(interval=3750, by_epoch=False, max_keep_ckpts=20)
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(
            type='WandbLoggerHook',
            init_kwargs=dict(
                project='pre_release',
                name='ssl_mask_180k',
                config=dict(
                    fold=1,
                    percent=10,
                    work_dirs='work_dirs/ssl_mask_rcnn_0.5_0.5_0.6_1.8',
                    total_step=220000)),
            by_epoch=False)
    ])
custom_hooks = [
    dict(type='NumClassCheckHook'),
    dict(type='WeightSummary'),
    dict(type='MeanTeacher', momentum=0.999, interval=1, warm_up=0)
]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
opencv_num_threads = 0
mp_start_method = 'fork'
mmdet_base = '../../thirdparty/mmdetection/configs/_base_'
custom_imports = dict(
    imports=[
        'ssod.mask_rcnn_iou.cls_bbox_iou', 'ssod.mask_rcnn_iou.bbox_head',
        'ssod.mask_rcnn_iou.standard_roi_head_iou',
        'ssod.mask_rcnn_iou.fcn_mask_head_iou'
    ],
    allow_failed_imports=False)
strong_pipeline = [
    dict(
        type='Sequential',
        transforms=[
            dict(
                type='RandResize',
                img_scale=[(1333, 400), (1333, 1200)],
                multiscale_mode='range',
                keep_ratio=True),
            dict(type='RandFlip', flip_ratio=0.5),
            dict(
                type='ShuffledSequential',
                transforms=[
                    dict(
                        type='OneOf',
                        transforms=[
                            dict(type='Identity'),
                            dict(type='AutoContrast'),
                            dict(type='RandEqualize'),
                            dict(type='RandSolarize'),
                            dict(type='RandColor'),
                            dict(type='RandContrast'),
                            dict(type='RandBrightness'),
                            dict(type='RandSharpness'),
                            dict(type='RandPosterize')
                        ]),
                    dict(
                        type='OneOf',
                        transforms=[{
                            'type': 'RandTranslate',
                            'x': (-0.1, 0.1)
                        }, {
                            'type': 'RandTranslate',
                            'y': (-0.1, 0.1)
                        }, {
                            'type': 'RandRotate',
                            'angle': (-30, 30)
                        },
                                    [{
                                        'type': 'RandShear',
                                        'x': (-30, 30)
                                    }, {
                                        'type': 'RandShear',
                                        'y': (-30, 30)
                                    }]])
                ]),
            dict(
                type='RandErase',
                n_iterations=(1, 5),
                size=[0, 0.2],
                squared=True)
        ],
        record=True),
    dict(type='Pad', size_divisor=32),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='ExtraAttrs', tag='unsup_student'),
    dict(type='DefaultFormatBundle'),
    dict(
        type='Collect',
        keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'],
        meta_keys=('filename', 'ori_shape', 'img_shape', 'img_norm_cfg',
                   'pad_shape', 'scale_factor', 'tag', 'transform_matrix'))
]
weak_pipeline = [
    dict(
        type='Sequential',
        transforms=[
            dict(
                type='RandResize',
                img_scale=[(1333, 400), (1333, 1200)],
                multiscale_mode='range',
                keep_ratio=True),
            dict(type='RandFlip', flip_ratio=0.5)
        ],
        record=True),
    dict(type='Pad', size_divisor=32),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='ExtraAttrs', tag='unsup_teacher'),
    dict(type='DefaultFormatBundle'),
    dict(
        type='Collect',
        keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'],
        meta_keys=('filename', 'ori_shape', 'img_shape', 'img_norm_cfg',
                   'pad_shape', 'scale_factor', 'tag', 'transform_matrix'))
]
unsup_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='PseudoSamples', with_bbox=True, with_mask=True),
    dict(
        type='MultiBranch',
        unsup_student=[
            dict(
                type='Sequential',
                transforms=[
                    dict(
                        type='RandResize',
                        img_scale=[(1333, 400), (1333, 1200)],
                        multiscale_mode='range',
                        keep_ratio=True),
                    dict(type='RandFlip', flip_ratio=0.5),
                    dict(
                        type='ShuffledSequential',
                        transforms=[
                            dict(
                                type='OneOf',
                                transforms=[
                                    dict(type='Identity'),
                                    dict(type='AutoContrast'),
                                    dict(type='RandEqualize'),
                                    dict(type='RandSolarize'),
                                    dict(type='RandColor'),
                                    dict(type='RandContrast'),
                                    dict(type='RandBrightness'),
                                    dict(type='RandSharpness'),
                                    dict(type='RandPosterize')
                                ]),
                            dict(
                                type='OneOf',
                                transforms=[{
                                    'type': 'RandTranslate',
                                    'x': (-0.1, 0.1)
                                }, {
                                    'type': 'RandTranslate',
                                    'y': (-0.1, 0.1)
                                }, {
                                    'type': 'RandRotate',
                                    'angle': (-30, 30)
                                },
                                            [{
                                                'type': 'RandShear',
                                                'x': (-30, 30)
                                            }, {
                                                'type': 'RandShear',
                                                'y': (-30, 30)
                                            }]])
                        ]),
                    dict(
                        type='RandErase',
                        n_iterations=(1, 5),
                        size=[0, 0.2],
                        squared=True)
                ],
                record=True),
            dict(type='Pad', size_divisor=32),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='ExtraAttrs', tag='unsup_student'),
            dict(type='DefaultFormatBundle'),
            dict(
                type='Collect',
                keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'],
                meta_keys=('filename', 'ori_shape', 'img_shape',
                           'img_norm_cfg', 'pad_shape', 'scale_factor', 'tag',
                           'transform_matrix'))
        ],
        unsup_teacher=[
            dict(
                type='Sequential',
                transforms=[
                    dict(
                        type='RandResize',
                        img_scale=[(1333, 400), (1333, 1200)],
                        multiscale_mode='range',
                        keep_ratio=True),
                    dict(type='RandFlip', flip_ratio=0.5)
                ],
                record=True),
            dict(type='Pad', size_divisor=32),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='ExtraAttrs', tag='unsup_teacher'),
            dict(type='DefaultFormatBundle'),
            dict(
                type='Collect',
                keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'],
                meta_keys=('filename', 'ori_shape', 'img_shape',
                           'img_norm_cfg', 'pad_shape', 'scale_factor', 'tag',
                           'transform_matrix'))
        ])
]
fp16 = dict(loss_scale='dynamic')
fold = 1
percent = 10
work_dir = 'work_dirs/ssl_mask_rcnn_0.5_0.5_0.6_1.8'
cfg_name = 'ssl_mask_180k'
gpu_ids = range(0, 1)

MihaiDavid05, Oct 24 '23 15:10

Hello @MihaiDavid05, how many GPUs did you use for training? In the paper, we used 4 GPUs for training.

ccdatas, Oct 24 '23 16:10

Hi, @ccdatas.

Thanks for getting back to me. I used only 1 GPU with 4 samples per GPU and a 1:3 labeled:unlabeled ratio. Do you think the larger batch size alone could explain the better performance?
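To make the batch sizes mentioned in this thread easier to compare, here is a small arithmetic sketch. It assumes that each GPU's batch (samples_per_gpu) is split between labeled and unlabeled images according to the sampler ratio, and that the global batch per step is that split times the number of GPUs. split_batch is a hypothetical helper written for illustration, not part of PAIS or mmdetection, and the sampler's actual rounding may differ.

```python
def split_batch(samples_per_gpu, sample_ratio, num_gpus=1):
    """Return (labeled, unlabeled) images per step summed over all GPUs.

    Assumption: each GPU's batch is split in proportion to sample_ratio
    (labeled first), with any rounding remainder going to the labeled part.
    """
    total = sum(sample_ratio)
    unlabeled_per_gpu = samples_per_gpu * sample_ratio[1] // total
    labeled_per_gpu = samples_per_gpu - unlabeled_per_gpu
    return labeled_per_gpu * num_gpus, unlabeled_per_gpu * num_gpus


# Settings that come up in this thread:
print(split_batch(4, [1, 3]))              # 1 GPU x 4 images  -> (1, 3)
print(split_batch(16, [1, 3]))             # 1 GPU x 16 images -> (4, 12)
print(split_batch(4, [1, 3], num_gpus=2))  # 2 GPUs x 4 images -> (2, 6)
```

Under these assumptions, a batch of 4 at 1:3 feeds the student only one labeled image per step, versus four labeled images at batch 16.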

Also, I noticed that the alpha and beta parameters are not in the config. I assume they are both set to 4, at least for KNet, judging by lines 537 (class) and 566 (mask) of ssod/knet_withioudet/kernel_update_head.py. However, there are two more multipliers there, 12 and 2.5 for class and mask respectively, and I cannot figure out what they are. Could you give me some insight?
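To illustrate what an exponent of 4 does to a per-sample loss weight, here is a hypothetical sketch that assumes a weight of the form scale * score ** exponent. That functional form, the function name confidence_weight, and the default scale are assumptions for illustration only; the actual expressions at the lines referenced above are not reproduced in this thread, so this is not PAIS's code.

```python
def confidence_weight(score, exponent, scale=1.0):
    """Hypothetical per-pseudo-label loss weight: scale * score ** exponent.

    The functional form is an assumption for illustration; it is not
    copied from PAIS's kernel_update_head.py.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    return scale * score ** exponent


# With exponent 4, confident pseudo-labels keep much of their weight,
# while uncertain ones are strongly suppressed.
for s in (0.9, 0.7, 0.5):
    print(s, round(confidence_weight(s, exponent=4), 4))
```

With exponent 4, a score of 0.5 maps to a weight of 0.0625 while 0.9 keeps about 0.66, so low-confidence pseudo-labels contribute far less to the loss; a constant multiplier such as the 12 or 2.5 mentioned above would rescale every weight uniformly without changing that ranking.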

Also, for the MaskRCNN version, could you point out where the alpha parameter is integrated? It seems to be 1, with the highest classification score treated as 1 (mask_rcnn_iou/bbox_head.py, line 186). As for beta, I found it set to 4 in the loss_unlabel() function of fcn_mask_head_iou.py (and again there is a multiplier of 5 in the weight formula that I cannot explain).

Lastly, I could not figure out where you threshold on mask IoU in the MaskRCNN version. Could you point that out? Or is there no mask IoU threshold in the MaskRCNN version because NMS already handles it?

Thank you very much for the great work and for your help!

MihaiDavid05, Oct 24 '23 16:10

Hi @MihaiDavid05,

Thank you for your kind and detailed reply! I will try it and share my training results as soon as possible.

HRliao1109, Oct 25 '23 07:10

Hi @MihaiDavid05

Thank you for your kind help, I have also successfully trained this work on my GPU. Like you, I trained with only 1 GPU, using 10% labeled data from COCO 2017. Here are my evaluation results:

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.207
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.348
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.212
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.080
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.217
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.325
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.389
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.389
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.389
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.166
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.414
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.592

Hope my results are helpful to you.

HRliao1109, Oct 31 '23 15:10

Hey @HRliao1109, thanks for the update! I am currently training with a batch size of 16 (4:12 labeled/unlabeled ratio) on a bigger GPU, and it does seem to be on track to reach the results from the paper (although there are still 6 days of training left).

MihaiDavid05, Oct 31 '23 15:10

Hi @MihaiDavid05, I forgot to mention that I set samples_per_gpu=2 and workers_per_gpu=4 because of my GPU's limitations. The original setting is here

Besides, I noticed that the original sample ratio setting is [1, 2], which seems to mean 1 labeled image and 2 unlabeled images per batch. That is a little different from the paper. Did you also modify this part in your new training code? Please let me know if I have misunderstood this parameter.
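For reference, here is a hypothetical excerpt of a data config for the 16-image, 4:12 setup described above (1 GPU, 1:3 labeled:unlabeled). samples_per_gpu and workers_per_gpu are the keys named in this thread; the sampler nesting and the SemiBalanceSampler type assume the SoftTeacher-style layout that the ssod transforms in the config above suggest, so check the repository's base config before copying it.

```python
# Hypothetical data-config excerpt; sampler layout is an assumption.
data = dict(
    samples_per_gpu=16,  # images per GPU per iteration (labeled + unlabeled)
    workers_per_gpu=4,   # dataloader worker processes per GPU
    sampler=dict(
        train=dict(
            type="SemiBalanceSampler",  # assumed sampler name
            sample_ratio=[1, 3],        # labeled : unlabeled -> 4 + 12 images
        )
    ),
)
```

With samples_per_gpu=2, a ratio of [1, 2] cannot be split exactly, so the per-batch composition then depends on how the sampler rounds.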

HRliao1109, Nov 01 '23 09:11

Hi @HRliao1109, yes, I modified that parameter too. The results from my initial post (23 mAP) were obtained with a batch of 4 (1:3 ratio set in the sampler), so it is expected that your results are a bit lower with an even smaller batch size.

MihaiDavid05, Nov 01 '23 09:11

Hi @MihaiDavid05, is there any update since your reply last week? I'm also trying to train this work on a bigger GPU. Unfortunately, I'm still dealing with environment problems on ppc64le (IBM PowerPC), so I have no further training results yet. Thanks in advance!

HRliao1109, Nov 09 '23 07:11

Hey @HRliao1109, sorry for the late reply. I finally finished training the network on one randomly sampled 10% labeled split, with a batch size of 16 (a 4:12 ratio) on a single 40 GB GPU. I got 30.1 mAP (the paper reports 31.04 ± 0.06 over 3 runs). I believe part of the difference comes from the particular sampled split, but there is still a gap left :)

MihaiDavid05, Dec 18 '23 08:12

Hi @MihaiDavid05,

I have also been trying to reproduce the results of this paper recently. I am using two RTX 3090s with a batch size of 8 (4 images per GPU at a 1:3 ratio) on the 10% labeled split, which is essentially the same setup as yours. Due to my limited resources I cannot use a batch size of 16 directly, so I set the batch size to 8 and use gradient accumulation to simulate a batch size of 16. I will get back to you with the training results once it finishes.
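Gradient accumulation can stand in for a larger batch because, for a loss averaged over samples, averaging the gradients of k equal-size micro-batches gives the same gradient as one batch k times larger. Below is a minimal pure-Python check of that identity on a toy linear model; the model, data, and function names are made up for illustration, and each micro-batch gradient is divided by the number of accumulation steps, which is what accumulation hooks typically do to the loss.

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error for y_hat = w * x, averaged over the batch."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, micro_batch, cumulative_iters):
    """Sum of micro-batch gradients, each scaled by 1 / cumulative_iters.

    Assumes len(xs) == micro_batch * cumulative_iters.
    """
    total = 0.0
    for i in range(cumulative_iters):
        start = i * micro_batch
        chunk_x = xs[start:start + micro_batch]
        chunk_y = ys[start:start + micro_batch]
        total += grad_mse(w, chunk_x, chunk_y) / cumulative_iters
    return total


xs = [float(i) for i in range(16)]
ys = [3.0 * x + 1.0 for x in xs]
w = 0.5

full = grad_mse(w, xs, ys)  # one batch of 16
accum = accumulated_grad(w, xs, ys, micro_batch=8, cumulative_iters=2)  # 2 x 8
print(full, accum, abs(full - accum) < 1e-9)  # -402.5 -402.5 True
```

The identity relies on equal-size micro-batches and a per-sample average. Anything that depends on the batch as a whole, such as BatchNorm statistics or a loss normalized by the number of pseudo-labels in the current batch, is not reproduced exactly, which may account for some of the differences one can expect from accumulation.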

Thank you for your reply!

HRliao1109, Dec 27 '23 08:12

Hi @MihaiDavid05

I have now completed my reproduction of this paper. As I mentioned before, I used gradient accumulation, so there might be some differences. Below are my results.

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.307
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.507
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.320
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.114
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.327
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.478
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.453
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.453
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.453
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.216
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.484
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.666

HRliao1109, Jan 16 '24 09:01

@HRliao1109 thank you for the update. Did you accumulate gradients every 2 steps, or what accumulation rule did you use, given that your batch is half the original? What did you change in the config files for gradient accumulation? Did you keep the same learning rate, or did you also scale it?

Thank you :)

MihaiDavid05, Jan 16 '24 09:01

@MihaiDavid05 I applied the accumulated gradients every 2 steps to simulate the original setting in the paper.

HRliao1109, Jan 16 '24 09:01

Hi @MihaiDavid05, I realized I missed some of your questions in my last reply, sorry about that. I modified line 8 of schedule_1x.py.

Before:
optimizer_config = dict(grad_clip=dict(max_norm=1, norm_type=2))
After:
optimizer_config = dict(grad_clip=dict(max_norm=1, norm_type=2), type="GradientCumulativeOptimizerHook", cumulative_iters=2)

I didn't modify the learning rate at the same time, but it should be adjusted when changing the optimizer_config. Hope it's helpful to you👍
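On the learning-rate question, a common heuristic is the linear scaling rule: keep the ratio of learning rate to effective batch size constant, where the effective batch counts every image contributing to one optimizer step, including accumulated iterations. The sketch below assumes the config's learning rate was tuned for an effective batch of 16 (the thread does not state this) and uses a placeholder base LR; both helper functions are hypothetical.

```python
def effective_batch(samples_per_gpu, num_gpus, cumulative_iters=1):
    """Number of images contributing to each optimizer step."""
    return samples_per_gpu * num_gpus * cumulative_iters


def linearly_scaled_lr(base_lr, base_batch, new_batch):
    """Linear scaling heuristic: keep lr / batch size constant."""
    return base_lr * new_batch / base_batch


# Two GPUs x 4 images, stepping every 2 iterations.
eb = effective_batch(4, 2, cumulative_iters=2)
print(eb)  # 16

base_lr = 1e-4  # placeholder, not taken from the PAIS config
print(linearly_scaled_lr(base_lr, base_batch=16, new_batch=eb))
print(linearly_scaled_lr(base_lr, base_batch=16, new_batch=8))
```

Under that assumption, accumulating 2 x 8 images recovers an effective batch of 16, so the rule leaves the learning rate unchanged; it only calls for rescaling when the effective batch differs from the one the base LR was set for (here, halving it for a plain batch of 8).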

HRliao1109, Jan 26 '24 03:01

Hey @HRliao1109, thank you for your response!

MihaiDavid05, Jan 26 '24 10:01