
Manual Forward and Backward Passes in Faster RCNN

Open barisbatuhan opened this issue 2 years ago • 2 comments

Hi,

Due to my project requirements, I cannot use the runners provided by mmdet and mmcv, so I am trying to implement my own. Although the model accepts the inputs and returns losses, it does not seem to learn: the AP does not increase. Simplified forward and backward passes from my implementation are given below:

model = build_detector(cfg.model)
model = model.to("cuda")
optimizer = build_optimizer(model, cfg.optimizer)
model.train()

# Assume there is a method named prepare_data that returns the inputs
# the detector needs. I am sure that the values returned from this
# method are correct.
# Boxes are in the format [left_x, top_y, right_x, bottom_y].
# Labels start from 0 and go up to num_classes - 1.
imgs, img_metas, boxes, labels = prepare_data(batch)

loss_dict = model(imgs, img_metas, gt_bboxes=boxes, gt_labels=labels)

loss = 0
loss += loss_dict['loss_cls'] + sum(loss_dict['loss_rpn_cls'])
loss += loss_dict['loss_bbox'] + sum(loss_dict['loss_rpn_bbox'])

loss.backward()
optimizer.step() 
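
The simplified step above does not show a call to optimizer.zero_grad(). In PyTorch, gradients accumulate across backward() calls until they are cleared, so if the real loop also omits it, every step would apply the sum of all past gradients. The following is a minimal sketch of the usual step order together with a generic reduction of the loss dict. The names total_loss and train_step are hypothetical helpers, and the loss dict layout (scalars or lists of scalars under keys containing "loss") is assumed from the snippet above.

```python
def total_loss(loss_dict):
    """Sum every entry whose key contains 'loss'.

    Values may be a single scalar-like object or a list/tuple of them
    (for example one RPN loss per FPN level). Works with floats or tensors.
    Keys without 'loss' (such as 'acc') are treated as metrics and skipped.
    """
    total = 0
    for name, value in loss_dict.items():
        if "loss" not in name:
            continue
        if isinstance(value, (list, tuple)):
            total = total + sum(value)
        else:
            total = total + value
    return total


def train_step(model, optimizer, imgs, img_metas, boxes, labels):
    """One optimisation step; assumes model(...) returns a loss dict."""
    optimizer.zero_grad()  # clear gradients left over from the previous step
    loss_dict = model(imgs, img_metas, gt_bboxes=boxes, gt_labels=labels)
    loss = total_loss(loss_dict)
    loss.backward()
    optimizer.step()
    return loss, loss_dict
```

Summing by key pattern rather than by hard-coded names also avoids silently dropping a loss term if the head configuration changes.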

When I set lr = 1e-6, I get the loss values below for the WIDER FACE dataset:

I: 10 / 250000  | Loss: 1.9564 --> cls: 1.2266   | cls_rpn: 0.6943 | box: 0.0035 | box_rpn: 0.0321 | acc: 1.359
I: 20 / 250000  | Loss: 1.9422 --> cls: 1.1599   | cls_rpn: 0.6908 | box: 0.0131 | box_rpn: 0.0784 | acc: 1.8799
I: 30 / 250000  | Loss: 1.7722 --> cls: 1.0583   | cls_rpn: 0.6793 | box: 0.0011 | box_rpn: 0.0335 | acc: 77.3356
I: 40 / 250000  | Loss: 1.7082 --> cls: 0.9087   | cls_rpn: 0.6758 | box: 0.0065 | box_rpn: 0.1171 | acc: 96.8343
I: 50 / 250000  | Loss: 1.5087 --> cls: 0.7747   | cls_rpn: 0.6621 | box: 0.003  | box_rpn: 0.069  | acc: 97.1598
I: 60 / 250000  | Loss: 1.325  --> cls: 0.6241   | cls_rpn: 0.6376 | box: 0.0072 | box_rpn: 0.0561 | acc: 95.3044
I: 70 / 250000  | Loss: 1.1553 --> cls: 0.4926   | cls_rpn: 0.6039 | box: 0.0065 | box_rpn: 0.0523 | acc: 98.1201
I: 80 / 250000  | Loss: 1.0318 --> cls: 0.4184   | cls_rpn: 0.5766 | box: 0.0063 | box_rpn: 0.0305 | acc: 98.9583
I: 90 / 250000  | Loss: 1.0109 --> cls: 0.3854   | cls_rpn: 0.5629 | box: 0.0115 | box_rpn: 0.0511 | acc: 97.2087
I: 100 / 250000 | Loss: 1.0006 --> cls: 0.388    | cls_rpn: 0.5419 | box: 0.0108 | box_rpn: 0.0599 | acc: 98.2747
I: 110 / 250000 | Loss: 1.0455 --> cls: 0.4863   | cls_rpn: 0.4962 | box: 0.0162 | box_rpn: 0.0468 | acc: 98.7793
I: 120 / 250000 | Loss: 1.3008 --> cls: 0.6999   | cls_rpn: 0.4894 | box: 0.0266 | box_rpn: 0.085  | acc: 97.111
...

However, my AP never rises above 0.0001, even though I train the model with only one class, the accuracy increases during training, and the loss values decrease. I use the default inference method of mmdet to get the predictions.
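
One way to narrow down a near-zero AP is to bypass the AP computation and compare a few predicted boxes directly with their ground truth. Below is a minimal sketch in plain Python, assuming boxes are [x1, y1, x2, y2] lists in the same coordinate system; box_iou and best_iou_per_gt are hypothetical helper names.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def best_iou_per_gt(pred_boxes, gt_boxes):
    """For each ground-truth box, the highest IoU reached by any prediction."""
    return [max((box_iou(p, g) for p in pred_boxes), default=0.0)
            for g in gt_boxes]
```

If the best IoU is consistently low on training images while the losses look healthy, the predictions and annotations may be in different coordinate systems rather than the model failing to learn.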

My optimizer parameters are as follows (I tried several learning rates from 1e-3 to 1e-7; the larger ones cause an abnormal increase in the loss):

optimizer = dict(type="SGD", lr=0.000025, momentum=0.9, weight_decay=0.0001)
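
For comparison, the stock mmdet 1x schedule for Faster R-CNN uses SGD with lr=0.02 for a total batch size of 16, together with a linear warmup over the first 500 iterations (warmup_ratio=0.001). A custom runner has to reproduce both by hand. The sketch below shows the linear scaling rule and a linear warmup in plain Python; scaled_lr and warmup_lr are hypothetical helper names, and whether missing warmup explains the divergence at larger learning rates here is an assumption.

```python
def scaled_lr(batch_size, base_lr=0.02, base_batch_size=16):
    """Linear scaling rule: lr is proportional to the total batch size."""
    return base_lr * batch_size / base_batch_size


def warmup_lr(iteration, target_lr, warmup_iters=500, warmup_ratio=0.001):
    """Linear warmup from target_lr * warmup_ratio up to target_lr."""
    if iteration >= warmup_iters:
        return target_lr
    k = (1 - iteration / warmup_iters) * (1 - warmup_ratio)
    return target_lr * (1 - k)
```

In a PyTorch loop the value can be applied before each step by assigning it to group["lr"] for every group in optimizer.param_groups.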

My model parameters are given below:

# model settings
model = dict(
    type='FasterRCNN',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[.0, .0, .0, .0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
    roi_head=dict(
        type='StandardRoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=1,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0., 0., 0., 0.],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0))),
    # model training and testing settings
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=-1,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=2000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=False,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        rpn=dict(
            nms_pre=1000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            score_thr=0.05,
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=100)
    ))

barisbatuhan • Aug 05 '22 15:08

Your config file seems normal. Does your runner have a for loop? I did not see one in your issue.

BIGWangYuDong • Aug 08 '22 00:08

For the WIDER FACE dataset, you can refer to https://github.com/open-mmlab/mmdetection/pull/8508

BIGWangYuDong • Aug 08 '22 00:08

Yes, I train iteration-wise. My for loop is as follows:

for i in range(0, max_num_iters, val_per_iter):
    for i2 in range(val_per_iter):
        ...  # run the forward and backward passes given above
    ...  # run evaluation on the model
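
The loop above does not show whether the model is switched between evaluation and training mode around the evaluation call. A minimal sketch of an equivalent flat loop with explicit mode switches follows. The function run and its train_step and evaluate arguments are hypothetical, caller-supplied callables; model.eval() and model.train() are the standard torch.nn.Module methods.

```python
def run(model, train_step, evaluate, max_num_iters, val_per_iter):
    """Train iteration-wise and evaluate every val_per_iter iterations."""
    results = []
    for iteration in range(1, max_num_iters + 1):
        train_step(iteration)
        if iteration % val_per_iter == 0:
            model.eval()   # inference behaviour during evaluation
            results.append((iteration, evaluate()))
            model.train()  # switch back before the next training step
    return results
```

A single iteration counter also makes it easier to drive a learning-rate schedule from the same loop.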

barisbatuhan • Aug 08 '22 08:08

A sample data format retrieved from the prepare_data method is also shared below for a small batch size of 2:

Shape of Images Array: torch.Size([2, 3, 800, 800])

Img Metas: [
{
 'filename': 'widerface/WIDER_train/images/26--Soldier_Drilling/26_Soldier_Drilling_Soldiers_Drilling_26_412.jpg', 
 'ori_filename': 'WIDER_train/images/26--Soldier_Drilling/26_Soldier_Drilling_Soldiers_Drilling_26_412.jpg', 
 'ori_shape': tensor([682, 1024, 3]), 
 'img_shape': tensor([800, 800, 3]), 
 'pad_shape': tensor([800, 800, 3]), 
 'scale_factor': tensor([1.1747, 1.1747, 1.1747, 1.1747], device='cuda:0'), 
 'img_norm_cfg': {
    'mean': tensor([123.6750, 116.2800, 103.5300]),
    'std': tensor([58.3950, 57.1200, 57.3750]), 
    'to_rgb': tensor(True),
 }, 
 'flip': tensor(False), 
 'flip_direction': None
}, 
{
 'filename': 'widerface/WIDER_train/images/55--Sports_Coach_Trainer/55_Sports_Coach_Trainer_sportcoaching_55_501.jpg', 
 'ori_filename': 'WIDER_train/images/55--Sports_Coach_Trainer/55_Sports_Coach_Trainer_sportcoaching_55_501.jpg', 
 'ori_shape': tensor([684, 1024, 3]), 
 'img_shape': tensor([800, 800, 3]), 
 'pad_shape': tensor([800, 800, 3]), 
 'scale_factor': tensor([1.1713, 1.1713, 1.1713, 1.1713], device='cuda:0'), 
 'img_norm_cfg': {
    'mean': tensor([123.6750, 116.2800, 103.5300]),
    'std': tensor([58.3950, 57.1200, 57.3750]), 
    'to_rgb': tensor(True),
 },
 'flip': tensor(False), 
 'flip_direction': None
}]

Boxes: [
    tensor([[515.7122, 390.0147, 595.5947, 516.8870], [609.6917, 300.7342, 698.9721, 444.0529]], device='cuda:0'),
    tensor([[189.7511,  86.6764, 318.5944, 255.3441]], device='cuda:0')
]
Labels: [
    tensor([0, 0], device='cuda:0'), 
    tensor([0], device='cuda:0')
]
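
In the sample above, several meta fields (img_shape, scale_factor, flip and others) are torch tensors, whereas the standard mmdet data pipeline produces plain tuples, bools and NumPy arrays for them. Whether any code path in a given mmdet version is sensitive to this is not confirmed in this thread, but normalising the types removes one variable. The sketch below converts tensor-like values to plain Python types and flags suspicious boxes; to_plain, plain_meta and invalid_boxes are hypothetical helper names.

```python
def to_plain(value):
    """Recursively convert tensor-like values (anything with .tolist())."""
    if hasattr(value, "tolist"):
        return value.tolist()
    if isinstance(value, dict):
        return {k: to_plain(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(to_plain(v) for v in value)
    return value


def plain_meta(meta):
    """Copy of one img_meta dict with plain values and tuple shapes."""
    out = to_plain(meta)
    for key in ("ori_shape", "img_shape", "pad_shape"):
        if key in out:
            out[key] = tuple(out[key])
    return out


def invalid_boxes(boxes, img_shape):
    """Boxes [x1, y1, x2, y2] that are degenerate or outside the image."""
    height, width = img_shape[:2]
    return [b for b in boxes
            if not (0 <= b[0] < b[2] <= width and 0 <= b[1] < b[3] <= height)]
```

For tensors of boxes, passing boxes.tolist() keeps the check in plain Python; an empty result for every image means the boxes are at least geometrically consistent with img_shape.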

Since there is only one class (face), every label is 0. The filename and ori_filename values are correct. To get the correct predictions, I use a modified version of the model's aug_test method:

import numpy as np
import torch


# Method of the detector class (hence the self argument).
def aug_test(self, imgs, img_metas, rescale=False, **kwargs):
    """Test with augmentations.

    If rescale is False, then returned bboxes and masks will fit the scale
    of imgs[0].
    """
    # Split a batched tensor/array into a list of single-image batches.
    if isinstance(imgs, (torch.Tensor, np.ndarray)):
        imgs = [imgs[i:i + 1, ...] for i in range(imgs.shape[0])]
    xs = self.extract_feats(imgs)
    outs = []
    for x, img_meta in zip(xs, img_metas):
        # Process images one by one to get the predictions.
        img_meta_list = [img_meta] if isinstance(img_meta, dict) else img_meta

        proposal_list = self.rpn_head.aug_test_rpn([x], [img_meta_list])
        out = self.roi_head.aug_test(
            [x], proposal_list, [img_meta_list], rescale=rescale)
        # Keep the predictions for the image at index 0 and class 0.
        outs.append(out[0][0])

    return outs
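
Note that rescale defaults to False in the method above, so, as its docstring says, the returned boxes fit the scale of the resized input rather than the original image. If the evaluation compares them against annotations in original-image coordinates, most matches would fail, which could by itself produce a near-zero AP. Below is a minimal sketch of mapping boxes back, assuming scale_factor is ordered [w_scale, h_scale, w_scale, h_scale] as in the metas shown earlier; rescale_boxes is a hypothetical helper.

```python
def rescale_boxes(boxes, scale_factor):
    """Map [x1, y1, x2, y2, ...] boxes from the resized image to the original.

    Only the first four values of each box are divided by scale_factor;
    trailing values (for example a score) are kept unchanged.
    """
    return [
        [coord / scale for coord, scale in zip(box[:4], scale_factor)]
        + list(box[4:])
        for box in boxes
    ]
```

Calling the method with rescale=True should make this manual step unnecessary, provided the scale_factor in img_metas matches the resize that was actually applied.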

barisbatuhan • Aug 08 '22 09:08

I should also add that the problem is not specific to WIDER FACE. If I train my model for person detection on the COCO dataset using only the person labels, I get the same problem.

barisbatuhan • Aug 08 '22 09:08

So it seems that your training or testing loop has a problem.

BIGWangYuDong • Aug 09 '22 02:08