faster-rcnn-pytorch
Inference from jywang's checkpoint
I am trying to run detection with a model that I trained using jwyang's repository, but now I need to run it on CPU, which that repo does not provide. I have changed the anchor sizes and anchor scales to match the ones I trained with, but I am still getting mismatch errors.
Called with args: Namespace(add_params=[], class_agnostic=False, cuda=False, dataset='voc_2007_trainval', epoch=50, image_dir='images/', load_dir='models', mGPU=False, mode='detect', net='resnet101', session=1, vis=True)
Current device: CPU
Using config:
GENERAL: {'MAX_IMG_RATIO': 2.0, 'MAX_IMG_SIZE': 1000, 'MIN_IMG_RATIO': 0.5, 'MIN_IMG_SIZE': 600, 'POOLING_MODE': 'pool', 'POOLING_SIZE': 7}
TEST: {'NMS': 0.3, 'RPN_NMS_THRESHOLD': 0.7, 'RPN_POST_NMS_TOP': 300, 'RPN_PRE_NMS_TOP': 6000}
RPN: {'ANCHOR_SCALES': [2, 4, 8, 16, 32], 'ANCHOR_RATIOS': [0.5, 1, 2, 4, 8], 'FEATURE_STRIDE': 16}
/home/mahad/frcnn_cpu2/faster-rcnn-pytorch/data/images/
Loading classes for image dataset...
WARNING! Cannot find "devkit_path" in additional parameters. Try to use default path (./data/VOCdevkit)...
Used image config: {'color_mode': 'BGR', 'range': 255, 'mean': [102.9801, 115.9465, 122.7717], 'std': [1.0, 1.0, 1.0]}
Loaded classes for PascalVoc 2007 trainval dataset.
Loading image dataset...
Used image config: {'color_mode': 'BGR', 'range': 255, 'mean': [102.9801, 115.9465, 122.7717], 'std': [1.0, 1.0, 1.0]}
Loaded Detection dataset.
Preparing image data...
Done.
Output directory: /home/mahad/frcnn_cpu2/faster-rcnn-pytorch/data/images/result
Loading model from /home/mahad/frcnn_cpu2/faster-rcnn-pytorch/data/models/resnet101/voc_2007/frcnn_1_50.pth
Traceback (most recent call last):
File "run.py", line 147, in
Do I have to train again with this repository, or is it possible to remove these errors? Thanks.
You can't remove these errors, because:
- You are trying to load a checkpoint trained with 2 classes (RCNN_cls_score.weight ---> torch.Size([2, 2048])) into a 21-class model;
- This repo uses torchvision ResNet50 models, so the layer3 output has size = 1024 (RCNN_Base = RPN = out_depth) link. I don't understand why your checkpoint has a 512-d dimension.
Are you sure that you trained ResNet50 before, and not VGG16?
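To pinpoint exactly which layers mismatch, it helps to compare parameter shapes between the checkpoint and the model side by side. A minimal sketch (pure Python; the parameter names below are taken from this thread, and in practice you would build the shape dicts from `torch.load(path, map_location='cpu')` and `model.state_dict()` - both assumptions, not this repo's actual tooling):

```python
def find_shape_mismatches(ckpt_shapes, model_shapes):
    """Compare two {param_name: shape_tuple} dicts and return
    {param_name: (checkpoint_shape, model_shape)} for every parameter
    whose shapes differ. Keys present in only one dict are ignored here.

    In practice the dicts can be built like (hypothetical usage):
        ckpt_shapes  = {k: tuple(v.shape) for k, v in
                        torch.load("frcnn_1_50.pth", map_location="cpu").items()}
        model_shapes = {k: tuple(v.shape) for k, v in model.state_dict().items()}
    """
    return {
        name: (shape, model_shapes[name])
        for name, shape in ckpt_shapes.items()
        if name in model_shapes and model_shapes[name] != shape
    }


# Example with the shapes discussed in this thread:
ckpt = {
    "RCNN_cls_score.weight": (2, 2048),        # trained on 2 classes
    "RPN_Conv.weight": (512, 1024, 3, 3),      # jwyang-style 512-wide RPN
}
model = {
    "RCNN_cls_score.weight": (21, 2048),       # this repo builds 21 classes
    "RPN_Conv.weight": (1024, 1024, 3, 3),     # this repo uses in_depth out
}
mismatches = find_shape_mismatches(ckpt, model)
```

Every entry in `mismatches` is a layer you either have to rebuild in the model or retrain.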
It is a ResNet101 checkpoint that I trained, and yes, there are two classes in my dataset.
Oh, I see. Look here - this is my implementation of the RPN network, and here is jwyang's implementation. He used the pretrained model depth only for the input channels of the RPN conv layer; I use it for both input and output channels.
You can try to change this:
self.RPN_Conv = nn.Conv2d(in_depth, in_depth, 3, 1, 1, bias=True)
...
self.RPN_cls_score = nn.Conv2d(in_depth, self.nc_score_out, 1, 1, 0)
...
self.RPN_bbox_pred = nn.Conv2d(in_depth, self.nc_bbox_out, 1, 1, 0)
to this:
self.RPN_Conv = nn.Conv2d(in_depth, 512, 3, 1, 1, bias=True)
...
self.RPN_cls_score = nn.Conv2d(512, self.nc_score_out, 1, 1, 0)
...
self.RPN_bbox_pred = nn.Conv2d(512, self.nc_bbox_out, 1, 1, 0)
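With this change you can compute the parameter shapes the checkpoint should now match. A small sketch of the arithmetic (the `num_anchors * 2` / `num_anchors * 4` head widths follow the usual jwyang-style RPN convention, and the anchor values are taken from the config printed earlier in this thread - both assumptions):

```python
# Anchor config as printed in the log above.
ANCHOR_SCALES = [2, 4, 8, 16, 32]
ANCHOR_RATIOS = [0.5, 1, 2, 4, 8]
NUM_ANCHORS = len(ANCHOR_SCALES) * len(ANCHOR_RATIOS)  # 25 anchors per location

RPN_HIDDEN = 512  # the hardcoded output width from the fix above


def expected_rpn_shapes(in_depth):
    """Expected weight shapes for the three modified RPN layers, assuming
    jwyang-style heads: 2 scores (fg/bg) and 4 box deltas per anchor."""
    return {
        "RPN_Conv.weight": (RPN_HIDDEN, in_depth, 3, 3),
        "RPN_cls_score.weight": (NUM_ANCHORS * 2, RPN_HIDDEN, 1, 1),
        "RPN_bbox_pred.weight": (NUM_ANCHORS * 4, RPN_HIDDEN, 1, 1),
    }


# ResNet layer3 output is 1024-d, per the discussion above.
shapes = expected_rpn_shapes(1024)
```

If the shapes in your checkpoint's state dict match these, the RPN part of the load should succeed.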
As for the two classes of your dataset: you need to add your dataset to the library, so that the script creates the final layers for two classes.
After that, you can use your checkpoint.
Thanks a lot bud! Got it running :)
I have the model up and running, but the results are not the same: when I run inference with jwyang's code I get good results, while with this repo on the CPU the scores are very low, as if the model had not been trained properly. I am attaching both outputs: the one with green boxes is from this repo, while the red boxes are from jwyang's inference. The model is the same and the threshold is the same. Any help would be much appreciated.
Looks like a CPU NMS error. Have you tried running inference with this checkpoint in CUDA mode of this repo?
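One way to check whether the CPU NMS path is the culprit is to compare its output against a simple reference implementation on the same boxes and scores. A minimal greedy NMS sketch (pure Python for clarity, not this repo's actual NMS code; boxes are `(x1, y1, x2, y2)` and the 0.3 threshold matches the `NMS` value from the config above):

```python
def nms(boxes, scores, iou_threshold):
    """Greedy non-maximum suppression: repeatedly keep the highest-scoring
    box and drop all remaining boxes whose IoU with it exceeds the threshold.
    Returns the indices of the kept boxes."""

    def iou(a, b):
        # Intersection rectangle (zero width/height if the boxes don't overlap).
        w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = w * h
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)

    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [j for j in order if iou(boxes[best], boxes[j]) <= iou_threshold]
    return keep


# Two heavily overlapping boxes plus one separate box:
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores, 0.5)  # the second box is suppressed
```

If the repo's CPU NMS keeps a very different set of boxes than a reference like this (or than the CUDA path) on identical inputs, that would explain the diverging detections.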