
Allow Resnet 18/34 Backbones

Open malfonsoNeoris opened this issue 3 years ago • 5 comments

Hi, great library! I have managed to run it on an NVIDIA Xavier NX at ~15 FPS with 500-size images, with the same "problem" as the others of around 3 GB of RAM consumed. I was wondering if it is possible to use/add ResNet 18/34 backbones, which should give me better FPS and a smaller memory footprint. I have:

  • downloaded the .pth weights from PyTorch

  • added the backbones

  • added configs (normal and edge)

  • tried to train
  • failed

Here are my steps. I downloaded the .pth files for them from PyTorch (I saw that the resnet50 weights had the same name as the PyTorch models, so I tried!). Then I added a config (basically just copy-pasted the resnet50 one and changed the path and args accordingly):

```python
resnet18_backbone = resnet101_backbone.copy({
    'name': 'ResNet18',
    'path': 'resnet18-5c106cde.pth',
    'type': ResNetBackbone,
    'args': ([2, 2, 2, 2],),
    'transform': resnet_transform,
})

yolact_resnet18_config = yolact_base_config.copy({
    'name': 'yolact_resnet18',
    'backbone': resnet18_backbone.copy({
        'selected_layers': list(range(1, 4)),
        'pred_scales': yolact_base_config.backbone.pred_scales,
        'pred_aspect_ratios': yolact_base_config.backbone.pred_aspect_ratios,
        'use_pixel_scales': True,
        'preapply_sqrt': False,
        'use_square_anchors': True,  # This is for backward compatibility with a bug
    }),
})

yolact_edge_resnet18_config = yolact_edge_config.copy({
    'name': 'yolact_edge_resnet18',
    'backbone': yolact_resnet18_config.backbone,
})
```

and then tried to train, but a lot of errors occurred for layers with different input/output sizes (sorry, I deleted the messages and am now training other stuff, but I will update later with the corresponding messages).

malfonsoNeoris avatar Aug 18 '21 17:08 malfonsoNeoris

Edit! Here is the error, for resnet34:

```
Traceback (most recent call last):
  File "train.py", line 707, in <module>
    train(0, args=args)
  File "train.py", line 256, in train
    yolact_net.init_weights(backbone_path=args.save_folder + cfg.backbone.path)
  File "/content/yolact_edge/yolact_edge/yolact.py", line 1269, in init_weights
    self.backbone.init_backbone(backbone_path)
  File "/content/yolact_edge/yolact_edge/backbone.py", line 145, in init_backbone
    self.load_state_dict(state_dict, strict=False)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ResNetBackbone:
    size mismatch for layers.0.0.conv1.weight: copying a param with shape torch.Size([64, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 1, 1]).
    size mismatch for layers.0.1.conv1.weight: copying a param with shape torch.Size([64, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 256, 1, 1]).
    size mismatch for layers.0.2.conv1.weight: copying a param with shape torch.Size([64, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 256, 1, 1]).
    size mismatch for layers.1.0.conv1.weight: copying a param with shape torch.Size([128, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 256, 1, 1]).
    size mismatch for layers.1.0.downsample.0.weight: copying a param with shape torch.Size([128, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 256, 1, 1]).
    size mismatch for layers.1.0.downsample.1.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
    size mismatch for layers.1.0.downsample.1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
    size mismatch for layers.1.0.downsample.1.running_mean: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
    size mismatch for layers.1.0.downsample.1.running_var: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
    size mismatch for layers.1.1.conv1.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 512, 1, 1]).
    size mismatch for layers.1.2.conv1.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 512, 1, 1]).
    size mismatch for layers.1.3.conv1.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 512, 1, 1]).
    size mismatch for layers.2.0.conv1.weight: copying a param with shape torch.Size([256, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]).
    size mismatch for layers.2.0.downsample.0.weight: copying a param with shape torch.Size([256, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 512, 1, 1]).
    size mismatch for layers.2.0.downsample.1.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
    size mismatch for layers.2.0.downsample.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
    size mismatch for layers.2.0.downsample.1.running_mean: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
    size mismatch for layers.2.0.downsample.1.running_var: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
    size mismatch for layers.2.1.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1]).
    size mismatch for layers.2.2.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1]).
    size mismatch for layers.2.3.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1]).
    size mismatch for layers.2.4.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1]).
    size mismatch for layers.2.5.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1]).
    size mismatch for layers.3.0.conv1.weight: copying a param with shape torch.Size([512, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1024, 1, 1]).
    size mismatch for layers.3.0.downsample.0.weight: copying a param with shape torch.Size([512, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([2048, 1024, 1, 1]).
    size mismatch for layers.3.0.downsample.1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
    size mismatch for layers.3.0.downsample.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
    size mismatch for layers.3.0.downsample.1.running_mean: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
    size mismatch for layers.3.0.downsample.1.running_var: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
    size mismatch for layers.3.1.conv1.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 2048, 1, 1]).
    size mismatch for layers.3.2.conv1.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 2048, 1, 1]).
```
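These shapes point at the cause: the checkpoint's convs are 3x3 while the model expects 1x1. A quick torchvision check, included here purely as an illustration (not code from this repo), reproduces the first mismatch:

```python
from torchvision.models import resnet18, resnet50

# ResNet-18/34 are built from BasicBlock (two 3x3 convs per block), while
# ResNet-50/101 use Bottleneck (1x1 -> 3x3 -> 1x1). The same parameter name
# therefore carries different shapes -- exactly the first size mismatch above.
print(resnet18().layer1[0].conv1.weight.shape)  # torch.Size([64, 64, 3, 3])
print(resnet50().layer1[0].conv1.weight.shape)  # torch.Size([64, 64, 1, 1])
```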

malfonsoNeoris avatar Aug 18 '21 18:08 malfonsoNeoris

It is because the building blocks of R50/R101 (Bottleneck) are different from those of R18/R34 (BasicBlock): https://pytorch.org/hub/pytorch_vision_resnet/ So you need to modify the basic block so that the backbone architecture is compatible with the pretrained weights.
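For concreteness, here is a minimal BasicBlock sketch along the lines of torchvision's (two 3x3 convs, `expansion = 1` instead of Bottleneck's 4). The constructor signature is an assumption that mirrors the repo's Bottleneck and may need adjusting to whatever arguments ResNetBackbone actually passes its block class:

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """torchvision-style basic residual block: two 3x3 convs, no channel expansion."""
    expansion = 1  # Bottleneck uses 4; this difference is why the checkpoint shapes differ

    def __init__(self, inplanes, planes, stride=1, downsample=None,
                 norm_layer=nn.BatchNorm2d, dilation=1):
        super().__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=3, stride=stride,
                               padding=dilation, dilation=dilation, bias=False)
        self.bn1 = norm_layer(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               padding=dilation, dilation=dilation, bias=False)
        self.bn2 = norm_layer(planes)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample  # projects the identity when stride/channels change

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        if self.downsample is not None:
            identity = self.downsample(x)
        return self.relu(out + identity)
```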

haotian-liu avatar Aug 19 '21 03:08 haotian-liu

Hi, thanks for the info. I added a BasicBlock similar to PyTorch's to the backbone.py file and updated the config file. When running resnet50/101 at 256/550 image size everything looks good, but with resnet 34/18 I get zero results at testing/inference, even though during training the evaluation mAP goes up to 80%. Probably I am doing something wrong.

In the config for resnet18/34 I created the backbones:

```python
resnet34_backbone = resnet101_backbone.copy({
    'name': 'ResNet34',
    'path': 'resnet34-333f7ec4.pth',
    'type': ResNetBackbone,
    'args': ([3, 4, 6, 3], [], BasicBlock),
    'transform': resnet_transform,
})
```
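For readers following along: in upstream YOLACT, `construct_backbone()` instantiates the backbone as `cfg.type(*cfg.args)`, so the tuple above is presumably unpacked positionally, roughly as below. This is an assumption; the poster's modified backbone.py may order its parameters differently:

```python
# Presumed unpacking of the 'args' tuple above into ResNetBackbone(*args):
backbone = ResNetBackbone(
    [3, 4, 6, 3],  # residual blocks per stage (the ResNet-34 layout)
    [],            # atrous/dilated layers: none
    BasicBlock,    # residual block class, instead of the default Bottleneck
)
```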

Then I created specific configs. This is the "default" for 101:

```python
yolact_edge_config = yolact_base_config.copy({
    'name': 'yolact_edge',
    ####################
    'torch2trt_max_calibration_images': 0,
    # 'torch2trt_backbone': False,
    'torch2trt_backbone_int8': True,
    'torch2trt_protonet_int8': True,
    'torch2trt_fpn': True,
    'torch2trt_prediction_module': True,
    'use_fast_nms': False,
    'dataset': my_custom_dataset,
    'num_classes': 1 + 1,

    # Image size
    'max_size': 256,
    'min_size': 200,

    # Discard detections with width and height smaller than this (in absolute width and height)
    'discard_box_width': 4 / 256,
    'discard_box_height': 4 / 256,

    # Training params
    'lr_schedule': 'step',
    'lr_steps': (4000, 6000, 8000, 9000),
    'max_iter': 10000,
    ####################
})
```

Then, for example, for 34:

```python
yolact_edge_resnet34_550_config = yolact_edge_config.copy({
    'name': 'yolact_edge_resnet34_550',
    'backbone': yolact_resnet34_config.backbone,

    # Image size
    'max_size': 550,
    'min_size': 200,

    # Discard detections with width and height smaller than this (in absolute width and height)
    'discard_box_width': 4 / 550,
    'discard_box_height': 4 / 550,
})
```

Attached: the config file, a test file, and the updated backbone with the BasicBlock (example.zip).

Can you give me a hint on where to look?

malfonsoNeoris avatar Aug 25 '21 22:08 malfonsoNeoris

You might want to confirm that the model runs well with TensorRT disabled first.
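A sketch of what that could look like, reusing only the `torch2trt_*` keys already shown in the configs above (key names in a given checkout may differ, and if your checkout has it, eval.py's `--disable_tensorrt` flag does the same at run time):

```python
# Hypothetical config copy with every TensorRT conversion switched off,
# so training/eval run as plain PyTorch:
yolact_edge_resnet34_notrt_config = yolact_edge_resnet34_550_config.copy({
    'name': 'yolact_edge_resnet34_notrt',
    'torch2trt_backbone': False,
    'torch2trt_backbone_int8': False,
    'torch2trt_protonet_int8': False,
    'torch2trt_fpn': False,
    'torch2trt_prediction_module': False,
})
```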

haotian-liu avatar Sep 22 '21 21:09 haotian-liu

> You might want to confirm that the model runs well with TensorRT disabled first.

I have trained resnet18 and got results, but when I try to eval the model with the weights, it reports the error: "Backbone: ResNet18 is not currently supported with TensorRT."

Can you tell me how to modify the code?

charhc avatar May 06 '22 07:05 charhc