
How to change VGG16.py to wire conv4_3 instead of conv5_3?

hadign20 opened this issue on Aug 10, 2017 · 12 comments

I have trained Faster R-CNN on a custom dataset in Pascal VOC format. Now I want to use the 3rd or 4th convolutional block of VGG16 to deal with objects of a specific size, but I don't know exactly how to change the VGG16 net. I tried removing the final lines of this section:

  def _image_to_head(self, is_training, reuse=False):
    with tf.variable_scope(self._scope, self._scope, reuse=reuse):
      net = slim.repeat(self._image, 2, slim.conv2d, 64, [3, 3],
                        trainable=False, scope='conv1')
      net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool1')
      net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3],
                        trainable=False, scope='conv2')
      net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool2')
      net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3],
                        trainable=is_training, scope='conv3')
      net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool3')
      net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3],
                        trainable=is_training, scope='conv4')
      net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool4')
      net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3],
                        trainable=is_training, scope='conv5')

but it didn't work. What else needs to be modified to connect the earlier layers of VGG16 to the RPN?

I would appreciate some help from experts on this. Thanks!

hadign20 avatar Aug 10 '17 14:08 hadign20

You can remove net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], trainable=is_training, scope='conv5') without any problem. If you remove any pooling layers, you also need to edit https://github.com/endernewton/tf-faster-rcnn/blob/master/lib/nets/network.py#L28 and the lines that follow it, since you no longer have 4 pooling layers (2^4 = 16). Also see issue #184.
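For illustration, if you also drop pool4 so the head ends at conv4_3, the method could look roughly like this (a sketch only, not the repo's code; the bookkeeping lines at the end of the original method are omitted, and the total stride becomes 2*2*2 = 8):

  def _image_to_head(self, is_training, reuse=False):
    with tf.variable_scope(self._scope, self._scope, reuse=reuse):
      net = slim.repeat(self._image, 2, slim.conv2d, 64, [3, 3],
                        trainable=False, scope='conv1')
      net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool1')  # /2
      net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3],
                        trainable=False, scope='conv2')
      net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool2')  # /4
      net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3],
                        trainable=is_training, scope='conv3')
      net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool3')  # /8
      # Stop at conv4_3: no pool4 and no conv5, so the feature map has stride 8.
      net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3],
                        trainable=is_training, scope='conv4')
    return net

With a stride-8 head like this, _feat_stride in network.py has to become [8, ] to match.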

jcarletgo avatar Aug 11 '17 21:08 jcarletgo

@jcarletgo thank you for the helpful tips. I removed the conv5 and pool4 layers and changed the lines below:

self._feat_stride = [16, ]
self._feat_compress = [1. / 16., ]

to

self._feat_stride = [8, ]
self._feat_compress = [1. / 8., ]

I also changed __C.DEDUP_BOXES in config.py from 1. / 16. to 1. / 8. (that option seems to have been removed from the repo at some point). In addition, I changed the anchor base size from 16 to 2 in generate_anchors.py, as sketched below.
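For reference, the anchor change amounts to this (assuming the stock signature generate_anchors(base_size=16, ratios=[0.5, 1, 2], scales=2 ** np.arange(3, 6)); the rest of the file is untouched):

  from generate_anchors import generate_anchors  # lib/layer_utils/generate_anchors.py

  # Default: 9 anchors grown from a 16x16 base box.
  default_anchors = generate_anchors(base_size=16)
  # My change: same ratios and scales, but grown from a 2x2 base box,
  # so every anchor shrinks by a factor of 8 in width and height.
  small_anchors = generate_anchors(base_size=2)
  print(default_anchors.shape, small_anchors.shape)  # (9, 4) (9, 4)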

I'm trying to adapt the model to small objects that are all roughly the same size. But after all this, the precision got worse. Am I doing something wrong, or are there other settings I should change?

Thanks!

hadign20 avatar Aug 12 '17 04:08 hadign20

Why did you change the anchor base size from 16 to 2? Have you tried changing it from 16 to 8? I thought the anchor base size should match the feat_stride.

yaoqi-zd avatar Aug 12 '17 13:08 yaoqi-zd

@yaoqi-zd Before I removed the conv5 and pool4 layers, when the feat_stride was still 16, I trained the model with anchor base sizes 16, 8, 4, and 2. The best results came with 2, maybe because the objects in my images are small compared to the image size (around 40 pixels in a 1024-pixel image). But I don't know why the results got much worse when I removed the last layers. I thought they should have improved, but the opposite happened.

hadign20 avatar Aug 13 '17 04:08 hadign20

@hadi-ghnd Hmm, I think that although a lower feature map (like conv4_3) has higher resolution, it carries weaker semantic information, which may hurt localization performance. Have you tried adding a deconv layer to conv5_3 to make it the same size as conv4_3 (call this feature map deconv5_3), and then combining conv4_3 and deconv5_3 to get a feature map that has both high resolution and strong semantics?
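Something like this minimal sketch is what I mean (a hypothetical helper, assuming the TF1.x slim API this repo uses; the names are illustrative):

  import tensorflow as tf
  import tensorflow.contrib.slim as slim

  def fuse_conv4_conv5(conv4_3, conv5_3):
    # Learnable 2x upsampling of conv5_3 -> "deconv5_3" (stride 16 -> stride 8).
    deconv5_3 = slim.conv2d_transpose(conv5_3, 512, [4, 4], stride=2,
                                      padding='SAME', scope='deconv5_3')
    # Pooling on odd-sized inputs can leave a one-pixel size mismatch,
    # so crop the upsampled map to conv4_3's spatial size.
    shape = tf.shape(conv4_3)
    deconv5_3 = deconv5_3[:, :shape[1], :shape[2], :]
    # Concatenate along channels, then mix with a 1x1 conv.
    fused = tf.concat([conv4_3, deconv5_3], axis=3)
    fused = slim.conv2d(fused, 512, [1, 1], scope='fuse_1x1')
    return fused  # a stride-8 map with conv5-level semantics

The RPN would then run on the fused map instead of conv5_3, with feat_stride set to 8.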

yaoqi-zd avatar Aug 13 '17 12:08 yaoqi-zd

@yaoqi-zd I haven't tried your suggestion yet, but it seems like a good idea. I am going to try it, hoping the results will be better than with the full 5 conv blocks.

hadign20 avatar Aug 14 '17 16:08 hadign20

@hadi-ghnd Please let me know if this idea works. Thanks in advance!

yaoqi-zd avatar Aug 15 '17 05:08 yaoqi-zd

@yaoqi-zd Hi, I'm interested in your suggestion above about adding a deconv layer to conv5_3 and fusing it with conv4_3. Can you explain how to add deconv5_3 after conv5_3, and how to combine conv4_3 and deconv5_3? Much appreciated, thanks!

zqdeepbluesky avatar Jan 12 '18 16:01 zqdeepbluesky

@zqdeepbluesky You can check the feature fusion methods used in HyperNet or FPN (Feature Pyramid Networks).

yaoqi-zd avatar Jan 13 '18 08:01 yaoqi-zd

@yaoqi-zd Great, thanks, I will try it.

zqdeepbluesky avatar Jan 13 '18 10:01 zqdeepbluesky

@hadi-ghnd
Hi, I also want to do some tests on small objects. I did not change the layers of the VGG16 model; I just changed the lines below:

self._feat_stride = [16, ]
self._feat_compress = [1. / 16., ]

to

self._feat_stride = [8, ]
self._feat_compress = [1. / 8., ]

But I got the error InvalidArgumentError (see above for traceback): exceptions.ValueError: operands could not be broadcast together with shapes (17100,1) (67500,1). Are there any other parameters I should modify together with these? Or can I not change feat_stride without also changing the VGG16 model? Thank you for your time!
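The two operand sizes look consistent with a stride mismatch. A quick sanity check (assuming the default 600x800 test scale and 9 anchors per location):

  def num_anchors(h, w, stride, anchors_per_loc=9):
      # ceil(h/stride) * ceil(w/stride) positions, times anchors per position
      return ((h + stride - 1) // stride) * ((w + stride - 1) // stride) * anchors_per_loc

  print(num_anchors(600, 800, 16))  # 38 * 50 * 9 = 17100
  print(num_anchors(600, 800, 8))   # 75 * 100 * 9 = 67500

So it seems the anchors are now generated for a stride-8 map while the unchanged VGG16 still outputs a stride-16 feature map.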

Isabelliu avatar Mar 04 '18 10:03 Isabelliu

I have made the same changes as @hadi-ghnd, except for __C.DEDUP_BOXES = 1. / 8., since that option is not in config.py. I am getting NaN after around 100 iterations. Can anyone tell me whether any more changes are required for removing the pool4 layer?

amlandas78 avatar Jul 02 '19 15:07 amlandas78