tf-faster-rcnn
How to change VGG16.py to wire conv4_3 instead of conv5_3?
I have trained Faster R-CNN on a custom dataset in Pascal VOC format. Now I want to use the 3rd or 4th convolutional layer of VGG16 to deal with objects of a specific size, but I don't know exactly how to change the VGG16 net. I tried to remove the final lines of this section:
```python
def _image_to_head(self, is_training, reuse=False):
    with tf.variable_scope(self._scope, self._scope, reuse=reuse):
        net = slim.repeat(self._image, 2, slim.conv2d, 64, [3, 3],
                          trainable=False, scope='conv1')
        net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool1')
        net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3],
                          trainable=False, scope='conv2')
        net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool2')
        net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3],
                          trainable=is_training, scope='conv3')
        net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool3')
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3],
                          trainable=is_training, scope='conv4')
        net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool4')
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3],
                          trainable=is_training, scope='conv5')
```
but it didn't work. What else needs to be modified to connect earlier layers of VGG16 to the RPN?
I would appreciate some help from the experts on this. Thanks.
You can remove `net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], trainable=is_training, scope='conv5')` without any problem. If you remove any pooling layers, you also need to edit https://github.com/endernewton/tf-faster-rcnn/blob/master/lib/nets/network.py#L28 and the lines after it, since you no longer have 4 pooling layers (2^4 = 16). Also see issue #184.
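To see how the pooling count maps to those stride constants, here is a quick arithmetic sketch (the function name is just for illustration, and the 600x800 input is the usual training resolution, not something fixed by the code):

```python
import math

def feat_map_size(img_h, img_w, num_pools):
    """Each 2x2 max-pool halves the spatial size, so the total
    downsampling factor (feat_stride) is 2 ** num_pools."""
    stride = 2 ** num_pools
    return math.ceil(img_h / stride), math.ceil(img_w / stride), stride

print(feat_map_size(600, 800, 4))  # stock VGG16: (38, 50, 16)
print(feat_map_size(600, 800, 3))  # pool4 removed: (75, 100, 8)
```

Dropping pool4 doubles the RPN grid in each dimension, which is why every constant derived from the stride has to change together.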
@jcarletgo thank you for the helpful tips. I removed the conv5 and pool4 layers and changed the lines below:

```python
self._feat_stride = [16, ]
self._feat_compress = [1. / 16., ]
```

to

```python
self._feat_stride = [8, ]
self._feat_compress = [1. / 8., ]
```

I also changed `__C.DEDUP_BOXES = 1. / 16.` in config.py (which is now missing for some reason I don't know) to `__C.DEDUP_BOXES = 1. / 8.`.
In addition I changed the anchor base size from 16 to 2 in generate_anchors.py.
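For reference, the base size enters the anchor shapes roughly like this (a simplified NumPy sketch of the ratio/scale arithmetic in generate_anchors.py, not the file's exact code):

```python
import numpy as np

def anchor_sizes(base_size, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    """(height, width) of each anchor: a base_size x base_size box is
    reshaped to each aspect ratio, then multiplied by each scale."""
    area = float(base_size * base_size)
    sizes = []
    for r in ratios:
        w = np.round(np.sqrt(area / r))
        h = np.round(w * r)
        for s in scales:
            sizes.append((int(h * s), int(w * s)))
    return sizes

# base 16: the smallest square anchor is 128x128, far larger than ~40 px objects
# base 2:  every anchor shrinks by a factor of 8, putting the square
#          anchors at 16/32/64 px, a better match for ~40 px objects
print(anchor_sizes(16))
print(anchor_sizes(2))
```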
I'm trying to adapt the model to smaller objects of roughly the same size, but after all this the precision got worse. Am I doing something wrong, or should I change any other settings?
Thanks.
Why did you change the anchor base size from 16 to 2? Have you tried changing it from 16 to 8? I thought the anchor base size should match the feat_stride.
@yaoqi-zd Before I removed the conv5 and pool4 layers, when the feat_stride was still 16, I trained the model with anchor base sizes 16, 8, 4 and 2. The best results came with 2, maybe because the objects in my images are small compared to the image size (around 40 pixels in a 1024-pixel image). But I don't know why the results got much worse when I removed the last layers. I thought they should have improved, but the opposite happened.
@hadi-ghnd Hmm, I thought that although the lower feature maps (like conv4_3) have higher resolution, they carry weaker semantic information, which may harm localization performance. Have you tried adding a deconv layer on conv5_3 to bring it up to the size of conv4_3 (call this feature map deconv5_3) and then combining conv4_3 and deconv5_3, to get a feature map that has both high resolution and strong semantics?
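A shape-level sketch of that fusion idea, in NumPy rather than the actual slim layers (in the real net the 2x upsampling would be a learned `slim.conv2d_transpose` and the concat a `tf.concat`; the toy shapes below assume a 512x512 input):

```python
import numpy as np

def upsample2x(fmap):
    """Nearest-neighbor 2x upsampling of an NHWC feature map --
    a stand-in for a learned deconv (conv2d_transpose) layer."""
    return fmap.repeat(2, axis=1).repeat(2, axis=2)

# Toy NHWC shapes mimicking VGG16 on a 512x512 input:
conv4_3 = np.zeros((1, 64, 64, 512), dtype=np.float32)  # stride 8
conv5_3 = np.zeros((1, 32, 32, 512), dtype=np.float32)  # stride 16

deconv5_3 = upsample2x(conv5_3)  # now 64x64, same grid as conv4_3
fused = np.concatenate([conv4_3, deconv5_3], axis=3)  # channel-wise fusion
print(fused.shape)  # (1, 64, 64, 1024)
```

The fused map keeps conv4_3's resolution, so the RPN would run at stride 8 while still seeing conv5_3's semantics; a 1x1 conv is commonly added after the concat to reduce the channel count back to 512.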
@yaoqi-zd I haven't tried your suggestion yet, but it seems like a good idea. I am going to try it, hoping the results become better than with the full 5 conv layers.
@hadi-ghnd please let me know if this idea works, thanks in advance!
@yaoqi-zd Hi, I'm interested in your suggestion above about adding a deconv layer on conv5_3 and fusing it with conv4_3. Can you enlighten me on how to add deconv5_3 after conv5_3 and how to combine conv4_3 and deconv5_3? Much appreciated, thanks!
@zqdeepbluesky you can check the feature fusion methods used in HyperNet or FPN (feature pyramid network)
@yaoqi-zd great, thanks, I will try it.
@hadi-ghnd
Hi, I also want to run some tests on small objects. I did not change the layers of the VGG16 model; I just changed the lines below:

```python
self._feat_stride = [16, ]
self._feat_compress = [1. / 16., ]
```

to

```python
self._feat_stride = [8, ]
self._feat_compress = [1. / 8., ]
```

But I got this error:

```
InvalidArgumentError (see above for traceback): exceptions.ValueError: operands could not be broadcast together with shapes (17100,1) (67500,1)
```

Are there any other parameters I should modify as well? Or can I not change feat_stride without also changing the VGG16 model? Thank you for your time!
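For what it's worth, the two numbers in that error are consistent with an anchor-count mismatch: the anchor target layer builds anchors for a stride-8 grid while the unchanged VGG16 still produces a stride-16 feature map. A sketch of the arithmetic (assuming the usual 600x800 training resolution and 9 anchors per position):

```python
import math

def num_anchors(img_h, img_w, stride, anchors_per_cell=9):
    """Total anchors = feature-map cells x anchors per cell."""
    h = math.ceil(img_h / stride)
    w = math.ceil(img_w / stride)
    return h * w * anchors_per_cell

print(num_anchors(600, 800, 16))  # 17100 -- what the unchanged conv net produces
print(num_anchors(600, 800, 8))   # 67500 -- what feat_stride = 8 expects
```

So changing `self._feat_stride` alone is not enough; the network itself must actually produce a stride-8 map (e.g. by removing pool4) for the two counts to agree.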
I have made the changes @hadi-ghnd described, except for `__C.DEDUP_BOXES = 1. / 8.`, since that option is not in config.py. I am getting NaN after around 100 iterations. Can anyone suggest whether any more changes are required for removing the pool4 layer?