tf-faster-rcnn
The _reshape_layer method
I can't understand the _reshape_layer method in the following code:
rpn_cls_score_reshape = self._reshape_layer(rpn_cls_score, 2, 'rpn_cls_score_reshape')
Why not write like this:
cls_score_shape = tf.shape(rpn_cls_score)
rpn_cls_score_reshape = tf.reshape(rpn_cls_score, [cls_score_shape[0], cls_score_shape[1], cls_score_shape[2]*9, 2])
Yeah, that should be improved. You are welcome to submit a PR after verification! :)
This piece of code is so confusing.
Transpose changes the order of the entries in memory (that is, how they are laid out in memory), so what @auroua proposed isn't doing exactly the same thing as _reshape_layer().
"But I guess it doesn't matter at that point of the program that whether predicted bbs correspond to the location of them in the feature map so the code works out anyway."
Okay I don't think this statement is correct, hmmm, still trying to understand why this would work.
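To make the difference concrete, here is a minimal NumPy sketch (not the repository's code). The helper `reshape_layer` is a hypothetical stand-in that mirrors the transpose/reshape/transpose steps of _reshape_layer, and the toy sizes `h` and `w` are assumptions. Each entry stores its own channel index, so we can see which two channels end up paired in the last dimension.

```python
import numpy as np

def reshape_layer(bottom, num_dim):
    """Hypothetical NumPy stand-in for _reshape_layer (batch size 1 assumed)."""
    to_caffe = bottom.transpose(0, 3, 1, 2)                        # NHWC -> NCHW
    reshaped = to_caffe.reshape(1, num_dim, -1, bottom.shape[2])   # keeps flat order
    return reshaped.transpose(0, 2, 3, 1)                          # channels last again

h, w, num_anchors = 2, 3, 9  # toy feature-map size, 9 anchors as in the question

# Every location holds the values 0..17, i.e. each entry equals its channel index.
rpn_cls_score = np.broadcast_to(
    np.arange(num_anchors * 2), (1, h, w, num_anchors * 2)).copy()

via_layer = reshape_layer(rpn_cls_score, 2)                   # what the repository does
via_plain = rpn_cls_score.reshape(1, h, w * num_anchors, 2)   # the proposed shortcut

# Collect the distinct (first, second) channel pairs found in the last dimension.
layer_pairs = {tuple(p) for p in via_layer.reshape(-1, 2).tolist()}
plain_pairs = {tuple(p) for p in via_plain.reshape(-1, 2).tolist()}

print(via_layer.shape, sorted(layer_pairs))
print(via_plain.shape, sorted(plain_pairs))
```

In this sketch, _reshape_layer pairs channel a with channel a + num_anchors, while the plain reshape pairs neighbouring channels 2k and 2k + 1, so a softmax over the last dimension would normalise different pairs of values in the two versions.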
Here is my understanding, hope it may help.
First, make sure you understand the related data logic, as below:
- After this conv2d operation, we get a tensor with shape [1, h, w, self._num_anchors * 2]:
  rpn_cls_score = slim.conv2d(rpn, self._num_anchors * 2, [1, 1], trainable=is_training, weights_initializer=initializer, padding='VALID', activation_fn=None, scope='rpn_cls_score')
- We then need to assign a meaning to the self._num_anchors * 2 values at each of the h * w 'locations'. Read the following code in the file layer_utils/proposal_layers.py:
  # Get the scores and bounding boxes
  scores = rpn_cls_prob[:, :, :, num_anchors:]
This line tells us:
- The first self._num_anchors values are the background scores for the anchors
- The second self._num_anchors values are the object scores for the anchors
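To make it clearer:
- For every 'position', we get nanchors anchors, with indices 0, 1, ..., nanchors - 1.
- For every 'position', we need nanchors * 2 scores: nanchors background scores followed by nanchors object scores.
- For the anchor with index i, its background score is the value at channel i, and its object score is the value at channel nanchors + i.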
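The channel layout can be illustrated with a small NumPy sketch. The tensor values and the sizes `h`, `w` and `num_anchors` are made up for illustration; only the slicing expression for `scores` comes from the quoted line.

```python
import numpy as np

num_anchors = 3   # assumed toy value
h, w = 2, 2       # assumed toy feature-map size

# Made-up "probabilities": channel c of every location simply holds the value c.
rpn_cls_prob = np.broadcast_to(
    np.arange(num_anchors * 2, dtype=np.float32), (1, h, w, num_anchors * 2))

background = rpn_cls_prob[:, :, :, :num_anchors]   # first num_anchors channels
scores = rpn_cls_prob[:, :, :, num_anchors:]       # same slice as the quoted line

i = 1                                              # pick one anchor at location (0, 0)
bg_i = rpn_cls_prob[0, 0, 0, i]                    # its background value
fg_i = rpn_cls_prob[0, 0, 0, num_anchors + i]      # its object value

print(background[0, 0, 0], scores[0, 0, 0], bg_i, fg_i)
```

So the slice taken in the proposal layer only makes sense if the object values of all anchors sit in the second half of the channel dimension.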
Now, back to the question: what is the _reshape_layer method doing? I try to understand it from this point of view: how does the order of the entries in memory change after each operation? Thanks to @hakillha for the tip.
Suppose the order of the entries is just the result of np.ravel(some_data, order='C'). Details of ravel can be found in the NumPy documentation.
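As a small illustration of what "order of the entries" means here, the sketch below uses a made-up 2 x 3 array and compares the raveled order after a reshape and after a transpose.

```python
import numpy as np

some_data = np.arange(6).reshape(2, 3)      # [[0, 1, 2], [3, 4, 5]]

flat = np.ravel(some_data, order='C')                           # row-major order
flat_after_reshape = np.ravel(some_data.reshape(3, 2), order='C')
flat_after_transpose = np.ravel(some_data.T, order='C')

print(flat.tolist())                   # [0, 1, 2, 3, 4, 5]
print(flat_after_reshape.tolist())     # [0, 1, 2, 3, 4, 5]  (unchanged)
print(flat_after_transpose.tolist())   # [0, 3, 1, 4, 2, 5]  (reordered)
```

A reshape only regroups the same sequence, while a transpose reorders it. That is the whole reason _reshape_layer needs transposes around its reshape.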
The _reshape_layer method helps to compute the scores with the format mentioned above. Below is my understanding:
- To compute this binary probability (background or object), we need to make the last dimension of the scores data equal to 2; then we can use softmax to compute the two probabilities over the last dimension. So the first job is to make the last dimension equal to 2, which is done by this line of code:
  # change it so that the score has 2 as its channel size
  rpn_cls_score_reshape = self._reshape_layer(rpn_cls_score, 2, 'rpn_cls_score_reshape')
- Before _reshape_layer, the scores data is stored location by location (nlocations = h * w): for each location, the nanchors background scores come first, followed by the nanchors object scores.
- Inside _reshape_layer, first comes a transpose:
  to_caffe = tf.transpose(bottom, [0, 3, 1, 2])
  This transpose actually changes the order of the entries in memory, as @hakillha mentioned. The scores data is now stored channel by channel: the background scores of anchor 0 at all locations, then those of anchor 1, and so on, followed by the object scores of anchor 0 at all locations, then those of anchor 1, and so on.
- This is followed by a reshape:
  reshaped = tf.reshape(to_caffe, tf.concat(axis=0, values=[[1, num_dim, -1], [input_shape[2]]]))
  Reshape does not change the order of the entries in memory, so the sequence of the scores data does not change. This step just changes the second dimension to 2 and increases the third dimension correspondingly.
- This is followed by another transpose:
  # then swap the channel back
  to_tf = tf.transpose(reshaped, [0, 2, 3, 1])
  After this operation, the scores data is stored as (background score, object score) pairs, ordered by anchor first and then by location. Now that each anchor's background score and object score have been rearranged to sit next to each other, we can apply softmax to them.
- This is followed by the softmax operation:
  rpn_cls_prob_reshape = self._softmax_layer(rpn_cls_score_reshape, "rpn_cls_prob_reshape")
  rpn_cls_prob_reshape has the same order of entries in memory, with each pair now holding two probabilities: the background probability and the object probability for the anchor with index aindex at the location with index lindex.
- Now, 'convert' it back to what we want:
  rpn_cls_prob = self._reshape_layer(rpn_cls_prob_reshape, self._num_anchors * 2, "rpn_cls_prob")
- Inside this _reshape_layer operation, first comes a transpose:
  tf.transpose(bottom, [0, 3, 1, 2])
  The order of the entries changes to: all background probabilities (anchor by anchor, each over all locations), followed by all object probabilities in the same arrangement.
- This is followed by a reshape:
  reshaped = tf.reshape(to_caffe, tf.concat(axis=0, values=[[1, num_dim, -1], [input_shape[2]]]))
  where num_dim = nanchors * 2. Again, reshape does not change the order of the entries in memory.
- This is followed by a transpose:
  to_tf = tf.transpose(reshaped, [0, 2, 3, 1])
  The order of the entries changes to: location by location, with the nanchors background probabilities followed by the nanchors object probabilities at each location.
- Finally, the order of the entries in memory matches the target we want. For every location:
  - The first self._num_anchors values are the background scores for the anchors
  - The second self._num_anchors values are the object scores for the anchors

  This assumes that a 'probability' is also a kind of 'score'; the two words are sometimes used interchangeably in the code.
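The whole round trip can be checked numerically with a NumPy sketch. This is not the repository's TensorFlow code: `reshape_layer` and `softmax_last` are hypothetical stand-ins for _reshape_layer and _softmax_layer, and the sizes and random scores are made up. The check compares the result with a softmax computed directly over (channel a, channel a + num_anchors) at every location.

```python
import numpy as np

def reshape_layer(bottom, num_dim):
    """Hypothetical NumPy stand-in for _reshape_layer (batch size 1 assumed)."""
    to_caffe = bottom.transpose(0, 3, 1, 2)
    reshaped = to_caffe.reshape(1, num_dim, -1, bottom.shape[2])
    return reshaped.transpose(0, 2, 3, 1)

def softmax_last(x):
    """Numerically stable softmax over the last dimension."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

h, w, num_anchors = 2, 3, 4                      # assumed toy sizes
rng = np.random.default_rng(0)
rpn_cls_score = rng.normal(size=(1, h, w, num_anchors * 2))

# The three steps discussed above.
rpn_cls_score_reshape = reshape_layer(rpn_cls_score, 2)               # (1, A*h, w, 2)
rpn_cls_prob_reshape = softmax_last(rpn_cls_score_reshape)
rpn_cls_prob = reshape_layer(rpn_cls_prob_reshape, num_anchors * 2)   # (1, h, w, 2A)

# Direct computation: pair channel a with channel a + num_anchors at each location.
bg = rpn_cls_score[..., :num_anchors]
fg = rpn_cls_score[..., num_anchors:]
pair_prob = softmax_last(np.stack([bg, fg], axis=-1))                 # (1, h, w, A, 2)
expected = np.concatenate([pair_prob[..., 0], pair_prob[..., 1]], axis=-1)

print(np.allclose(rpn_cls_prob, expected))
```

If the understanding above is right, the comparison holds: at every location, the first num_anchors channels are background probabilities, the last num_anchors channels are object probabilities, and the two values for the same anchor sum to 1.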
If something is wrong in my understanding, let me know.
@hi-zhengcheng Great answer. It helps me a lot, thanks!
@hi-zhengcheng Many thanks for your clear answer!