Question about implementing REORG layer with tf.extract_image_patches?
Hi thtrieu,
Recently I have also been trying to re-implement darknet in TensorFlow. After figuring out what the original reorg layer does, it seems that its algorithm is difficult to implement in TensorFlow. I then checked your code and found that you use tf.extract_image_patches, but after several experiments I believe the extract_image_patches algorithm is not actually the same as the original reorg layer. May I ask why you use tf.extract_image_patches? Is it the same function as the original reorg layer? I noticed that you previously used _forward in darkflow. Did _forward cause slower performance, so that you replaced it with tf.extract_image_patches?
Hopefully I can get your response soon, thank you!
I have discovered the same problem. Actually, tf.extract_image_patches just implements what most people would expect a reorganization layer to be. However, the implementation of reorg in darknet seems a little bit weird. You can refer to the function forward_reorg_layer in reorg_layer.c in the darknet code for its implementation.
Here is a piece of code from my reimplementation of darknet using Keras. You can see that extract_patches has two modes: when darknet is True, it performs a rather strange operation and yields exactly the same result as darknet. The False branch of the if-statement, however, is the usual pixel reorganization, which is consistent with tf.extract_image_patches.
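For anyone who does not want to dig through the C, here is a NumPy port of that loop. This is a sketch based on my reading of reorg_cpu in reorg_layer.c (single image, flat CHW buffer like darknet's raw float*); the function name and signature are mine:

```python
import numpy as np

def reorg_cpu(x, w, h, c, stride, forward=True):
    """NumPy port of the index arithmetic in darknet's reorg_cpu.

    x is a flat array of length c*h*w in channel-major (CHW) order,
    mirroring darknet's raw buffer layout.
    """
    out = np.empty_like(x)
    out_c = c // (stride * stride)
    for k in range(c):
        for j in range(h):
            for i in range(w):
                in_index = i + w * (j + h * k)
                c2 = k % out_c
                offset = k // out_c
                w2 = i * stride + offset % stride
                h2 = j * stride + offset // stride
                out_index = w2 + w * stride * (h2 + h * stride * c2)
                if forward:
                    out[out_index] = x[in_index]
                else:
                    out[in_index] = x[out_index]
    return out

# On a tiny 2x2x4 example you can see the interleaving it produces:
y = reorg_cpu(np.arange(16.0), w=2, h=2, c=4, stride=2)
print(y)  # [ 0.  4.  1.  5.  8. 12.  9. 13.  2.  6.  3.  7. 10. 14. 11. 15.]
```

Note that `forward=False` applies the inverse permutation, so a round trip returns the original buffer.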
```python
@staticmethod
def extract_patches(input_tensor, stride, darknet):
    _, h, w, c = input_tensor.get_shape().as_list()
    if darknet:
        # Darknet-compatible mode: reinterpret the channels-first buffer
        # the same way darknet's reorg walks its flat array.
        channel_first = keras.layers.Permute((3, 1, 2))(input_tensor)
        reshape_tensor = keras.layers.Reshape((c // (stride ** 2), h, stride, w, stride))(channel_first)
        permute_tensor = keras.layers.Permute((3, 5, 1, 2, 4))(reshape_tensor)
        target_tensor = keras.layers.Reshape((-1, h // stride, w // stride))(permute_tensor)
        channel_last = keras.layers.Permute((2, 3, 1))(target_tensor)
        return keras.layers.Reshape((h // stride, w // stride, -1))(channel_last)
    else:
        # Plain patch extraction, consistent with tf.extract_image_patches.
        reshape_tensor = keras.layers.Reshape((h // stride, stride, w // stride, stride, c))(input_tensor)
        return keras.layers.Reshape((h // stride, w // stride, -1))(keras.layers.Permute((1, 3, 2, 4, 5))(reshape_tensor))
```
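For comparison, the False branch is the ordinary block-to-channel rearrangement, which is also what tf.space_to_depth computes. Here is a minimal NumPy sketch of that ordering (the function name is mine):

```python
import numpy as np

def space_to_depth_np(x, s):
    """Patch-style reorg on an NHWC array: each s*s spatial block
    is flattened into the channel dimension."""
    n, h, w, c = x.shape
    x = x.reshape(n, h // s, s, w // s, s, c)   # split h and w into blocks
    x = x.transpose(0, 1, 3, 2, 4, 5)           # bring block indices together
    return x.reshape(n, h // s, w // s, s * s * c)

x = np.arange(16.0).reshape(1, 4, 4, 1)
y = space_to_depth_np(x, 2)
print(y.shape)       # (1, 2, 2, 4)
print(y[0, 0, 0])    # [0. 1. 4. 5.]  -- the top-left 2x2 block
```

Each output pixel simply holds one flattened spatial block, which is the "usual" reorganization that darknet's reorg does not perform.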
According to my understanding, the _forward() method is what reorg is supposed to do. I even renamed reorg to local flatten in the console print, since that seems to be the essence of reorg: flatten local regions of the input volume, so that the final output is still a volume, as opposed to a global flatten, which turns a volume into a 1-D vector.
But then I found extract_image_patches() to be an equivalent built-in, so I swapped it in hoping for a speed improvement. A few test images gave okay predictions; I even ran prediction on video demos and found perfect bounding boxes. Can you tell me whether _forward() is exactly what you understand reorg to be? And how is extract_image_patches() different? @damonzhou @ldf921
Can anyone tell me what the "reorg_layer" is?
@eugene123tw The reorg_layer reorganizes the output of layer 17 to match the shape of layer 25's output. The second route layer then concatenates the two outputs together. Its function is mostly a shape transform.
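The shape arithmetic behind that concatenation can be sketched with NumPy. The channel counts below are illustrative YOLOv2-style numbers (the exact values depend on the cfg), and the reorg here is the simple patch-style rearrangement, not darknet's exact ordering:

```python
import numpy as np

s = 2
fine = np.zeros((1, 26, 26, 64))      # earlier, higher-resolution feature map
coarse = np.zeros((1, 13, 13, 1024))  # deeper, lower-resolution feature map

# reorg trades spatial resolution for channels so the two maps can be stacked
n, h, w, c = fine.shape
reorged = (fine.reshape(n, h // s, s, w // s, s, c)
               .transpose(0, 1, 3, 2, 4, 5)
               .reshape(n, h // s, w // s, c * s * s))

merged = np.concatenate([reorged, coarse], axis=-1)
print(merged.shape)  # (1, 13, 13, 1280)
```

Whatever the exact channel counts, the point is that reorg only reshapes: no values are computed, they are just moved so the route layer can concatenate along the channel axis.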
@thtrieu It is about the way the reorg layer flattens the input volume. I have recently analyzed the convolutional layer algorithm, and I think the weird reorg ordering is related to the data layout used in the computation of the convolutional layer. But it is complicated to explain here, and I'm not sure I am right about this.
Hello everybody,
I'm trying to implement my own Keras/TensorFlow YOLO detector. To do that, I import the weights of a YOLOv2 network trained with the default darknet framework into my Keras network. To evaluate the correctness of the model, I plot a set of feature maps of each layer, where I expect to see the same image for darknet and for my implementation. This assumption holds for all layers up to the concatenation of layers 16 and 25. To illustrate, I plot the outputs of the same feature map for darknet, for my implementation with @ldf921's solution, and for my implementation with TensorFlow functions.
What do you think?
Darknet output for the first feature map of layer 30
TensorFlow output for the first feature map of layer 30 with extract_patches, the @ldf921 solution
TensorFlow output for the first feature map of layer 30 with the space_to_depth or extract_image_patches solution
@ldf921's implementation has the best compatibility, I guess. However, that function requires a fixed network input dimension. Does anyone know how I can write a function like that for flexible input sizes?
Thanks a lot!
> Can anyone tell me what the "reorg_layer" is?
This blog post explains it very well, and if anyone wants to check the implementation in darkflow against it, that would be awesome. I'd do it myself, but I'm not experienced enough with TensorFlow (and I don't have the time at the moment, sorry):
https://leimao.github.io/blog/Reorg-Layer-Explained/
I tried to replicate the darknet reorg layer. It gives the same results as the one described in the blog post mentioned by @doraemon96, except that the original reorg implementation doesn't work when c < s^2, while this one does.
```python
def reorg_layer(input_tensor, s):
    _, h, w, c = input_tensor.get_shape().as_list()
    # Work in channels-first order, like darknet's flat buffer.
    channels_first = tf.keras.layers.Permute((3, 1, 2))(input_tensor)
    x = tf.keras.layers.Reshape((c, h, w // s, s))(channels_first)
    x = tf.keras.layers.Permute((1, 4, 2, 3))(x)
    x = tf.keras.layers.Reshape((c, s, h // (s ** 2), s, s, w // s))(x)
    x = tf.keras.layers.Permute((1, 2, 4, 3, 5, 6))(x)
    x = tf.keras.layers.Reshape((c, s, s, h // s, w // s))(x)
    x = tf.keras.layers.Permute((3, 2, 1, 4, 5))(x)
    x = tf.keras.layers.Reshape((c * s * s, h // s, w // s))(x)
    # Back to channels-last.
    channels_last = tf.keras.layers.Permute((2, 3, 1))(x)
    return channels_last
```