
The dimensionality of the last conv layer's output is 6*6*30, so how can it be reshaped to 1470?

Open sysuzyq opened this issue 8 years ago • 4 comments

layer {
  name: "conv_reg"
  type: "Convolution"
  bottom: "add_conv2"
  top: "conv_reg"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 30  # output: 6 * 6 * 30
    kernel_size: 3
    stride: 1
    pad: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" value: 0.0 }
  }
}
layer {
  name: "reg_reshape"
  type: "Reshape"
  bottom: "conv_reg"
  top: "regression"
  reshape_param {
    axis: 1
    shape {
      dim: 1470  # but here it is 7 * 7 * 30, which is (classes + num_object * 5) * side * side
    }
  }
}

Your input is 448 * 448, but the feature map of the last conv layer is 6 * 6, so the output should have 6 * 6 * 30 = 1080 elements. Should we reshape to 1080 instead of 1470? I look forward to your reply, thanks.

sysuzyq, Feb 23 '17 06:02

The total stride is 64, so the last feature map size is 448 / 64 = 7, and 7 * 7 * 30 = 1470. side = width (or height) / stride

ICTwangbiao, Feb 23 '17 08:02

@ICTwangbiao Thanks for your reply, but I am still confused. When I use the equation "h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1" to calculate the output side layer by layer, I end up with 6. What am I doing wrong? Thanks.

sysuzyq, Feb 23 '17 11:02
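
The layer-by-layer calculation described above can be sketched in Python. One thing worth checking when doing it by hand is rounding: Caffe's Convolution layer rounds the division down, while its Pooling layer rounds up by default, so applying the floor formula to a pooling layer can lose one pixel. Whether that is what happened here is not confirmed in this thread. The helper names and the 13-pixel example input are illustrative assumptions, not values taken from this network's prototxt.

```python
import math

def conv_out(size, kernel, stride=1, pad=0):
    """Caffe Convolution output size: the division is rounded down."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel, stride=1, pad=0):
    """Caffe Pooling output size: the division is rounded up by default.
    (Simplified: ignores the extra clipping Caffe applies when pad > 0.)"""
    return math.ceil((size + 2 * pad - kernel) / stride) + 1

# The conv_reg layer from the question (kernel 3, stride 1, pad 1)
# preserves the spatial size of whatever it receives.
print(conv_out(7, kernel=3, stride=1, pad=1))  # 7
print(conv_out(6, kernel=3, stride=1, pad=1))  # 6

# Hypothetical 13-pixel input to a 2x2, stride-2 pooling layer:
print(conv_out(13, kernel=2, stride=2))  # 6 (floor formula)
print(pool_out(13, kernel=2, stride=2))  # 7 (ceil formula)
```

Running each layer of the actual network through the matching helper, in order, gives the side length that the Reshape layer has to agree with.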

@ICTwangbiao (classes + num_object * 5) * side * side comes from the paper, but this code uses 7 values instead of 30 to represent one cell. https://github.com/yeahkun/caffe-yolo/blob/master/src/caffe/layers/box_data_layer.cpp#L142 makes it very clear:

CHECK_EQ(count, locations * 7)

Here locations is 7 * 7 = 49, so dim: 1470 is wrong. I don't know why the code runs. The correct value should be dim: 343.

quhezheng, Jan 02 '18 09:01

@quhezheng "CHECK_EQ(count, locations * 7)" checks your ground truth: the number of label values should be locations * 7, that is {class_LABEL, difficult, isobj, x, y, w, h} for each location. So the 343 you mentioned is the size of the ground truth. 1470, however, is the number of outputs of the regression layer: locations * (class_NUMBER + score_confidence + coordinate infos). In @sysuzyq's case, I guess he/she wants to detect 25 classes with num_object = 1, so outputs = locations * (25 + 5).

ICTwangbiao, Jan 10 '18 07:01
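
The two counts discussed in this thread can be checked with a few lines of Python. The variable names are illustrative, and the values of 25 classes and num_object = 1 are the guess made in the reply above rather than something confirmed by the prototxt.

```python
side = 448 // 64          # input size / total stride, as stated in the first reply
locations = side * side   # 49 grid cells

# Ground-truth label per cell: class label, difficult, isobj, x, y, w, h
label_fields = 7
label_count = locations * label_fields

# Regression output per cell: classes + num_object * 5
num_classes = 25   # guessed in the thread
num_object = 1     # guessed in the thread
output_count = locations * (num_classes + num_object * 5)

print(label_count)   # 343
print(output_count)  # 1470
```

The label blob and the regression blob have different sizes by design, which is why CHECK_EQ(count, locations * 7) and dim: 1470 can both hold at the same time.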