caffe-yolo
The dimensionality of the last conv layer's output is 6 * 6 * 30, but how can we reshape it to 1470?
    layer {
      name: "conv_reg"
      type: "Convolution"
      bottom: "add_conv2"
      top: "conv_reg"
      param { lr_mult: 1 decay_mult: 1 }
      param { lr_mult: 2 decay_mult: 0 }
      convolution_param {
        num_output: 30  # output: 6 * 6 * 30
        kernel_size: 3
        stride: 1
        pad: 1
        weight_filler { type: "xavier" }
        bias_filler { type: "constant" value: 0.0 }
      }
    }
    layer {
      name: "reg_reshape"
      type: "Reshape"
      bottom: "conv_reg"
      top: "regression"
      reshape_param {
        axis: 1
        shape {
          dim: 1470  # but here it is 7 * 7 * 30, which is (classes + num_object * 5) * side * side
        }
      }
    }

Your input is 448 * 448, but the feature map of the last conv layer is 6 * 6, so the output should be 6 * 6 * 30 = 1080. Should we reshape to 1080 instead of 1470? Looking forward to your reply, thanks.
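For context: Caffe's Reshape layer only changes a blob's shape, and net setup fails if the total element count would change. A minimal Python sketch of that count check, assuming batch size N = 1 and `axis: 1` as in the prototxt above (`reshape_count_ok` is a hypothetical helper, not part of Caffe):

```python
from math import prod

def reshape_count_ok(bottom_shape, top_shape):
    """Hypothetical helper mirroring the count check Caffe's Reshape layer
    performs: a reshape may change the shape, never the number of elements."""
    return prod(bottom_shape) == prod(top_shape)

N = 1   # batch size (assumed)
C = 30  # num_output of conv_reg

# A 6 x 6 map holds 30 * 36 = 1080 values, so reshaping to 1470 is rejected.
print(reshape_count_ok((N, C, 6, 6), (N, 1470)))  # False
# A 7 x 7 map holds 30 * 49 = 1470 values, so the reshape is accepted.
print(reshape_count_ok((N, C, 7, 7), (N, 1470)))  # True
```

So if the net initializes with `dim: 1470`, conv_reg must produce 1470 values per image, which for a square feature map means 7 x 7 rather than 6 x 6.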
The total stride is 64, so the last feature map size is 448 / 64 = 7, and 7 * 7 * 30 = 1470. In general, side = width (or height) / stride.
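The reply's arithmetic as a minimal Python sketch. The function names are hypothetical; the parameters are the YOLO paper's PASCAL VOC setting (20 classes, 2 boxes per cell), which gives 30 values per cell:

```python
def grid_side(input_size, total_stride):
    """side = width (or height) / total stride, assuming it divides exactly."""
    assert input_size % total_stride == 0, "input size should be a multiple of the total stride"
    return input_size // total_stride

def regression_dim(side, classes, num_object):
    """Length of the flattened output: side * side * (classes + num_object * 5)."""
    return side * side * (classes + num_object * 5)

side = grid_side(448, 64)
print(side)                                            # 7
print(regression_dim(side, classes=20, num_object=2))  # 1470
print(regression_dim(6, classes=20, num_object=2))     # 1080
```

The last line reproduces the 1080 from the question: the formula only gives 1470 when the final map really is 7 x 7.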
@ICTwangbiao Thanks for your reply, but it confuses me. When I use the equation "h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1" to calculate the output side layer by layer, I end up with 6. What am I doing wrong? Thanks.
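A minimal Python sketch of the layer-by-layer computation. The layer list is hypothetical (the thread does not list the actual network), but it shows two common ways a hand calculation ends at 6 while the net produces 7: a stride-2 3 x 3 layer without padding, or applying the floor formula to a pooling layer, whose output size Caffe rounds up instead:

```python
import math

def conv_out(h_in, kernel, stride=1, pad=0):
    """Convolution output size, as in the quoted formula: floor division."""
    return (h_in + 2 * pad - kernel) // stride + 1

def pool_out(h_in, kernel, stride=1, pad=0):
    """Pooling output size, rounded up (Caffe's default); the extra
    clipping Caffe applies when pad > 0 is ignored here."""
    return math.ceil((h_in + 2 * pad - kernel) / stride) + 1

# Hypothetical stack: five 3x3 / stride-2 convs with pad 1 take 448 down to 14.
h = 448
for _ in range(5):
    h = conv_out(h, kernel=3, stride=2, pad=1)
print(h)                                       # 14

# One more 3x3 / stride-2 step: the result depends on padding and rounding.
print(conv_out(h, kernel=3, stride=2))         # 6, floor(11 / 2) + 1
print(pool_out(h, kernel=3, stride=2))         # 7, ceil(11 / 2) + 1
print(conv_out(h, kernel=3, stride=2, pad=1))  # 7, floor(13 / 2) + 1
```

Checking each layer's pad and whether it is a convolution or a pooling layer is usually enough to find where a manual count drifts from the real blob shapes.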
@ICTwangbiao The formula (classes + num_object * 5) * side * side comes from the paper, but this code uses 7 values per cell instead of 30. https://github.com/yeahkun/caffe-yolo/blob/master/src/caffe/layers/box_data_layer.cpp#L142 makes this very clear:
CHECK_EQ(count, locations * 7)
Here locations is 7 * 7 = 49, so dim: 1470 is wrong. I don't know why the code runs at all; it should be dim: 343.
@quhezheng "CHECK_EQ(count, locations * 7)" is used to check your ground truth: the number of your labels should, of course, be locations * 7 {class_LABEL, difficult, isobj, x, y, w, h}, so the 343 you mentioned is the size of your ground truth. But 1470 is the number of outputs of your regression layer: locations * (class_NUMBER + score_confidence + coordinate infos). In @sysuzyq's case, I guess he/she wants to detect 25 classes with num_object = 1, so outputs = locations * (25 + 5) = 49 * 30 = 1470.
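The two counts side by side, as a minimal Python sketch for a 7 x 7 grid. The 7 label fields follow the list in the reply above; the 25-class, one-box setting is the reply's guess rather than a confirmed configuration, and the helper names are hypothetical:

```python
SIDE = 7
LOCATIONS = SIDE * SIDE  # 49 grid cells

# Ground truth: 7 fields per cell, matching CHECK_EQ(count, locations * 7).
LABEL_FIELDS = ("class_label", "difficult", "isobj", "x", "y", "w", "h")

def label_count(locations):
    """Number of ground-truth values supplied by the data layer."""
    return locations * len(LABEL_FIELDS)

def regression_count(locations, classes, num_object):
    """Number of network outputs: per cell, class scores plus
    num_object * (confidence + x, y, w, h)."""
    return locations * (classes + num_object * 5)

print(label_count(LOCATIONS))                                 # 343
print(regression_count(LOCATIONS, classes=25, num_object=1))  # 1470
```

The label blob and the regression blob are different tensors, so `dim: 1470` in the Reshape layer and the 343-value check in the data layer do not conflict.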