object-detection-in-keras
problem of size
Hi,
I'm running into an issue when I try to connect a VGG16 that I trained myself to your ssd300. During training, an error kills the process. It is a shape mismatch in the loss function. To be precise, this is the exact text shown in the terminal:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [1,8732,4] vs. [1,10,0]
[[node compute/sub_1 (defined at /losses/smooth_l1_loss.py:22) ]] [Op:__inference_train_function_6041]
Errors may have originated from an input operation.
Input Source operations connected to node compute/sub_1:
compute/strided_slice_2 (defined at /losses/ssd_loss.py:43)
compute/strided_slice_3 (defined at /losses/ssd_loss.py:44)
The two lines mentioned in ssd_loss.py are:
bbox_true = y_true[:, :, -12:-8]
bbox_pred = y_pred[:, :, -12:-8]
A detail about my VGG16: it is a binary VGG16 that detects just one class. If you have any idea about this problem, it would be very helpful. Thank you for your project, which is very clear and complete.
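For readers following the traceback: in SSD models of this kind, the last axis of each row is usually laid out as class confidences, then 4 box offsets, then 8 default-box values, so `[:, :, -12:-8]` picks out the 4 offsets. The second shape in the error, `[1,10,0]`, has width 0 after that slice, which is what Python-style slicing produces when the last axis holds 8 or fewer values. The sketch below only reproduces that slicing arithmetic in plain Python. `sliced_width` is a hypothetical helper, and the width 5 is an assumed example of a label row that was never encoded against the default boxes, not something confirmed by the traceback.

```python
def sliced_width(last_axis_size, start=-12, stop=-8):
    """Number of elements left after applying [..., start:stop] to an axis."""
    lo, hi, step = slice(start, stop).indices(last_axis_size)
    return len(range(lo, hi, step))


num_classes = 2                       # one object class plus background
encoded_width = num_classes + 4 + 8   # confidences + offsets + default-box values

print(sliced_width(encoded_width))    # 4: the box offsets of an encoded row
print(sliced_width(5))                # 0: assumed raw label row, too short for the slice
```

If the label tensor reaching the loss has this kind of short last axis, the labels were probably not passed through the encoding step that matches them to default boxes.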
Hi @AurelPuj. Can you show code snippets where you connect your own vgg16 network?
Hi @Socret360, I had the same problem working on Colab. As the base network I used the VGG16 from Keras, pretrained on ImageNet. On Colab I get the same error:
Epoch 1/100
Tensor("compute/strided_slice_2:0", shape=(None, None, None), dtype=float32)
Tensor("compute/strided_slice_3:0", shape=(None, 8096, 4), dtype=float32)
Tensor("compute/strided_slice_2:0", shape=(None, None, None), dtype=float32)
Tensor("compute/strided_slice_3:0", shape=(None, 8096, 4), dtype=float32)
InvalidArgumentError Traceback (most recent call last)
<ipython-input-41-ef3f47e4375b> in <module>()
66 validation_batch_size=3,
67 initial_epoch=0,
---> 68 epochs=100,
69 )
1 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
57 ctx.ensure_initialized()
58 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 59 inputs, attrs, num_outputs)
60 except core._NotOkStatusException as e:
61 if name is not None:
InvalidArgumentError: required broadcastable shapes
[[node compute/sub_1
(defined at <ipython-input-7-d1ca12c6228d>:23)
]] [Op:__inference_train_function_64988]
Errors may have originated from an input operation.
Input Source operations connected to node compute/sub_1:
In[0] compute/strided_slice_2 (defined at <ipython-input-7-d1ca12c6228d>:78)
In[1] compute/strided_slice_3 (defined at <ipython-input-7-d1ca12c6228d>:79)
The error refers to line 22 of smooth_l1_loss.py. The four lines beginning with 'Tensor' appear because I inserted a print of y_true and y_pred before line 22.
This is the implementation of the full network:
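import numpy as np  # needed for np.linspace below
# L2Normalization, DefaultBoxes and get_number_default_boxes are assumed to be imported from this repository
from tensorflow.keras.layers import Dense, Conv2D, MaxPool2D, Flatten, Concatenate, Add, Average, BatchNormalization, Dropout, Reshape, Activation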
from tensorflow.keras import Sequential, Model, Input
from tensorflow.keras.applications import VGG16
from tensorflow.keras.regularizers import l2
from tensorflow.keras.layers import Layer
model_config = config["model"]
num_classes = 2  # one object class plus the background class
l2_reg = model_config["l2_regularization"]
kernel_initializer = model_config["kernel_initializer"]
default_boxes_config = model_config["default_boxes"]
extra_default_box_for_ar_1 = default_boxes_config["extra_box_for_ar_1"]
clip_default_boxes = default_boxes_config["clip_boxes"]
scales = np.linspace(
    default_boxes_config["min_scale"],
    default_boxes_config["max_scale"],
    len(default_boxes_config["layers"])
)
mbox_conf_layers = []
mbox_loc_layers = []
mbox_default_boxes_layers = []
def add_conf_and_loc_layers(i, layer, num_classes, kernel_initializer, model):
    num_default_boxes = get_number_default_boxes(
        layer["aspect_ratios"],
        extra_box_for_ar_1=extra_default_box_for_ar_1
    )
    x = model.get_layer(layer["name"]).output
    layer_name = layer["name"]
    if layer_name == "conv4_3":
        layer_name = f"{layer_name}_norm"
        x = L2Normalization(gamma_init=20, name=layer_name)(x)
    layer_mbox_conf = Conv2D(
        filters=num_default_boxes * num_classes,
        kernel_size=(3, 3),
        padding='same',
        kernel_initializer=kernel_initializer,
        kernel_regularizer=l2(l2_reg),
        name=f"{layer_name}_mbox_conf")(x)
    layer_mbox_conf_reshape = Reshape((-1, num_classes), name=f"{layer_name}_mbox_conf_reshape")(layer_mbox_conf)
    layer_mbox_loc = Conv2D(
        filters=num_default_boxes * 4,
        kernel_size=(3, 3),
        padding='same',
        kernel_initializer=kernel_initializer,
        kernel_regularizer=l2(l2_reg),
        name=f"{layer_name}_mbox_loc")(x)
    layer_mbox_loc_reshape = Reshape((-1, 4), name=f"{layer_name}_mbox_loc_reshape")(layer_mbox_loc)
    layer_default_boxes = DefaultBoxes(
        image_shape=(300, 300, 3),
        scale=scales[i],
        next_scale=scales[i+1] if i+1 <= len(default_boxes_config["layers"]) - 1 else 1,
        aspect_ratios=layer["aspect_ratios"],
        variances=default_boxes_config["variances"],
        extra_box_for_ar_1=extra_default_box_for_ar_1,
        clip_boxes=clip_default_boxes,
        name=f"{layer_name}_default_boxes")(x)
    layer_default_boxes_reshape = Reshape((-1, 8), name=f"{layer_name}_default_boxes_reshape")(layer_default_boxes)
    mbox_conf_layers.append(layer_mbox_conf_reshape)
    mbox_loc_layers.append(layer_mbox_loc_reshape)
    mbox_default_boxes_layers.append(layer_default_boxes_reshape)
C = num_classes
backbone = VGG16(input_shape=(300,300,3), include_top=False, weights='imagenet', classes=C)
#cutting VGG at block5_conv3 layer
backbone = Model(inputs=backbone.input, outputs=backbone.get_layer('block5_conv3').output)
for layer in backbone.layers:
    print(layer.name)
    if "pool" in layer.name:
        new_name = layer.name.replace("block", "")
        new_name = new_name.split("_")
        new_name = f"{new_name[1]}{new_name[0]}"
    else:
        new_name = layer.name.replace("conv", "")
        new_name = new_name.replace("block", "conv")
    backbone.get_layer(layer.name)._name = new_name
    backbone.get_layer(layer.name)._kernel_initializer = "he_normal"
    backbone.get_layer(layer.name)._kernel_regularizer = l2(l2_reg)
    layer.trainable = False
pool5 = MaxPool2D(pool_size=(3, 3),strides=(1, 1),padding="same", name="pool5")(backbone.get_layer('conv5_3').output)
core = Model(inputs=backbone.input, outputs=pool5)
fc6 = Conv2D(1024,3, padding="same", name="fc6", dilation_rate=(6, 6))(pool5)
fc7 = Conv2D(1024,1, padding="same", name="fc7")(fc6)
conv8_1 = Conv2D(256,1, padding="valid", name="conv8_1")(fc7)
conv8_2 = Conv2D(512,3, strides=(2,2), padding="same", name="conv8_2")(conv8_1)
conv9_1 = Conv2D(128,1, padding="valid", name="conv9_1")(conv8_2)
conv9_2 = Conv2D(256,3, strides=(2,2), padding="same", name="conv9_2")(conv9_1)
conv10_1 = Conv2D(128,1, padding="valid", name="conv10_1" )(conv9_2)
conv10_2 = Conv2D(256,3, strides=(1,1), padding="valid", name="conv10_2")(conv10_1)
conv11_1 = Conv2D(128,1, padding="valid", name="conv11_1")(conv10_2)
conv11_2 = Conv2D(256,3, strides=(1,1), padding="valid", name="conv11_2")(conv11_1)
core = Model(inputs=backbone.input, outputs=conv11_2)
#input_image = tf.keras.applications.vgg16.preprocess_input(input_image)
for i, layer in enumerate(default_boxes_config["layers"]):
    add_conf_and_loc_layers(i, layer, C, kernel_initializer, core)
# concatenate class confidence predictions from different feature map layers
mbox_conf = Concatenate(axis=-2, name="mbox_conf")(mbox_conf_layers)
mbox_conf_softmax = Activation('softmax', name='mbox_conf_softmax')(mbox_conf)
# concatenate object location predictions from different feature map layers
mbox_loc = Concatenate(axis=-2, name="mbox_loc")(mbox_loc_layers)
# concatenate default boxes from different feature map layers
mbox_default_boxes = Concatenate(axis=-2, name="mbox_default_boxes")(mbox_default_boxes_layers)
# concatenate confidence score predictions, bounding box predictions, and default boxes
predictions = Concatenate(axis=-1, name='predictions')([mbox_conf_softmax, mbox_loc, mbox_default_boxes])
ssd = Model(inputs=backbone.input, outputs=predictions)
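One thing worth checking with this backbone is the feature-map size. The VGG16 in `tensorflow.keras.applications` uses 2x2 max pooling with stride 2 and the default `valid` padding, which rounds odd sizes down, while the original SSD300 rounds up at the same step. Below is a rough sketch of that arithmetic for a 300x300 input. `pooled_sizes` is a hypothetical helper written for this illustration, not part of the repository.

```python
import math


def pooled_sizes(size, num_pools, round_up):
    """Side lengths after repeated 2x2, stride-2 max pooling of a square input."""
    sizes = []
    for _ in range(num_pools):
        # 'valid' padding floors the result, 'same' padding (or ceil mode) rounds up
        size = math.ceil(size / 2) if round_up else size // 2
        sizes.append(size)
    return sizes


print(pooled_sizes(300, 4, round_up=False))  # [150, 75, 37, 18]
print(pooled_sizes(300, 4, round_up=True))   # [150, 75, 38, 19]
```

conv4_3 is read after the third pooling step and fc7 after the fourth, so rounding down gives 37x37 and 18x18 maps where SSD300 expects 38x38 and 19x19. Whether that matters depends on how the labels are encoded, so treat this as a lead to check rather than a confirmed cause.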
And this is the summary of the entire model (I have to classify only one class):
Layer (type)                                  Output Shape           Param #  Connected to
input_12 (InputLayer)                         [(None, 300, 300, 3)]  0        []
conv1_1 (Conv2D)                              (None, 300, 300, 64)   1792     ['input_12[0][0]']
conv1_2 (Conv2D)                              (None, 300, 300, 64)   36928    ['conv1_1[0][0]']
pool1 (MaxPooling2D)                          (None, 150, 150, 64)   0        ['conv1_2[0][0]']
conv2_1 (Conv2D)                              (None, 150, 150, 128)  73856    ['pool1[0][0]']
conv2_2 (Conv2D)                              (None, 150, 150, 128)  147584   ['conv2_1[0][0]']
pool2 (MaxPooling2D)                          (None, 75, 75, 128)    0        ['conv2_2[0][0]']
conv3_1 (Conv2D)                              (None, 75, 75, 256)    295168   ['pool2[0][0]']
conv3_2 (Conv2D)                              (None, 75, 75, 256)    590080   ['conv3_1[0][0]']
conv3_3 (Conv2D)                              (None, 75, 75, 256)    590080   ['conv3_2[0][0]']
pool3 (MaxPooling2D)                          (None, 37, 37, 256)    0        ['conv3_3[0][0]']
conv4_1 (Conv2D)                              (None, 37, 37, 512)    1180160  ['pool3[0][0]']
conv4_2 (Conv2D)                              (None, 37, 37, 512)    2359808  ['conv4_1[0][0]']
conv4_3 (Conv2D)                              (None, 37, 37, 512)    2359808  ['conv4_2[0][0]']
pool4 (MaxPooling2D)                          (None, 18, 18, 512)    0        ['conv4_3[0][0]']
conv5_1 (Conv2D)                              (None, 18, 18, 512)    2359808  ['pool4[0][0]']
conv5_2 (Conv2D)                              (None, 18, 18, 512)    2359808  ['conv5_1[0][0]']
conv5_3 (Conv2D)                              (None, 18, 18, 512)    2359808  ['conv5_2[0][0]']
pool5 (MaxPooling2D)                          (None, 18, 18, 512)    0        ['conv5_3[0][0]']
fc6 (Conv2D)                                  (None, 18, 18, 1024)   4719616  ['pool5[0][0]']
fc7 (Conv2D)                                  (None, 18, 18, 1024)   1049600  ['fc6[0][0]']
conv8_1 (Conv2D)                              (None, 18, 18, 256)    262400   ['fc7[0][0]']
conv8_2 (Conv2D)                              (None, 9, 9, 512)      1180160  ['conv8_1[0][0]']
conv9_1 (Conv2D)                              (None, 9, 9, 128)      65664    ['conv8_2[0][0]']
conv9_2 (Conv2D)                              (None, 5, 5, 256)      295168   ['conv9_1[0][0]']
conv10_1 (Conv2D)                             (None, 5, 5, 128)      32896    ['conv9_2[0][0]']
conv10_2 (Conv2D)                             (None, 3, 3, 256)      295168   ['conv10_1[0][0]']
conv11_1 (Conv2D)                             (None, 3, 3, 128)      32896    ['conv10_2[0][0]']
conv4_3_norm (L2Normalization)                (None, 37, 37, 512)    512      ['conv4_3[0][0]']
conv11_2 (Conv2D)                             (None, 1, 1, 256)      295168   ['conv11_1[0][0]']
conv4_3_norm_mbox_conf (Conv2D)               (None, 37, 37, 8)      36872    ['conv4_3_norm[0][0]']
fc7_mbox_conf (Conv2D)                        (None, 18, 18, 12)     110604   ['fc7[0][0]']
conv8_2_mbox_conf (Conv2D)                    (None, 9, 9, 12)       55308    ['conv8_2[0][0]']
conv9_2_mbox_conf (Conv2D)                    (None, 5, 5, 12)       27660    ['conv9_2[0][0]']
conv10_2_mbox_conf (Conv2D)                   (None, 3, 3, 8)        18440    ['conv10_2[0][0]']
conv11_2_mbox_conf (Conv2D)                   (None, 1, 1, 8)        18440    ['conv11_2[0][0]']
conv4_3_norm_mbox_conf_reshape (Reshape)      (None, 5476, 2)        0        ['conv4_3_norm_mbox_conf[0][0]']
fc7_mbox_conf_reshape (Reshape)               (None, 1944, 2)        0        ['fc7_mbox_conf[0][0]']
conv8_2_mbox_conf_reshape (Reshape)           (None, 486, 2)         0        ['conv8_2_mbox_conf[0][0]']
conv9_2_mbox_conf_reshape (Reshape)           (None, 150, 2)         0        ['conv9_2_mbox_conf[0][0]']
conv10_2_mbox_conf_reshape (Reshape)          (None, 36, 2)          0        ['conv10_2_mbox_conf[0][0]']
conv11_2_mbox_conf_reshape (Reshape)          (None, 4, 2)           0        ['conv11_2_mbox_conf[0][0]']
conv4_3_norm_mbox_loc (Conv2D)                (None, 37, 37, 16)     73744    ['conv4_3_norm[0][0]']
fc7_mbox_loc (Conv2D)                         (None, 18, 18, 24)     221208   ['fc7[0][0]']
conv8_2_mbox_loc (Conv2D)                     (None, 9, 9, 24)       110616   ['conv8_2[0][0]']
conv9_2_mbox_loc (Conv2D)                     (None, 5, 5, 24)       55320    ['conv9_2[0][0]']
conv10_2_mbox_loc (Conv2D)                    (None, 3, 3, 16)       36880    ['conv10_2[0][0]']
conv11_2_mbox_loc (Conv2D)                    (None, 1, 1, 16)       36880    ['conv11_2[0][0]']
conv4_3_norm_default_boxes (DefaultBoxes)     (None, 37, 37, 4, 8)   0        ['conv4_3_norm[0][0]']
fc7_default_boxes (DefaultBoxes)              (None, 18, 18, 6, 8)   0        ['fc7[0][0]']
conv8_2_default_boxes (DefaultBoxes)          (None, 9, 9, 6, 8)     0        ['conv8_2[0][0]']
conv9_2_default_boxes (DefaultBoxes)          (None, 5, 5, 6, 8)     0        ['conv9_2[0][0]']
conv10_2_default_boxes (DefaultBoxes)         (None, 3, 3, 4, 8)     0        ['conv10_2[0][0]']
conv11_2_default_boxes (DefaultBoxes)         (None, 1, 1, 4, 8)     0        ['conv11_2[0][0]']
mbox_conf (Concatenate)                       (None, 8096, 2)        0        ['conv4_3_norm_mbox_conf_reshape[0][0]',
                                                                               'fc7_mbox_conf_reshape[0][0]',
                                                                               'conv8_2_mbox_conf_reshape[0][0]',
                                                                               'conv9_2_mbox_conf_reshape[0][0]',
                                                                               'conv10_2_mbox_conf_reshape[0][0]',
                                                                               'conv11_2_mbox_conf_reshape[0][0]']
conv4_3_norm_mbox_loc_reshape (Reshape)       (None, 5476, 4)        0        ['conv4_3_norm_mbox_loc[0][0]']
fc7_mbox_loc_reshape (Reshape)                (None, 1944, 4)        0        ['fc7_mbox_loc[0][0]']
conv8_2_mbox_loc_reshape (Reshape)            (None, 486, 4)         0        ['conv8_2_mbox_loc[0][0]']
conv9_2_mbox_loc_reshape (Reshape)            (None, 150, 4)         0        ['conv9_2_mbox_loc[0][0]']
conv10_2_mbox_loc_reshape (Reshape)           (None, 36, 4)          0        ['conv10_2_mbox_loc[0][0]']
conv11_2_mbox_loc_reshape (Reshape)           (None, 4, 4)           0        ['conv11_2_mbox_loc[0][0]']
conv4_3_norm_default_boxes_reshape (Reshape)  (None, 5476, 8)        0        ['conv4_3_norm_default_boxes[0][0]']
fc7_default_boxes_reshape (Reshape)           (None, 1944, 8)        0        ['fc7_default_boxes[0][0]']
conv8_2_default_boxes_reshape (Reshape)       (None, 486, 8)         0        ['conv8_2_default_boxes[0][0]']
conv9_2_default_boxes_reshape (Reshape)       (None, 150, 8)         0        ['conv9_2_default_boxes[0][0]']
conv10_2_default_boxes_reshape (Reshape)      (None, 36, 8)          0        ['conv10_2_default_boxes[0][0]']
conv11_2_default_boxes_reshape (Reshape)      (None, 4, 8)           0        ['conv11_2_default_boxes[0][0]']
mbox_conf_softmax (Activation)                (None, 8096, 2)        0        ['mbox_conf[0][0]']
mbox_loc (Concatenate)                        (None, 8096, 4)        0        ['conv4_3_norm_mbox_loc_reshape[0][0]',
                                                                               'fc7_mbox_loc_reshape[0][0]',
                                                                               'conv8_2_mbox_loc_reshape[0][0]',
                                                                               'conv9_2_mbox_loc_reshape[0][0]',
                                                                               'conv10_2_mbox_loc_reshape[0][0]',
                                                                               'conv11_2_mbox_loc_reshape[0][0]']
mbox_default_boxes (Concatenate)              (None, 8096, 8)        0        ['conv4_3_norm_default_boxes_reshape[0][0]',
                                                                               'fc7_default_boxes_reshape[0][0]',
                                                                               'conv8_2_default_boxes_reshape[0][0]',
                                                                               'conv9_2_default_boxes_reshape[0][0]',
                                                                               'conv10_2_default_boxes_reshape[0][0]',
                                                                               'conv11_2_default_boxes_reshape[0][0]']
predictions (Concatenate)                     (None, 8096, 14)       0        ['mbox_conf_softmax[0][0]',
                                                                               'mbox_loc[0][0]',
                                                                               'mbox_default_boxes[0][0]']
Could you help me fix this error? Thank you in advance.
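A quick way to compare this model with the 8732 boxes in the first error message is to count the default boxes per feature map. The sketch below uses the feature-map sizes and boxes-per-cell values read off the summary (37, 18, 9, 5, 3, 1 with 4, 6, 6, 6, 4, 4 boxes) and the sizes of the standard SSD300 (38, 19, 10, 5, 3, 1). `count_default_boxes` is a hypothetical helper for illustration only.

```python
def count_default_boxes(feature_map_sizes, boxes_per_cell):
    """Total default boxes for square feature maps with the given boxes per cell."""
    return sum(size * size * boxes
               for size, boxes in zip(feature_map_sizes, boxes_per_cell))


boxes_per_cell = [4, 6, 6, 6, 4, 4]

summary_sizes = [37, 18, 9, 5, 3, 1]     # read from the model summary in this post
reference_sizes = [38, 19, 10, 5, 3, 1]  # standard SSD300 feature maps

print(count_default_boxes(summary_sizes, boxes_per_cell))    # 8096
print(count_default_boxes(reference_sizes, boxes_per_cell))  # 8732
```

If the ground-truth encoder builds labels for 8732 default boxes (an assumption, since the data generator is not shown here), they cannot be subtracted from 8096 predictions, which would produce a broadcast error of the kind shown in the traceback.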