Issues reproducing the network - resulting model has a different size
Hello! :) I'm trying to implement your network with Keras, and it seems that the network I built needs far more GPU memory than what you reported in your paper. You've mentioned you were able to train the entire network with a batch size of 20 using 12 GB (in #5 you even mentioned using 10.969 GB). My GPU has 10.57 GiB available, but when I try a batch size of 15, which by my calculation should fit, the model does not fit into GPU memory. I've even removed the 3D regression part and it still fails.
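(My rough reasoning for batch size 15: assuming activation memory scales roughly linearly with batch size, 15/20 × 10.969 GB ≈ 8.2 GB, which should still leave headroom within 10.57 GiB.)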
So I wanted to ask if you could help me check whether I've made an implementation error :) Could you, for example, provide the total number of parameters of your model? Or, even better, the number of parameters per layer? :)
Here is a description of my implementation :) I've defined the network as follows:
from keras import backend as K
from keras import layers
from keras.models import Model

# All shapes below are channels-first (C, H, W), so set the image data format globally.
K.set_image_data_format('channels_first')


def layoutnet():
    # Encoder
    inp = layers.Input(shape=(6, 512, 1024))  # CHW format
    e1 = conv2d_relu_pool(inp, 32, name='e1')     # [?, 32, 256, 512]
    e2 = conv2d_relu_pool(e1, 64, name='e2')      # [?, 64, 128, 256]
    e3 = conv2d_relu_pool(e2, 128, name='e3')     # [?, 128, 64, 128]
    e4 = conv2d_relu_pool(e3, 256, name='e4')     # [?, 256, 32, 64]
    e5 = conv2d_relu_pool(e4, 512, name='e5')     # [?, 512, 16, 32]
    e6 = conv2d_relu_pool(e5, 1024, name='e6')    # [?, 1024, 8, 16]
    e7 = conv2d_relu_pool(e6, 2048, name='e7')    # [?, 2048, 4, 8]
    encoder = Model(inp, e7)

    # Top decoder branch
    td1 = up_conv2d_relu(e7, 1024, name='td1')                       # [?, 1024, 8, 16]
    td1 = layers.Concatenate(axis=1, name='td1_concat')([td1, e6])   # [?, 1024 * 2, 8, 16]
    td2 = up_conv2d_relu(td1, 512, name='td2')                       # [?, 512, 16, 32]
    td2 = layers.Concatenate(axis=1, name='td2_concat')([td2, e5])   # [?, 512 * 2, 16, 32]
    td3 = up_conv2d_relu(td2, 256, name='td3')                       # [?, 256, 32, 64]
    td3 = layers.Concatenate(axis=1, name='td3_concat')([td3, e4])   # [?, 256 * 2, 32, 64]
    td4 = up_conv2d_relu(td3, 128, name='td4')                       # [?, 128, 64, 128]
    td4 = layers.Concatenate(axis=1, name='td4_concat')([td4, e3])   # [?, 128 * 2, 64, 128]
    td5 = up_conv2d_relu(td4, 64, name='td5')                        # [?, 64, 128, 256]
    td5 = layers.Concatenate(axis=1, name='td5_concat')([td5, e2])   # [?, 64 * 2, 128, 256]
    td6 = up_conv2d_relu(td5, 32, name='td6')                        # [?, 32, 256, 512]
    td6 = layers.Concatenate(axis=1, name='td6_concat')([td6, e1])   # [?, 32 * 2, 256, 512]
    td7 = up_conv2d_relu(td6, 3, name='td7')                         # [?, 3, 512, 1024]
    td = layers.Activation('sigmoid')(td7)
    top_decoder = Model(inp, td)

    # Bottom decoder branch (reuses the upsampled e7 features from the top branch)
    bd1 = layers.Conv2D(1024, (3, 3), strides=(1, 1), padding='same', activation='relu',
                        name='bd1_conv_conv')(top_decoder.get_layer('td1_upsample').output)  # [?, 1024, 8, 16]
    bd1 = layers.Concatenate(axis=1, name='bd1_concat')([bd1, td1])  # [?, 1024 * 3, 8, 16]
    bd2 = up_conv2d_relu(bd1, 512, name='bd2')                       # [?, 512, 16, 32]
    bd2 = layers.Concatenate(axis=1, name='bd2_concat')([bd2, td2])  # [?, 512 * 3, 16, 32]
    bd3 = up_conv2d_relu(bd2, 256, name='bd3')                       # [?, 256, 32, 64]
    bd3 = layers.Concatenate(axis=1, name='bd3_concat')([bd3, td3])  # [?, 256 * 3, 32, 64]
    bd4 = up_conv2d_relu(bd3, 128, name='bd4')                       # [?, 128, 64, 128]
    bd4 = layers.Concatenate(axis=1, name='bd4_concat')([bd4, td4])  # [?, 128 * 3, 64, 128]
    bd5 = up_conv2d_relu(bd4, 64, name='bd5')                        # [?, 64, 128, 256]
    bd5 = layers.Concatenate(axis=1, name='bd5_concat')([bd5, td5])  # [?, 64 * 3, 128, 256]
    bd6 = up_conv2d_relu(bd5, 32, name='bd6')                        # [?, 32, 256, 512]
    bd6 = layers.Concatenate(axis=1, name='bd6_concat')([bd6, td6])  # [?, 32 * 3, 256, 512]
    bd7 = up_conv2d_relu(bd6, 1, name='bd7')                         # [?, 1, 512, 1024]
    bd = layers.Activation('sigmoid')(bd7)
    bot_decoder = Model(inp, bd)

    # 3D box regressor (removed while debugging the memory issue)
    # reg = layers.Concatenate(axis=1, name='reg_input')([td, bd])    # [?, 4, 512, 1024]
    # reg = conv2d_relu_pool(reg, 8, name='reg_downsample1')          # [?, 8, 256, 512]
    # reg = conv2d_relu_pool(reg, 16, name='reg_downsample2')         # [?, 16, 128, 256]
    # reg = conv2d_relu_pool(reg, 32, name='reg_downsample3')         # [?, 32, 64, 128]
    # reg = conv2d_relu_pool(reg, 64, name='reg_downsample4')         # [?, 64, 32, 64]
    # reg = conv2d_relu_pool(reg, 128, name='reg_downsample5')        # [?, 128, 16, 32]
    # reg = conv2d_relu_pool(reg, 256, name='reg_downsample6')        # [?, 256, 8, 16]
    # reg = conv2d_relu_pool(reg, 512, name='reg_downsample7')        # [?, 512, 4, 8]
    # reg = layers.Flatten(name='reg_flatten')(reg)
    # reg = layers.Dense(1024, activation='relu', name='reg_dense1')(reg)
    # reg = layers.Dense(256, activation='relu', name='reg_dense2')(reg)
    # reg = layers.Dense(64, activation='relu', name='reg_dense3')(reg)
    # reg = layers.Dense(6, name='reg_dense4')(reg)
    # model = Model(inp, [td, bd, reg])
    model = Model(inp, [td, bd])
    return model
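The two helper blocks used above look roughly like this (a minimal sketch reconstructed from the layer summary below: the parameter counts imply 3x3 convolutions with 'same' padding, and the shapes imply 2x2 max pooling / upsampling under the channels-first data format set above):
def conv2d_relu_pool(x, filters, name):
    # 3x3 conv + ReLU, then 2x2 max pooling (halves the spatial resolution)
    x = layers.Conv2D(filters, (3, 3), padding='same', activation='relu', name=name + '_conv')(x)
    return layers.MaxPooling2D((2, 2), name=name + '_pool')(x)

def up_conv2d_relu(x, filters, name):
    # 2x2 upsampling, then 3x3 conv + ReLU
    x = layers.UpSampling2D((2, 2), name=name + '_upsample')(x)
    return layers.Conv2D(filters, (3, 3), padding='same', activation='relu', name=name + '_conv')(x)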
And the number of parameters per layer is shown here:
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 6, 512, 1024) 0
__________________________________________________________________________________________________
e1_conv (Conv2D) (None, 32, 512, 1024) 1760 input_1[0][0]
__________________________________________________________________________________________________
e1_pool (MaxPooling2D) (None, 32, 256, 512) 0 e1_conv[0][0]
__________________________________________________________________________________________________
e2_conv (Conv2D) (None, 64, 256, 512) 18496 e1_pool[0][0]
__________________________________________________________________________________________________
e2_pool (MaxPooling2D) (None, 64, 128, 256) 0 e2_conv[0][0]
__________________________________________________________________________________________________
e3_conv (Conv2D) (None, 128, 128, 256) 73856 e2_pool[0][0]
__________________________________________________________________________________________________
e3_pool (MaxPooling2D) (None, 128, 64, 128) 0 e3_conv[0][0]
__________________________________________________________________________________________________
e4_conv (Conv2D) (None, 256, 64, 128) 295168 e3_pool[0][0]
__________________________________________________________________________________________________
e4_pool (MaxPooling2D) (None, 256, 32, 64) 0 e4_conv[0][0]
__________________________________________________________________________________________________
e5_conv (Conv2D) (None, 512, 32, 64) 1180160 e4_pool[0][0]
__________________________________________________________________________________________________
e5_pool (MaxPooling2D) (None, 512, 16, 32) 0 e5_conv[0][0]
__________________________________________________________________________________________________
e6_conv (Conv2D) (None, 1024, 16, 32) 4719616 e5_pool[0][0]
__________________________________________________________________________________________________
e6_pool (MaxPooling2D) (None, 1024, 8, 16) 0 e6_conv[0][0]
__________________________________________________________________________________________________
e7_conv (Conv2D) (None, 2048, 8, 16) 18876416 e6_pool[0][0]
__________________________________________________________________________________________________
e7_pool (MaxPooling2D) (None, 2048, 4, 8) 0 e7_conv[0][0]
__________________________________________________________________________________________________
td1_upsample (UpSampling2D) (None, 2048, 8, 16) 0 e7_pool[0][0]
__________________________________________________________________________________________________
td1_conv (Conv2D) (None, 1024, 8, 16) 18875392 td1_upsample[0][0]
__________________________________________________________________________________________________
td1_concat (Concatenate) (None, 2048, 8, 16) 0 td1_conv[0][0]
e6_pool[0][0]
__________________________________________________________________________________________________
bd1_conv_conv (Conv2D) (None, 1024, 8, 16) 18875392 td1_upsample[0][0]
__________________________________________________________________________________________________
td2_upsample (UpSampling2D) (None, 2048, 16, 32) 0 td1_concat[0][0]
__________________________________________________________________________________________________
bd1_concat (Concatenate) (None, 3072, 8, 16) 0 bd1_conv_conv[0][0]
td1_concat[0][0]
__________________________________________________________________________________________________
td2_conv (Conv2D) (None, 512, 16, 32) 9437696 td2_upsample[0][0]
__________________________________________________________________________________________________
bd2_upsample (UpSampling2D) (None, 3072, 16, 32) 0 bd1_concat[0][0]
__________________________________________________________________________________________________
td2_concat (Concatenate) (None, 1024, 16, 32) 0 td2_conv[0][0]
e5_pool[0][0]
__________________________________________________________________________________________________
bd2_conv (Conv2D) (None, 512, 16, 32) 14156288 bd2_upsample[0][0]
__________________________________________________________________________________________________
td3_upsample (UpSampling2D) (None, 1024, 32, 64) 0 td2_concat[0][0]
__________________________________________________________________________________________________
bd2_concat (Concatenate) (None, 1536, 16, 32) 0 bd2_conv[0][0]
td2_concat[0][0]
__________________________________________________________________________________________________
td3_conv (Conv2D) (None, 256, 32, 64) 2359552 td3_upsample[0][0]
__________________________________________________________________________________________________
bd3_upsample (UpSampling2D) (None, 1536, 32, 64) 0 bd2_concat[0][0]
__________________________________________________________________________________________________
td3_concat (Concatenate) (None, 512, 32, 64) 0 td3_conv[0][0]
e4_pool[0][0]
__________________________________________________________________________________________________
bd3_conv (Conv2D) (None, 256, 32, 64) 3539200 bd3_upsample[0][0]
__________________________________________________________________________________________________
td4_upsample (UpSampling2D) (None, 512, 64, 128) 0 td3_concat[0][0]
__________________________________________________________________________________________________
bd3_concat (Concatenate) (None, 768, 32, 64) 0 bd3_conv[0][0]
td3_concat[0][0]
__________________________________________________________________________________________________
td4_conv (Conv2D) (None, 128, 64, 128) 589952 td4_upsample[0][0]
__________________________________________________________________________________________________
bd4_upsample (UpSampling2D) (None, 768, 64, 128) 0 bd3_concat[0][0]
__________________________________________________________________________________________________
td4_concat (Concatenate) (None, 256, 64, 128) 0 td4_conv[0][0]
e3_pool[0][0]
__________________________________________________________________________________________________
bd4_conv (Conv2D) (None, 128, 64, 128) 884864 bd4_upsample[0][0]
__________________________________________________________________________________________________
td5_upsample (UpSampling2D) (None, 256, 128, 256) 0 td4_concat[0][0]
__________________________________________________________________________________________________
bd4_concat (Concatenate) (None, 384, 64, 128) 0 bd4_conv[0][0]
td4_concat[0][0]
__________________________________________________________________________________________________
td5_conv (Conv2D) (None, 64, 128, 256) 147520 td5_upsample[0][0]
__________________________________________________________________________________________________
bd5_upsample (UpSampling2D) (None, 384, 128, 256) 0 bd4_concat[0][0]
__________________________________________________________________________________________________
td5_concat (Concatenate) (None, 128, 128, 256) 0 td5_conv[0][0]
e2_pool[0][0]
__________________________________________________________________________________________________
bd5_conv (Conv2D) (None, 64, 128, 256) 221248 bd5_upsample[0][0]
__________________________________________________________________________________________________
td6_upsample (UpSampling2D) (None, 128, 256, 512) 0 td5_concat[0][0]
__________________________________________________________________________________________________
bd5_concat (Concatenate) (None, 192, 128, 256) 0 bd5_conv[0][0]
td5_concat[0][0]
__________________________________________________________________________________________________
td6_conv (Conv2D) (None, 32, 256, 512) 36896 td6_upsample[0][0]
__________________________________________________________________________________________________
bd6_upsample (UpSampling2D) (None, 192, 256, 512) 0 bd5_concat[0][0]
__________________________________________________________________________________________________
td6_concat (Concatenate) (None, 64, 256, 512) 0 td6_conv[0][0]
e1_pool[0][0]
__________________________________________________________________________________________________
bd6_conv (Conv2D) (None, 32, 256, 512) 55328 bd6_upsample[0][0]
__________________________________________________________________________________________________
bd6_concat (Concatenate) (None, 96, 256, 512) 0 bd6_conv[0][0]
td6_concat[0][0]
__________________________________________________________________________________________________
td7_upsample (UpSampling2D) (None, 64, 512, 1024) 0 td6_concat[0][0]
__________________________________________________________________________________________________
bd7_upsample (UpSampling2D) (None, 96, 512, 1024) 0 bd6_concat[0][0]
__________________________________________________________________________________________________
td7_conv (Conv2D) (None, 3, 512, 1024) 1731 td7_upsample[0][0]
__________________________________________________________________________________________________
bd7_conv (Conv2D) (None, 1, 512, 1024) 865 bd7_upsample[0][0]
__________________________________________________________________________________________________
activation (Activation) (None, 3, 512, 1024) 0 td7_conv[0][0]
__________________________________________________________________________________________________
activation_1 (Activation) (None, 1, 512, 1024) 0 bd7_conv[0][0]
==================================================================================================
Total params: 94,347,396
Trainable params: 94,347,396
Non-trainable params: 0
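As a sanity check, the per-layer counts above are consistent with plain 3x3 convolutions (params = 3 * 3 * in_channels * filters + filters), for example:
def conv_params(in_ch, filters, k=3):
    # weights (k * k * in_ch * filters) plus one bias per filter
    return k * k * in_ch * filters + filters

print(conv_params(6, 32))       # 1760      -> e1_conv
print(conv_params(2048, 1024))  # 18875392  -> td1_conv
print(conv_params(3072, 512))   # 14156288  -> bd2_conv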
@GalAvineri I checked the total number of parameters of my model, and it has the same number of parameters as yours. Memory cost differs between deep learning frameworks. You can try reducing the batch size further (e.g. just try 1), or reducing the input image size as in https://github.com/zouchuhang/LayoutNet/issues/5#issuecomment-393949323
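For example, something along these lines (a minimal sketch only; x_train, y_boundary and y_corner are placeholder arrays, and the optimizer and losses here are illustrative, not the ones from the paper):
model = layoutnet()  # or rebuild the network with a smaller Input shape, e.g. (6, 256, 512)
model.compile(optimizer='adam',
              loss=['binary_crossentropy', 'binary_crossentropy'])
# try batch_size=1 first to check whether the graph itself fits in GPU memory
model.fit(x_train, [y_boundary, y_corner], batch_size=1, epochs=1)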