keras-applications
InceptionResNetV2 summary() seems to show a different network, as it does not match the one in the paper.
Summary
Importing the InceptionResNetV2 model seems to import a different model instead.
Environment
- Python version: 3.8.2
- Keras version: 2.3.1
- Keras-applications version: 1.0.8
Logs or source codes for reproduction
If you do:
import keras
mod = keras.applications.InceptionResNetV2()  # builds the model (ImageNet weights by default)
mod.summary()
The first lines of the summary output look like this:
Model: "inception_resnet_v2"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 299, 299, 3) 0
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, 149, 149, 32) 864 input_1[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 149, 149, 32) 96 conv2d_1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation) (None, 149, 149, 32) 0 batch_normalization_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, 147, 147, 32) 9216 activation_1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 147, 147, 32) 96 conv2d_2[0][0]
__________________________________________________________________________________________________
activation_2 (Activation) (None, 147, 147, 32) 0 batch_normalization_2[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, 147, 147, 64) 18432 activation_2[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 147, 147, 64) 192 conv2d_3[0][0]
__________________________________________________________________________________________________
activation_3 (Activation) (None, 147, 147, 64) 0 batch_normalization_3[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, 73, 73, 64) 0 activation_3[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D) (None, 73, 73, 80) 5120 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 73, 73, 80) 240 conv2d_4[0][0]
__________________________________________________________________________________________________
activation_4 (Activation) (None, 73, 73, 80) 0 batch_normalization_4[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D) (None, 71, 71, 192) 138240 activation_4[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 71, 71, 192) 576 conv2d_5[0][0]
__________________________________________________________________________________________________
activation_5 (Activation) (None, 71, 71, 192) 0 batch_normalization_5[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D) (None, 35, 35, 192) 0 activation_5[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D) (None, 35, 35, 64) 12288 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
batch_normalization_9 (BatchNor (None, 35, 35, 64) 192 conv2d_9[0][0]
__________________________________________________________________________________________________
activation_9 (Activation) (None, 35, 35, 64) 0 batch_normalization_9[0][0]
__________________________________________________________________________________________________
conv2d_7 (Conv2D) (None, 35, 35, 48) 9216 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
conv2d_10 (Conv2D) (None, 35, 35, 96) 55296 activation_9[0][0]
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 35, 35, 48) 144 conv2d_7[0][0]
__________________________________________________________________________________________________
batch_normalization_10 (BatchNo (None, 35, 35, 96) 288 conv2d_10[0][0]
__________________________________________________________________________________________________
activation_7 (Activation) (None, 35, 35, 48) 0 batch_normalization_7[0][0]
__________________________________________________________________________________________________
activation_10 (Activation) (None, 35, 35, 96) 0 batch_normalization_10[0][0]
__________________________________________________________________________________________________
average_pooling2d_1 (AveragePoo (None, 35, 35, 192) 0 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
... (and many more lines)
If you take a look, the first thing you see is that these layers are stacked purely sequentially: there is no filter concatenation anywhere. As the original paper shows, the stem block of Inception-ResNet-v2 does not look like the one above; the one above looks more like the stem block of Inception-ResNet-v1 (the paper includes figures of both stem blocks).
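For reference, here is a minimal sketch of how the beginning of the Inception-ResNet-v2 stem from the paper would look in Keras. The conv_bn helper is hypothetical (just Conv2D + BatchNormalization + ReLU), and the shapes in the comments follow the paper's stem figure:

from keras.layers import (Input, Conv2D, MaxPooling2D, Concatenate,
                          BatchNormalization, Activation)

def conv_bn(x, filters, kernel, strides=1, padding='same'):
    # Hypothetical helper: Conv2D -> BatchNormalization -> ReLU
    x = Conv2D(filters, kernel, strides=strides, padding=padding, use_bias=False)(x)
    x = BatchNormalization()(x)
    return Activation('relu')(x)

inp = Input((299, 299, 3))
x = conv_bn(inp, 32, 3, strides=2, padding='valid')  # 149x149x32
x = conv_bn(x, 32, 3, padding='valid')               # 147x147x32
x = conv_bn(x, 64, 3)                                # 147x147x64
# First filter concatenation of the paper's stem:
# a 3x3 max-pool branch next to a strided 3x3 conv branch
pool_branch = MaxPooling2D(3, strides=2, padding='valid')(x)
conv_branch = conv_bn(x, 96, 3, strides=2, padding='valid')
x = Concatenate()([pool_branch, conv_branch])        # 73x73x160

The Keras summary above never reaches such a Concatenate in its stem; it only has the max-pooling path, which matches the simpler Inception-ResNet-v1 stem.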
I've found an implementation of Inception-v4 that does specify the stem block shared by Inception-v4 and Inception-ResNet-v2 correctly. The first lines of its summary are:
Model: "inception_v4"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 512, 512, 3) 0
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, 255, 255, 32) 864 input_1[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 255, 255, 32) 96 conv2d_1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation) (None, 255, 255, 32) 0 batch_normalization_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, 253, 253, 32) 9216 activation_1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 253, 253, 32) 96 conv2d_2[0][0]
__________________________________________________________________________________________________
activation_2 (Activation) (None, 253, 253, 32) 0 batch_normalization_2[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, 253, 253, 64) 18432 activation_2[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 253, 253, 64) 192 conv2d_3[0][0]
__________________________________________________________________________________________________
activation_3 (Activation) (None, 253, 253, 64) 0 batch_normalization_3[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D) (None, 126, 126, 96) 55296 activation_3[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 126, 126, 96) 288 conv2d_4[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, 126, 126, 64) 0 activation_3[0][0]
__________________________________________________________________________________________________
activation_4 (Activation) (None, 126, 126, 96) 0 batch_normalization_4[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (None, 126, 126, 160 0 max_pooling2d_1[0][0]
activation_4[0][0]
__________________________________________________________________________________________________
conv2d_7 (Conv2D) (None, 126, 126, 64) 10240 concatenate_1[0][0]
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 126, 126, 64) 192 conv2d_7[0][0]
__________________________________________________________________________________________________
activation_7 (Activation) (None, 126, 126, 64) 0 batch_normalization_7[0][0]
__________________________________________________________________________________________________
conv2d_8 (Conv2D) (None, 126, 126, 64) 28672 activation_7[0][0]
__________________________________________________________________________________________________
batch_normalization_8 (BatchNor (None, 126, 126, 64) 192 conv2d_8[0][0]
__________________________________________________________________________________________________
activation_8 (Activation) (None, 126, 126, 64) 0 batch_normalization_8[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D) (None, 126, 126, 64) 10240 concatenate_1[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D) (None, 126, 126, 64) 28672 activation_8[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 126, 126, 64) 192 conv2d_5[0][0]
__________________________________________________________________________________________________
batch_normalization_9 (BatchNor (None, 126, 126, 64) 192 conv2d_9[0][0]
__________________________________________________________________________________________________
activation_5 (Activation) (None, 126, 126, 64) 0 batch_normalization_5[0][0]
__________________________________________________________________________________________________
activation_9 (Activation) (None, 126, 126, 64) 0 batch_normalization_9[0][0]
__________________________________________________________________________________________________
conv2d_6 (Conv2D) (None, 124, 124, 96) 55296 activation_5[0][0]
__________________________________________________________________________________________________
conv2d_10 (Conv2D) (None, 124, 124, 96) 55296 activation_9[0][0]
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 124, 124, 96) 288 conv2d_6[0][0]
__________________________________________________________________________________________________
batch_normalization_10 (BatchNo (None, 124, 124, 96) 288 conv2d_10[0][0]
__________________________________________________________________________________________________
activation_6 (Activation) (None, 124, 124, 96) 0 batch_normalization_6[0][0]
__________________________________________________________________________________________________
activation_10 (Activation) (None, 124, 124, 96) 0 batch_normalization_10[0][0]
__________________________________________________________________________________________________
concatenate_2 (Concatenate) (None, 124, 124, 192 0 activation_6[0][0]
activation_10[0][0]
__________________________________________________________________________________________________
conv2d_11 (Conv2D) (None, 61, 61, 192) 331776 concatenate_2[0][0]
__________________________________________________________________________________________________
batch_normalization_11 (BatchNo (None, 61, 61, 192) 576 conv2d_11[0][0]
__________________________________________________________________________________________________
activation_11 (Activation) (None, 61, 61, 192) 0 batch_normalization_11[0][0]
__________________________________________________________________________________________________
... (and many more lines)
Here the concatenations are present and the stem block looks correct. The first lines of Keras's InceptionResNetV2, by contrast, look like the first lines of Inception-ResNet-v1. Why is this happening?
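This can also be checked programmatically. A minimal sketch, assuming the stem spans roughly the first 18 layers of the Keras model (up to max_pooling2d_2 in the summary above):

import keras
from keras.layers import Concatenate

mod = keras.applications.InceptionResNetV2()
stem = mod.layers[:18]  # roughly the stem region shown in the summary above
print(any(isinstance(layer, Concatenate) for layer in stem))  # False: no filter concatenation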
I got an answer via Google: https://stackoverflow.com/questions/64488034/inceptionresnetv2-stem-block-keras-implementation-mismatch-the-one-in-the-origin
It seems the authors were simply changing the stem during internal experiments. Nonetheless, they say there is no difference in performance.