
Bug: BatchNormalization causes Input layer to lose name

Open bvbellomo opened this issue 2 years ago • 8 comments

If I start a functional network with:

inputs = Input(shape=inputShape, name="data")
inputs = BatchNormalization()(inputs)
outputs = Dense(units=1)(inputs)

model.summary() doesn't show my batch normalization and loses the name of my input layer:

input_1 (InputLayer)
dense (Dense)

If I omit my BatchNormalization:

inputs = Input(shape=inputShape, name="data")
#inputs = BatchNormalization()(inputs)
outputs = Dense(units=1)(inputs)

model.summary() shows what I expect:

data (InputLayer)
dense (Dense)

A BatchNormalization anywhere except immediately after an Input layer appears to work correctly.

bvbellomo avatar Jul 23 '22 18:07 bvbellomo

@bvbellomo, to expedite the troubleshooting process, could you please provide complete code and the TensorFlow version you are using?

Batch normalization applies a transformation that maintains the mean output close to 0 and the output standard deviation close to 1.

During training (i.e. when using fit() or when calling the layer/model with the argument training=True), the layer normalizes its output using the mean and standard deviation of the current batch of inputs.

During inference (i.e. when using evaluate() or predict(), or when calling the layer/model with the argument training=False, which is the default), the layer normalizes its output using a moving average of the mean and standard deviation of the batches it has seen during training. Thank you!
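To illustrate that training/inference distinction, here is a minimal sketch (not part of the original report; the array shapes and variable names are just for demonstration):

import numpy as np
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
batch = np.random.rand(8, 4).astype("float32")

# training=True: normalize with the current batch's mean/variance
# and update the layer's moving averages.
train_out = bn(batch, training=True)

# training=False (the default): normalize with the moving averages
# accumulated so far.
infer_out = bn(batch, training=False)

print(train_out.numpy().mean(axis=0))  # approximately 0 per feature
print(infer_out.numpy().mean(axis=0))  # generally not 0 this early in training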

tilakrayal avatar Jul 25 '22 09:07 tilakrayal

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] avatar Aug 01 '22 09:08 google-ml-butler[bot]

Using Keras 2.9.0

bvbellomo avatar Aug 06 '22 14:08 bvbellomo

import tensorflow as tf
import numpy as np
from keras.models import Functional
from keras.layers import Dense, Conv1D, GlobalAveragePooling1D, Input, BatchNormalization

training_samples = 128
validation_samples = 32
sample_size = 64
sample_vals = 5

x_train = np.reshape(
    np.random.randint(0, 255, training_samples * sample_size * sample_vals),
    (training_samples, sample_size, sample_vals))
y_train = np.random.rand(training_samples, 1)

x_validate = np.reshape(
    np.random.randint(0, 255, validation_samples * sample_size * sample_vals),
    (validation_samples, sample_size, sample_vals))
y_validate = np.random.rand(validation_samples, 1)

inputShape = (x_train.shape[1], x_train.shape[2])
inputs = Input(shape=inputShape, name='My Input Data')

#This is the layer that causes the bug. Comment it out to see the correct input layer name
inputs = BatchNormalization(name='Normalize')(inputs)

outputs = Conv1D(filters=3, kernel_size=3)(inputs)
outputs = GlobalAveragePooling1D()(outputs)
outputs = BatchNormalization()(outputs)
outputs = Dense(units=1)(outputs)

model = Functional(inputs, outputs)

opt = tf.keras.optimizers.Adam()
model.compile(optimizer=opt, loss='mean_squared_error')
model.build(inputShape)
model.summary()
model.fit(x=x_train, y=y_train, epochs=10)
err_nn_validate = model.evaluate(x_validate, y_validate)

bvbellomo avatar Aug 06 '22 14:08 bvbellomo

Output as written

PS E:\repos\class\Dissertation\Python> & C:/Users/brad/AppData/Local/Programs/Python/Python310/python.exe e:/repos/class/Dissertation/Python/bug.py
2022-08-06 10:41:38.353386: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-06 10:41:38.651272: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9621 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:01:00.0, compute capability: 8.6
Model: "model"


Layer (type)                                         Output Shape        Param #
=================================================================================
input_1 (InputLayer)                                 [(None, 64, 5)]     0

conv1d (Conv1D)                                      (None, 62, 3)       48

global_average_pooling1d (GlobalAveragePooling1D)    (None, 3)           0

batch_normalization (BatchNormalization)             (None, 3)           12

dense (Dense)                                        (None, 1)           4

=================================================================================
Total params: 64
Trainable params: 58
Non-trainable params: 6


Epoch 1/10
2022-08-06 10:41:39.866383: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8400
2022-08-06 10:41:40.814769: I tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
4/4 [==============================] - 2s 4ms/step - loss: 0.8510
Epoch 2/10
4/4 [==============================] - 0s 476us/step - loss: 0.7667
Epoch 3/10
4/4 [==============================] - 0s 4ms/step - loss: 0.7013
Epoch 4/10
4/4 [==============================] - 0s 419us/step - loss: 0.6192
Epoch 5/10
4/4 [==============================] - 0s 0s/step - loss: 0.5839
Epoch 6/10
4/4 [==============================] - 0s 30us/step - loss: 0.5503
Epoch 7/10
4/4 [==============================] - 0s 777us/step - loss: 0.4962
Epoch 8/10
4/4 [==============================] - 0s 365us/step - loss: 0.4663
Epoch 9/10
4/4 [==============================] - 0s 834us/step - loss: 0.4195
Epoch 10/10
4/4 [==============================] - 0s 22us/step - loss: 0.3869
1/1 [==============================] - 0s 58ms/step - loss: 13.5099

bvbellomo avatar Aug 06 '22 14:08 bvbellomo

Output with line 25 (the BatchNormalization right after the input) commented out

PS E:\repos\class\Dissertation\Python> & C:/Users/brad/AppData/Local/Programs/Python/Python310/python.exe e:/repos/class/Dissertation/Python/bug.py
2022-08-06 10:44:14.228354: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-06 10:44:14.528832: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9621 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:01:00.0, compute capability: 8.6
Model: "model"


Layer (type)                                         Output Shape        Param #
=================================================================================
My Input Data (InputLayer)                           [(None, 64, 5)]     0

conv1d (Conv1D)                                      (None, 62, 3)       48

global_average_pooling1d (GlobalAveragePooling1D)    (None, 3)           0

batch_normalization (BatchNormalization)             (None, 3)           12

dense (Dense)                                        (None, 1)           4

=================================================================================
Total params: 64
Trainable params: 58
Non-trainable params: 6


Epoch 1/10
2022-08-06 10:44:15.751907: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8400
2022-08-06 10:44:16.706233: I tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
4/4 [==============================] - 2s 2ms/step - loss: 5.4385
Epoch 2/10
4/4 [==============================] - 0s 0s/step - loss: 5.0424
Epoch 3/10
4/4 [==============================] - 0s 0s/step - loss: 4.6856
Epoch 4/10
4/4 [==============================] - 0s 0s/step - loss: 4.3810
Epoch 5/10
4/4 [==============================] - 0s 0s/step - loss: 3.9812
Epoch 6/10
4/4 [==============================] - 0s 4ms/step - loss: 3.6179
Epoch 7/10
4/4 [==============================] - 0s 0s/step - loss: 3.4493
Epoch 8/10
4/4 [==============================] - 0s 0s/step - loss: 3.1494
Epoch 9/10
4/4 [==============================] - 0s 25us/step - loss: 2.9298
Epoch 10/10
4/4 [==============================] - 0s 0s/step - loss: 2.7118
1/1 [==============================] - 0s 49ms/step - loss: 255.9982

bvbellomo avatar Aug 06 '22 14:08 bvbellomo

@bvbellomo, I was able to execute the code without any issues. Kindly find the gist of it here and let us know if we are missing anything.

tilakrayal avatar Aug 11 '22 06:08 tilakrayal

This has the same problem: 'My Input Data' does not show in the model summary, nor does the first batch normalization.

bvbellomo avatar Aug 13 '22 15:08 bvbellomo

@bvbellomo I am not sure why BatchNormalization is used right after the input. By definition, batch normalization normalizes activation vectors from hidden layers using the first and second statistical moments (mean and variance) of the current batch; that is, it normalizes the outputs of activation functions, and the input layer doesn't contain any activation functions. I don't think this is a bug, as the use of batch normalization is not valid in this case. Thanks!

gowthamkpr avatar Sep 21 '22 18:09 gowthamkpr

Normalizing the input is something most people want to do. It can be done manually, but why not give people a convenient way to do it? If you do decide not to provide or support this functionality, the framework should raise an error. The current behavior of renaming layers is clearly not intended, and is therefore a bug that should be fixed.
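As an aside, a minimal sketch of the kind of convenience in question, using the Keras Normalization preprocessing layer (not from the original thread; the shapes below just mirror the repro above):

import numpy as np
import tensorflow as tf

x_train = np.random.rand(128, 64, 5).astype("float32")

# The Normalization preprocessing layer learns per-feature mean and
# variance from the data via adapt(), then standardizes its inputs.
norm = tf.keras.layers.Normalization(axis=-1)
norm.adapt(x_train)

inputs = tf.keras.Input(shape=(64, 5), name="data")
x = norm(inputs)  # `inputs` is kept intact, not overwritten
x = tf.keras.layers.GlobalAveragePooling1D()(x)
outputs = tf.keras.layers.Dense(1)(x)

model = tf.keras.Model(inputs, outputs)
model.summary()  # "data" appears as the input layer name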

bvbellomo avatar Sep 22 '22 16:09 bvbellomo

@bvbellomo - Yes, batch norm should work on the inputs as well. But there's a bug in your network definition: you are overwriting inputs, but it's required later for defining the Model.

Your code:

inputs = Input(shape=inputShape, name='My Input Data')

#This is the layer that causes the bug. Comment it out to see the correct input layer name
inputs = BatchNormalization(name='Normalize')(inputs)
outputs = Conv1D(filters=3, kernel_size=3)(inputs)
...
model = Functional(inputs, outputs)

As you can see, you are not keeping track of inputs; it's overwritten.
The fix:

inputs = Input(shape=inputShape, name='My Input Data')

#This is the layer that causes the bug. Comment it out to see the correct input layer name
x = BatchNormalization(name='Normalize')(inputs) # FIX: Avoid overwriting inputs name
outputs = Conv1D(filters=3, kernel_size=3)(x) # use x instead of inputs
...
model = Functional(inputs, outputs)

After this change, it should work as expected, and you should see all layers even when you keep BatchNorm right after the input layer.
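For reference, a minimal end-to-end sketch of the corrected pattern (using tf.keras.Model in place of the Functional import; otherwise the same layers as the repro above):

import tensorflow as tf

inputs = tf.keras.Input(shape=(64, 5), name='My Input Data')

# `inputs` stays bound to the Input layer; intermediate tensors use `x`
x = tf.keras.layers.BatchNormalization(name='Normalize')(inputs)
x = tf.keras.layers.Conv1D(filters=3, kernel_size=3)(x)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
x = tf.keras.layers.BatchNormalization()(x)
outputs = tf.keras.layers.Dense(units=1)(x)

model = tf.keras.Model(inputs, outputs)  # `inputs` still refers to the Input layer
model.compile(optimizer='adam', loss='mean_squared_error')
model.summary()  # 'My Input Data' and 'Normalize' should both appear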

Let us know if this resolves the issue.

sampathweb avatar Sep 30 '22 18:09 sampathweb

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] avatar Oct 07 '22 19:10 google-ml-butler[bot]

Closing as stale. Please reopen if you'd like to work on this further.

google-ml-butler[bot] avatar Oct 14 '22 20:10 google-ml-butler[bot]
