
Cannot get the same performance in eager-mode training

Open mortezamg63 opened this issue 2 years ago • 9 comments

Describe the feature and the current behavior/state.

Hi Keras team, I am trying to train an autoencoder in eager mode on the MNIST dataset and then train a classifier on top of the encoder. In other words, my training has two steps: first, train the autoencoder in an unsupervised manner; then, use its encoder to train a classifier. I am converting a model.fit() training to an eager-mode training loop, using the same settings and architecture for both. The losses of the two training methods are different, and the resulting classifier performance is completely different. When I train the autoencoder with model.fit(), the classifier's accuracy is around 50-60%. In contrast, if I train the autoencoder in eager mode and then train the classifier on top of the encoder, the accuracy is between 20-30%. It seems that in eager-mode training the autoencoder does not learn a true representation of the data.

A colab implementation of my issue is provided at Link, which helps reproduce the results. The implementation has two parts: (1) loading and preprocessing (packages, data, and preprocessing) and (2) training (three functions: training the autoencoder in eager mode, training the autoencoder with model.fit(), and training the MLP classifier). For the classifier, I use the encoder to extract features (the latent space) for the samples in the train and test sets; the extracted features are then used to train the classifier.
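For reference, a minimal eager-mode training loop of the general shape described above (the model, data, and loss here are tiny stand-ins, not the colab's autoencoder). Discrepancies between such a loop and model.fit() usually come down to differences in shuffling, batch size, or loss reduction:

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(0)
x_train = np.random.rand(256, 32).astype("float32")  # placeholder data

# Tiny stand-in autoencoder (32 -> 16 -> 32).
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(32, activation="sigmoid"),
])
optimizer = tf.keras.optimizers.Adam(1e-3)
loss_fn = tf.keras.losses.MeanSquaredError()

# Shuffle and batch exactly as model.fit() would, so the two setups match.
dataset = tf.data.Dataset.from_tensor_slices(x_train).shuffle(256).batch(64)

losses = []
for epoch in range(3):
    epoch_losses = []
    for batch in dataset:
        with tf.GradientTape() as tape:
            recon = model(batch, training=True)
            loss = loss_fn(batch, recon)  # mean over the batch, like fit()
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        epoch_losses.append(float(loss))
    # Report the MEAN of the per-batch losses, matching fit()'s progress bar.
    losses.append(sum(epoch_losses) / len(epoch_losses))
    print(f"Epoch {epoch}, Loss {losses[-1]}")
```

The key detail is the last line of the loop: averaging (not summing) the per-batch losses makes the printed numbers directly comparable to model.fit()'s output.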

Below I show the output of the autoencoder training and of the classifier. The losses differ between model.fit() and eager-mode training; at the end is the performance of the classifier trained on the encoder's output.

######################################## EAGER MODE TRAINING of autoencoder ##################################
Epoch 0, Loss 8.046977043151855
Epoch 1, Loss 6.08961296081543
Epoch 2, Loss 3.2807199954986572
Epoch 3, Loss 2.110403299331665
Epoch 4, Loss 1.688576102256775
Epoch 5, Loss 1.5530697107315063
Epoch 6, Loss 1.510565161705017
Epoch 7, Loss 1.4924875497817993
Epoch 8, Loss 1.4827122688293457
Epoch 9, Loss 1.477460503578186
Epoch 10, Loss 1.4743456840515137
Epoch 11, Loss 1.472372055053711
Epoch 12, Loss 1.4714778661727905
Epoch 13, Loss 1.470210075378418
Epoch 14, Loss 1.4703704118728638
######################################## Model.fit() TRAINING of autoencoder #################################
Epoch 1/15
4/4 [==============================] - 1s 27ms/step - loss: 1.0521 - mask_loss: 0.6378 - feature_loss: 0.2071
Epoch 2/15
4/4 [==============================] - 0s 21ms/step - loss: 0.6424 - mask_loss: 0.4019 - feature_loss: 0.1203
Epoch 3/15
4/4 [==============================] - 0s 22ms/step - loss: 0.4629 - mask_loss: 0.2936 - feature_loss: 0.0846
Epoch 4/15
4/4 [==============================] - 0s 20ms/step - loss: 0.4064 - mask_loss: 0.2575 - feature_loss: 0.0744
Epoch 5/15
4/4 [==============================] - 0s 22ms/step - loss: 0.3859 - mask_loss: 0.2438 - feature_loss: 0.0710
Epoch 6/15
4/4 [==============================] - 0s 22ms/step - loss: 0.3769 - mask_loss: 0.2375 - feature_loss: 0.0697
Epoch 7/15
4/4 [==============================] - 0s 22ms/step - loss: 0.3734 - mask_loss: 0.2354 - feature_loss: 0.0690
Epoch 8/15
4/4 [==============================] - 0s 19ms/step - loss: 0.3709 - mask_loss: 0.2342 - feature_loss: 0.0684
Epoch 9/15
4/4 [==============================] - 0s 20ms/step - loss: 0.3684 - mask_loss: 0.2327 - feature_loss: 0.0679
Epoch 10/15
4/4 [==============================] - 0s 20ms/step - loss: 0.3672 - mask_loss: 0.2320 - feature_loss: 0.0676
Epoch 11/15
4/4 [==============================] - 0s 22ms/step - loss: 0.3665 - mask_loss: 0.2316 - feature_loss: 0.0674
Epoch 12/15
4/4 [==============================] - 0s 21ms/step - loss: 0.3663 - mask_loss: 0.2322 - feature_loss: 0.0671
Epoch 13/15
4/4 [==============================] - 0s 22ms/step - loss: 0.3661 - mask_loss: 0.2322 - feature_loss: 0.0670
Epoch 14/15
4/4 [==============================] - 0s 19ms/step - loss: 0.3637 - mask_loss: 0.2307 - feature_loss: 0.0665
Epoch 15/15
4/4 [==============================] - 0s 21ms/step - loss: 0.3634 - mask_loss: 0.2305 - feature_loss: 0.0664
####################################### Training MLP using the encoder from EAGER MODE training #########################
Classifier Performance: 0.2783
####################################### Training MLP using the encoder from model.fit() training #########################
Classifier Performance: 0.5447
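One plausible explanation for the different loss scales in the logs above (an observation from the numbers, not confirmed against the colab): with 4 batches per epoch, the eager loop's final loss of 1.470 divided by 4 is about 0.368, close to model.fit()'s final 0.363. That pattern suggests the eager loop may be summing per-batch losses over the epoch while fit() reports their mean. A minimal demonstration of the two reductions:

```python
# Per-batch losses from one hypothetical 4-batch epoch.
batch_losses = [0.40, 0.37, 0.36, 0.35]

summed_epoch_loss = sum(batch_losses)                     # a loop that accumulates
mean_epoch_loss = sum(batch_losses) / len(batch_losses)   # what model.fit() reports

# The summed value is exactly len(batch_losses) times the mean.
print(summed_epoch_loss, mean_epoch_loss)
```

If this is the cause, the two runs may be optimizing similarly while merely reporting loss on different scales, so the reported loss alone cannot explain the classifier gap.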
Who will benefit from this feature?
ML researchers and practitioners. 

Contributing

Do you want to contribute a PR? (yes/no): no. Not at this time.

mortezamg63 avatar Feb 26 '22 20:02 mortezamg63

I am still waiting for an answer. Can anyone tell me what I can do to solve this issue? Thanks.

mortezamg63 avatar Mar 03 '22 16:03 mortezamg63

@mortezamg63,

I changed batch_size to 64 in both parameters['batch_size'] and mlp_parameters['batch_size'], and now I see almost the same performance. Please find the gist here for reference. Thanks!
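As a sketch of the change described (only the dict names and the 'batch_size' keys are taken from the comment; the other entries are placeholders, since the colab's full parameter dicts are not shown here):

```python
# Hypothetical parameter dicts mirroring the names in the comment above.
parameters = {"batch_size": 64, "epochs": 15}       # autoencoder training
mlp_parameters = {"batch_size": 64, "epochs": 15}   # classifier training

# Keeping the batch sizes equal means both setups take the same number of
# gradient steps per epoch, on comparably sized batches.
assert parameters["batch_size"] == mlp_parameters["batch_size"]
```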

chunduriv avatar Mar 09 '22 11:03 chunduriv

That's great. I really appreciate your answer.

mortezamg63 avatar Mar 12 '22 20:03 mortezamg63

@mortezamg63,

Glad the issue is resolved for you, please feel free to move this to closed status. Thanks!

chunduriv avatar Mar 14 '22 10:03 chunduriv

Are you satisfied with the resolution of your issue? Yes No

google-ml-butler[bot] avatar Mar 16 '22 13:03 google-ml-butler[bot]

@chunduriv

Thanks for your help. I have another question.

A very simple change causes very different results, which is confusing. In the code shared in the first question, I changed the model in the eager_ae function by separating the autoencoder into an encoder model and a decoder model. The result is completely different. The change is shown below:

    # Encoder
    inputs = contrib_layers.Input(shape=(dim,))
    h1 = contrib_layers.Dense(256, activation='relu', name='encoder1')(inputs)
    h2 = contrib_layers.Dense(128, activation='relu', name='encoder2')(h1)
    h = contrib_layers.Dense(26, activation='relu', name='encoder3')(h2)

    # Mask estimator
    output_1 = contrib_layers.Dense(dim, activation='sigmoid', name='mask')(h)
    # Feature estimator
    output_2 = contrib_layers.Dense(dim, activation='sigmoid', name='feature')(h)
    out2 = Model(inputs, [output_1, output_2])

Changing the model to this:

    # Encoder
    inputs = contrib_layers.Input(shape=(dim,))
    h1 = contrib_layers.Dense(256, activation='relu', name='encoder1')(inputs)
    h2 = contrib_layers.Dense(128, activation='relu', name='encoder2')(h1)
    h = contrib_layers.Dense(26, activation='relu', name='encoder3')(h2)
    Encoder = Model(inputs, h)

    # Decoder
    input_1 = contrib_layers.Input(shape=(latent_dim,))
    # Mask estimator
    output_1 = contrib_layers.Dense(dim, activation='sigmoid', name='mask')(input_1)
    # Feature estimator
    output_2 = contrib_layers.Dense(dim, activation='sigmoid', name='feature')(input_1)
    Decoder = Model(input_1, [output_1, output_2])

The two definitions describe exactly the same architecture.
After making this change, I applied it to the training loop. The losses and everything else are the same, yet, interestingly, the results are completely different: the test accuracies are 93.21% and 91.29% for the first and second models respectively. I tried different batch_sizes with the second model, but it did not change the result.

At first, I thought it was an incompatibility between the different training modes. But interestingly, it happens within the same training mode.

mortezamg63 avatar Mar 16 '22 15:03 mortezamg63

@mortezamg63 the "contrib" layers that you are using are deprecated and unmaintained. Can you try to use keras.layers.Dense and keras.layers.Input instead and see if that fixes the issue?
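The suggested replacement might look like the following sketch of the same architecture with keras.layers (dim=784, the flattened MNIST size, is an assumption; layer names follow the snippets above):

```python
import numpy as np
from tensorflow import keras

dim = 784  # flattened MNIST image size; an assumption
inputs = keras.layers.Input(shape=(dim,))
h1 = keras.layers.Dense(256, activation="relu", name="encoder1")(inputs)
h2 = keras.layers.Dense(128, activation="relu", name="encoder2")(h1)
h = keras.layers.Dense(26, activation="relu", name="encoder3")(h2)
output_1 = keras.layers.Dense(dim, activation="sigmoid", name="mask")(h)
output_2 = keras.layers.Dense(dim, activation="sigmoid", name="feature")(h)
autoencoder = keras.Model(inputs, [output_1, output_2])

x = np.random.rand(2, dim).astype("float32")
mask_out, feature_out = autoencoder(x)
print(mask_out.shape, feature_out.shape)  # both (2, 784)
```

keras.layers.Dense and keras.layers.Input are drop-in replacements here; the layer arguments (units, activation, name) carry over unchanged.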

hertschuh avatar May 05 '22 17:05 hertschuh

@mortezamg63, as mentioned in the comment above, the contrib layers are deprecated: remove the old tf.contrib.layers usages and replace them with TF Slim symbols. Also check TF Addons for other tf.contrib symbols.

Please take a look at this doc link which provides more information. Thank you!

tilakrayal avatar Jul 29 '22 16:07 tilakrayal

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] avatar Aug 05 '22 17:08 google-ml-butler[bot]

Closing as stale. Please reopen if you'd like to work on this further.

google-ml-butler[bot] avatar Aug 12 '22 18:08 google-ml-butler[bot]

Are you satisfied with the resolution of your issue? Yes No

google-ml-butler[bot] avatar Aug 12 '22 18:08 google-ml-butler[bot]