How does regularisation affect the training speed? I am getting a 100x boost.

Open IgorHoholko opened this issue 2 years ago • 4 comments

Hello!

I am experimenting with the base model from the tutorials. I have noticed that after adding tf.keras.regularizers.l2(0.05) to the embedding layer, two things happen:

  1. Metrics start growing - understandable.
  2. Training becomes 50-100 times faster. Why might that happen?

One more thing I noticed: when I add a few Dense layers without any regularizers, or add tf.keras.regularizers.l2(1) with a large weight to the embedding, all accuracies (Top1, Top3, ..., Top100) are equal to 1, and the loss is low and stable. What is happening in this case?

vocab = tf.keras.layers.experimental.preprocessing.StringLookup(vocabulary=vocabulary[slot_name])
embedding = tf.keras.layers.Embedding(
    vocab.vocabulary_size(),
    embedding_dim,
    embeddings_regularizer=tf.keras.regularizers.l2(0.05),
)
inputs.append(tf.keras.Sequential([
    vocab,
    embedding,
    # tf.keras.layers.Dense(32, activation="relu"),
    # tf.keras.layers.Dense(32, activation="relu"),
]))

Image (top_100_categorical_accuracy): the rising curves at the top are the experiments with regularisation; the others are without.

IgorHoholko, Feb 26 '23 20:02

Hi @IgorHoholko,

When you say training speed, do you mean time per step or number of steps required for loss to reach a certain level?
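To make the distinction concrete, here is a minimal sketch that reports both numbers for the same run. It is only an illustration: `run_training`, `make_toy_step`, and the geometric loss decay are hypothetical stand-ins, not anything from your model or from TFRS.

```python
import time


def run_training(step_fn, num_steps, loss_target):
    """Run `step_fn` repeatedly and report both notions of training speed.

    `step_fn` is a hypothetical callable that performs one training step
    and returns the current loss as a float.
    """
    steps_to_target = None
    start = time.perf_counter()
    for step in range(1, num_steps + 1):
        loss = step_fn()
        # Remember the first step at which the loss reaches the target.
        if steps_to_target is None and loss <= loss_target:
            steps_to_target = step
    elapsed = time.perf_counter() - start
    return {
        "seconds_per_step": elapsed / num_steps,  # wall-clock cost of one step
        "steps_to_target": steps_to_target,       # convergence speed in steps
    }


def make_toy_step(decay):
    """Toy stand-in for a real train step: the loss decays geometrically."""
    state = {"loss": 1.0}

    def step():
        state["loss"] *= decay
        return state["loss"]

    return step


fast_converging = run_training(make_toy_step(0.5), num_steps=50, loss_target=0.1)
slow_converging = run_training(make_toy_step(0.9), num_steps=50, loss_target=0.1)
print(fast_converging)
print(slow_converging)
```

In this toy setup the two runs cost about the same per step but need a different number of steps to reach the target loss, which is why it matters which of the two you are seeing.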

For your second point there could be a few issues:

  1. Large L2 regularization causes your embedding weights to go to zero, resulting in numerical instability.
  2. The final dense layer should not have an activation.
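Both points can lead to the same symptom. The sketch below is a plain-Python illustration, assuming the top-k metric resolves ties the way tf.math.in_top_k is documented to (classes tied across the top-k boundary all count as being in the top k). The toy embeddings and the helper names are made up for the example; I have not checked which tie rule your installed version actually uses.

```python
def in_top_k_with_ties(scores, target, k):
    """True if `target` is in the top k, counting ties as hits.

    Assumption: mirrors the documented tie rule of tf.math.in_top_k.
    """
    strictly_better = sum(1 for s in scores if s > scores[target])
    return strictly_better < k


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def relu(x):
    return max(0.0, x)


def top_k_accuracy(scores, k):
    """Average hit rate when each candidate in turn is the true label."""
    hits = [in_top_k_with_ties(scores, t, k) for t in range(len(scores))]
    return sum(hits) / len(hits)


# Hypothetical toy data: one query and five candidates, 3-dim embeddings.
query = [0.3, -0.2, 0.5]
candidates = [
    [0.1, 0.4, -0.3],
    [0.6, -0.1, 0.2],
    [-0.5, 0.2, 0.1],
    [0.2, 0.2, 0.2],
    [0.0, -0.7, 0.4],
]

# Healthy case: scores are distinct, so only one candidate is top-1.
healthy_scores = [dot(query, c) for c in candidates]

# Point 1: a very large L2 penalty has shrunk every embedding to zero.
zero = [0.0, 0.0, 0.0]
collapsed_scores = [dot(zero, zero) for _ in candidates]

# Point 2: the final layer ends in ReLU and every pre-activation is negative,
# so the query embedding is all zeros.
relu_query = [relu(-abs(x)) for x in query]
relu_scores = [dot(relu_query, c) for c in candidates]

print(top_k_accuracy(healthy_scores, 1))
print(top_k_accuracy(collapsed_scores, 1))
print(top_k_accuracy(relu_scores, 1))
```

When every score ties, no candidate is strictly better than the true one, so under this tie rule every top-k accuracy comes out as 1 even though the model has learned nothing.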

patrickorlando, Feb 27 '23 23:02

Thank you @patrickorlando .

Sorry for not being clear. I mean the time per step in the training loop.

IgorHoholko, Feb 28 '23 06:02

Then that is quite odd. Are you using multiple GPUs @IgorHoholko?

patrickorlando, Feb 28 '23 11:02

I use one RTX 4090.

IgorHoholko, Mar 21 '23 16:03