
Embedding layer after a Dense layer raises UserWarning: Gradients do not exist

Open antipisa opened this issue 9 months ago • 9 comments

System information.

  • Have I written custom code (as opposed to using a stock example script provided in Keras): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Red Hat 7
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): 2.17
  • Python version: 3.11.8
  • GPU model and memory:
  • Exact command to reproduce:

You can collect some of this information using our environment capture script:

https://github.com/tensorflow/tensorflow/tree/master/tools/tf_env_collect.sh

You can obtain the TensorFlow version with: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"

Describe the problem.

Adding an Embedding layer after a Dense layer does not work as expected: during model training it raises a UserWarning that gradients do not exist for some variables.


import numpy as np
import tensorflow as tf
from tensorflow import keras
from keras.layers import Dense, Embedding, Flatten, Input, Rescaling, concatenate
from keras.models import Model


num_indices = 10
embedding_dim = 5
p = 5

input_layer = Input(shape=(p,), name='input_layer')
embedding_branch = Dense(num_indices, activation='tanh')(input_layer)
embedding_branch = Rescaling(5, 5, name='rescaling')(embedding_branch)
embedding_branch = Embedding(input_dim = num_indices, output_dim = embedding_dim, name='embedding')(embedding_branch)

dense_branch = Dense(20, activation='relu')(input_layer)
dense_branch = Dense(3, activation='relu')(dense_branch)

final_layer = concatenate([ dense_branch, embedding_branch ], name='concatenate_layer')

final_out = Dense(1, activation='linear')(final_layer)
model = Model(inputs=input_layer, outputs=final_out)
model.compile(optimizer='adam', loss='mean_squared_error')
model.summary()
model.build(input_shape=(None, p))

X = np.random.randn(100, p)
y = np.random.randn(100)

history = model.fit(X, y, epochs=30, verbose=0)



python3.11/site-packages/keras/src/optimizers/base_optimizer.py: UserWarning: Gradients do not exist for variables ['kernel', 'bias'] when minimizing the loss. If using `model.compile()`, did you forget to provide a `loss` argument?

antipisa avatar Jan 30 '25 23:01 antipisa

@tilakrayal, I suspect this may not be a bug.

... but I think the error message raised in this situation could be made clearer, and that is the real fix as far as changes to TF go.

@antipisa The overall strategy looks great, but there appear to be two main problems.

  1. I think the main issue here (and the one triggering the warning you are referencing) is that the output of a Dense layer is being passed straight into an Embedding layer.
  • This is not intrinsically incorrect, but the Dense output must be discretized and cast to an int before it makes sense to an Embedding layer.
  • Have you tried using tf.cast to cast the rescaled Dense output to an int, which might look something like the snippet below?
  2. The second problem is that the Embedding output needs to be flattened to fit the next Dense layer's input. Once you fix the first error you may be greeted by a dimensionality-mismatch error if this isn't also fixed.
  • Embeddings return a rank-3 tensor (batch_size, input_length, output_dim). This needs to be flattened to a rank-2 tensor (batch_size, input_length * output_dim) before it will fit the input of the next Dense layer ...


## These lines and everything above remain the same (with Lambda and Flatten added to the keras.layers imports)
input_layer = Input(shape=(p,), name='input_layer')
embedding_branch = Dense(num_indices, activation='tanh')(input_layer)

# Fix problem 1: <------------------------------------------<<<<
# Use a Lambda layer to cast the output to an integer, and clip the values to the valid range 
embedding_branch = Lambda(lambda x: tf.clip_by_value(tf.cast(x, tf.int32), 0, num_indices-1))(embedding_branch) 

# Now you have an integer input for the Embedding ... 
embedding_branch = Embedding(input_dim=num_indices, output_dim=embedding_dim, name='embedding')(embedding_branch)

# Fix problem 2: <------------------------------------------<<<<
embedding_branch = Flatten()(embedding_branch) # Flatten (batch_size, input_length, output_dim) to (batch_size, input_length * output_dim)

I remember running into a similar problem a few years ago when I was trying to discretize then embed a continuous value ...

I think the correct classification of this issue is error-handling. The real issue that needs to be resolved is that the error message raised in this situation could be clearer and explain why this is a problem, not just "Gradients do not exist ...".

I think problem 1 is a common pitfall, and better communication in the error message would prevent a lot of duplicate issue submissions about the same underlying problem.

david-thrower avatar Apr 09 '25 06:04 david-thrower

@antipisa, I tried to execute the code with the latest Keras version and observed a concatenation error. keras.layers.Concatenate takes as input a list of tensors, all of the same shape except along the concatenation axis. By default the axis is -1 for the Concatenate layer, so please make sure the shapes of all other dimensions are equal.
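
For illustration, a minimal sketch of that requirement (the shapes are chosen to mirror the model above; not taken from the gist):

from keras.layers import Concatenate, Flatten, Input

a = Input(shape=(3,))       # rank-2: (batch, 3), like dense_branch
b = Input(shape=(10, 5))    # rank-3: (batch, 10, 5), like the un-flattened Embedding output
# Concatenate()([a, b]) would raise an error: all inputs must have the same rank,
# with matching shapes everywhere except the concatenation axis (axis=-1 by default).
c = Concatenate()([a, Flatten()(b)])   # (batch, 3 + 50) works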

Kindly find the gist of it here. Thank You.

tilakrayal avatar Apr 17 '25 08:04 tilakrayal

This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.

github-actions[bot] avatar May 02 '25 02:05 github-actions[bot]

Hi @tilakrayal and @david-thrower, even with the fix mentioned above, the gradients warning persists:

import numpy as np
import tensorflow as tf
import keras
from keras.layers import Dense, Embedding, Flatten, Input, Lambda, Rescaling, concatenate
from keras.models import Model


num_indices = 10
embedding_dim = 5
p = 5

input_layer = Input(shape=(p,), name='input_layer')
embedding_branch = Dense(num_indices, activation='tanh')(input_layer)
# need to rescale output to interval (0, 10) before integer cast
embedding_branch = Rescaling(5, 5, name='rescaling')(embedding_branch)
embedding_branch = Lambda(lambda x: tf.clip_by_value(tf.cast(x, tf.int32), 0, num_indices-1))(embedding_branch) 
embedding_branch = Embedding(input_dim = num_indices, output_dim = embedding_dim, name='embedding')(embedding_branch)
embedding_branch = Flatten()(embedding_branch) # Flatten (batch_size, input_length, output_dim) to (batch_size, input_length * output_dim)
embedding_branch = Dense(3, activation='relu')(embedding_branch)

dense_branch = Dense(20, activation='relu')(input_layer)
dense_branch = Dense(3, activation='relu')(dense_branch)

final_layer = concatenate([ dense_branch, embedding_branch ], name='concatenate_layer')

final_out = Dense(1, activation='linear')(final_layer)
model = Model(inputs=input_layer, outputs=final_out)
model.compile(optimizer='adam', loss='mean_squared_error')
model.summary()
model.build(input_shape=(None, p))

X = np.random.randn(100, p)
y = np.random.randn(100)
history = model.fit(X, y, epochs=30, verbose=0)

python3.11/site-packages/keras/src/optimizers/base_optimizer.py: UserWarning: Gradients do not exist for variables ['kernel', 'bias'] when minimizing the loss. If using `model.compile()`, did you forget to provide a `loss` argument?
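
For reference, a quick way to confirm which variables the warning refers to (a sketch; it assumes the model, X and y defined above):

import tensorflow as tf

with tf.GradientTape() as tape:
    pred = model(tf.constant(X, dtype=tf.float32), training=True)
    loss = tf.reduce_mean(tf.square(tf.constant(y, dtype=tf.float32) - tf.squeeze(pred, axis=-1)))

grads = tape.gradient(loss, model.trainable_variables)
for var, grad in zip(model.trainable_variables, grads):
    if grad is None:
        print("no gradient:", var.name)  # expected: the kernel and bias of the Dense layer feeding the Embedding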


antipisa avatar May 06 '25 00:05 antipisa

There is also the additional problem that if you save this model, then load it and try to predict, the Lambda layer causes a problem because its input and output shapes are not defined.
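
One possible way around that (sketched below; the class name ClipCast is illustrative) is to replace the Lambda with a small custom layer, so the op is serializable and the output shape is well defined. Note this only addresses saving/loading, not the missing gradients.

import tensorflow as tf
from keras.layers import Layer

class ClipCast(Layer):
    """Casts inputs to int32 and clips them to [0, num_indices - 1]."""
    def __init__(self, num_indices, **kwargs):
        super().__init__(**kwargs)
        self.num_indices = num_indices

    def call(self, inputs):
        return tf.clip_by_value(tf.cast(inputs, tf.int32), 0, self.num_indices - 1)

    def compute_output_shape(self, input_shape):
        return input_shape

    def get_config(self):
        config = super().get_config()
        config.update({"num_indices": self.num_indices})
        return config

# Usage, replacing the Lambda line above:
# embedding_branch = ClipCast(num_indices, name='clip_cast')(embedding_branch)
# When loading, pass custom_objects={'ClipCast': ClipCast} to keras.models.load_model.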

antipisa avatar May 07 '25 00:05 antipisa

This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.

github-actions[bot] avatar May 21 '25 02:05 github-actions[bot]

@tilakrayal

antipisa avatar May 21 '25 02:05 antipisa

@tilakrayal @david-thrower checking if you can repro with latest code

antipisa avatar Jun 30 '25 14:06 antipisa

Ok, I think the problem is that the Lambda layer that casts to integer is non-differentiable, so no gradients can flow back to the kernel and bias of the Dense layer feeding the Embedding.
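
For completeness, a sketch of one fully differentiable alternative: replace the hard cast-and-lookup with a soft weighting over the rows of an embedding matrix, so gradients can reach the first Dense layer (layer names are illustrative):

import numpy as np
from keras.layers import Dense, Input, concatenate
from keras.models import Model

num_indices, embedding_dim, p = 10, 5, 5

input_layer = Input(shape=(p,), name='input_layer')

# Soft "index" assignment: a probability distribution over the num_indices buckets
# instead of a hard integer cast.
soft_index = Dense(num_indices, activation='softmax', name='soft_index')(input_layer)
# A bias-free Dense acts as the embedding table: output = soft_index @ W,
# where W has shape (num_indices, embedding_dim), i.e. a convex combination of embedding rows.
soft_embedding = Dense(embedding_dim, use_bias=False, name='soft_embedding')(soft_index)

dense_branch = Dense(20, activation='relu')(input_layer)
dense_branch = Dense(3, activation='relu')(dense_branch)

final_out = Dense(1, activation='linear')(concatenate([dense_branch, soft_embedding]))
model = Model(inputs=input_layer, outputs=final_out)
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(np.random.randn(100, p), np.random.randn(100), epochs=2, verbose=0)  # no gradient warning expected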

antipisa avatar Jul 01 '25 14:07 antipisa