[BUG] Gradients do not exist for categorical variables in the advanced example notebook
Bug description
When running the advanced example notebook, the fit method prints the following warning:
WARNING:tensorflow:Gradients do not exist for variables ['embedding_features/userId:0', 'embedding_features/movieId:0', 'embedding_features/title:0', 'embedding_features/gender:0', 'parallel_block/userId:0', 'parallel_block/movieId:0', 'parallel_block/title:0', 'parallel_block/gender:0', 'sequential_block_7/userId:0', 'sequential_block_7/movieId:0', 'sequential_block_7/title:0', 'sequential_block_7/gender:0', 'sequential_block_9/userId:0', 'sequential_block_9/movieId:0', 'sequential_block_9/title:0', 'sequential_block_9/gender:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss`argument?
Steps/Code to reproduce bug
Run the following example notebook
Expected behavior
Trainable embedding variables are created and receive gradients.
Additional context
While tackling this issue, it might also be useful to log the gradients to TensorBoard.
That can be done with Merlin Models by adding these lines in Model.train_step()
(like here) ...
for v, g in zip(self.trainable_variables, gradients):
    tf.summary.histogram(f"gradients/{v.name}", g, step=self.optimizer.iterations)
... and passing a TensorBoard callback to model.fit():
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=logdir,
    update_freq=10,
)
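For reference, here is a minimal sketch of what such a train_step override could look like on a plain tf.keras.Model. This is not the actual Merlin Models implementation; the loss/metrics handling below is just the standard Keras compile/fit pattern, simplified.

import tensorflow as tf

class GradientLoggingModel(tf.keras.Model):
    """Toy Keras model whose train_step also writes gradient histograms."""

    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred)
        gradients = tape.gradient(loss, self.trainable_variables)

        # Variables whose gradient is None are exactly the ones the
        # "Gradients do not exist" warning complains about, so skip them here.
        for v, g in zip(self.trainable_variables, gradients):
            if g is not None:
                tf.summary.histogram(
                    f"gradients/{v.name}", g, step=self.optimizer.iterations
                )

        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}

The histograms should only be written while a summary writer is active; during model.fit() the TensorBoard callback above is expected to provide one.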
@sararb any updates here?
Here is a summary of a debug session I have just run:
- If we build the model without checking the outputs of intermediate layers (i.e. disabling the notebook cells that explicitly apply a given block to the sampled batch inputs), training runs without the warning and gradients are correctly set.
- The warning is raised when we check the outputs of the intermediate blocks: explicitly calling the EmbeddingFeatures block builds the embedding variables with names that differ from the names set when the final model is built. Taking the userId column as an example (a snippet to reproduce this comparison follows the list):
  - The name of the variable when executing the embedding layer alone: 'embedding_features/userId:0'
  - The name of the variable within the model: tf.Variable 'model/sequential_block_10/userId/embedding:0'
- The variables contained in the EmbeddingFeatures block from the first execution are therefore not connected to the final model during the fit() step, which raises the warning:
WARNING:tensorflow:Gradients do not exist for variables ['embedding_features/userId:0', 'embedding_features/movieId:0', 'embedding_features/title:0', 'embedding_features/gender:0', 'parallel_block/userId:0', 'parallel_block/movieId:0', 'parallel_block/title:0', 'parallel_block/gender:0', 'sequential_block_7/userId:0', 'sequential_block_7/movieId:0', 'sequential_block_7/title:0', 'sequential_block_7/gender:0', 'sequential_block_9/userId:0', 'sequential_block_9/movieId:0', 'sequential_block_9/title:0', 'sequential_block_9/gender:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss`argument?
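For anyone reproducing this, the mismatch can be made visible by comparing the variable names from the two executions. A rough sketch, where embedding_block and model refer to the objects defined in the notebook:

# Variables created when the block was executed on its own
# (the notebook cells that inspect intermediate outputs).
standalone_names = {v.name for v in embedding_block.trainable_variables}

# Variables the compiled model actually optimizes during fit().
model_names = {v.name for v in model.trainable_variables}

# Anything left over here never receives a gradient and ends up
# in the "Gradients do not exist" warning.
print(standalone_names - model_names)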
I am thinking about three possible fixes (@marcromeyn curious to hear your thoughts?); a quick verification sketch follows the list:
- Add a cell to the notebook that re-defines the blocks and builds the end-to-end model, something like this:
deep_continuous_block = continuous_block.connect(mm.MLPBlock([64]))
embedding_block = mm.EmbeddingFeatures.from_schema(sub_schema)
dlrm_input_block = mm.ParallelBlock(
    {"embeddings": embedding_block, "deep_continuous": deep_continuous_block}
)
dlrm_interaction = dlrm_input_block.connect_with_shortcut(
    DotProductInteractionBlock(), shortcut_filter=mm.Filter("deep_continuous"), aggregation="concat"
)
deep_dlrm_interaction = dlrm_interaction.connect(mm.MLPBlock([64, 128, 512]))
binary_task = mm.BinaryClassificationTask(
    sub_schema,
    metrics=[tf.keras.metrics.AUC],
    pre=LogitsTemperatureScaler(temperature=2),
)
model = deep_dlrm_interaction.connect(binary_task)
- Can we re-init/delete the variables of a custom block after it has been built?
- Could we set generic variable names in EmbeddingFeatures so that we get the same name in both executions: a quick test of the block's output, or within the model call?
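Whichever option we pick, a quick way to check that the fix worked is to verify, by object identity rather than by name, that the block's embedding tables are among the variables the model trains. A sketch, assuming embedding_block, model and train_ds from the notebook:

model.compile(optimizer="adam")
model.fit(train_ds, epochs=1)

# Compare by identity (v.ref()), since the same table can show up
# under different names depending on where it was first built.
model_var_refs = {v.ref() for v in model.trainable_variables}
untrained = [v.name for v in embedding_block.variables if v.ref() not in model_var_refs]
print("embedding variables not trained by the model:", untrained)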