
[BUG] Gradients do not exist for categorical variables in the advanced example notebook

Open sararb opened this issue 2 years ago • 4 comments

Bug description

When running the advanced example notebook, the fit method prints the following warning:

WARNING:tensorflow:Gradients do not exist for variables ['embedding_features/userId:0', 'embedding_features/movieId:0', 'embedding_features/title:0', 'embedding_features/gender:0', 'parallel_block/userId:0', 'parallel_block/movieId:0', 'parallel_block/title:0', 'parallel_block/gender:0', 'sequential_block_7/userId:0', 'sequential_block_7/movieId:0', 'sequential_block_7/title:0', 'sequential_block_7/gender:0', 'sequential_block_9/userId:0', 'sequential_block_9/movieId:0', 'sequential_block_9/title:0', 'sequential_block_9/gender:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss`argument?

Steps/Code to reproduce bug

Run the advanced example notebook.

Expected behavior

Trainable embedding variables are created and receive gradients during training.

Additional context

sararb avatar Apr 12 '22 19:04 sararb

It might be useful, while tackling this issue, to log the gradients to TensorBoard. That can be done with Merlin Models by adding these lines in Model.train_step() (like here) ...

for v, g in zip(self.trainable_variables, gradients):
    tf.summary.histogram(f"gradients/{v.name}", g, step=self.optimizer.iterations)

... and passing a TensorBoard callback to model.fit():

tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=logdir,
    update_freq=10,
)
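
Putting the two snippets together, here is a self-contained sketch of the idea using a plain tf.keras model rather than the actual Merlin Models Model class (the toy model, logdir, and random data are placeholders). Gradient histograms are written from train_step, and the TensorBoard callback provides the active summary writer:

import tensorflow as tf

logdir = "/tmp/gradient_logs"  # placeholder log directory

class GradientLoggingModel(tf.keras.Model):
    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred)
        gradients = tape.gradient(loss, self.trainable_variables)
        # the lines suggested above: one histogram per trainable variable
        for v, g in zip(self.trainable_variables, gradients):
            if g is not None:  # skip variables that received no gradient
                tf.summary.histogram(f"gradients/{v.name}", g, step=self.optimizer.iterations)
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}

# toy functional model built with the subclass
inputs = tf.keras.Input(shape=(4,))
outputs = tf.keras.layers.Dense(1)(inputs)
model = GradientLoggingModel(inputs, outputs)
model.compile(optimizer="adam", loss="mse")

tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=logdir, update_freq=10)
model.fit(
    tf.random.normal([32, 4]),
    tf.random.normal([32, 1]),
    epochs=1,
    callbacks=[tensorboard_callback],
)

Note that tf.summary.histogram is a no-op when no writer is active, so the summaries only get recorded while the TensorBoard callback is attached.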

gabrielspmoreira avatar Apr 12 '22 21:04 gabrielspmoreira

@sararb any updates here?

EvenOldridge avatar May 04 '22 02:05 EvenOldridge

Here is a summary of a debug session I have just run:

  • If we build the model without checking the outputs of intermediate layers (i.e., disabling the notebook cells that explicitly apply a given block to the sampled batch inputs), training runs without the warning and the gradients are correctly set.

  • The warning is raised when we check the outputs of the intermediate blocks: explicitly calling the EmbeddingFeatures block builds the embedding variables with names that differ from the names set when building the final model. Taking the userId column as an example:

    1. The name of the variable when executing the embedding layer alone: 'embedding_features/userId:0'

    2. The name of the variable within the model: 'model/sequential_block_10/userId/embedding:0'

The variables contained in the EmbeddingFeatures block from the first execution are then not connected to the final model during the fit() step, which raises the warning:

WARNING:tensorflow:Gradients do not exist for variables ['embedding_features/userId:0', 'embedding_features/movieId:0', 'embedding_features/title:0', 'embedding_features/gender:0', 'parallel_block/userId:0', 'parallel_block/movieId:0', 'parallel_block/title:0', 'parallel_block/gender:0', 'sequential_block_7/userId:0', 'sequential_block_7/movieId:0', 'sequential_block_7/title:0', 'sequential_block_7/gender:0', 'sequential_block_9/userId:0', 'sequential_block_9/movieId:0', 'sequential_block_9/title:0', 'sequential_block_9/gender:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss`argument?
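
For context on the warning mechanism itself: TensorFlow emits it whenever a variable is tracked as trainable but never participates in the loss. A minimal plain-Keras repro (the model and names are illustrative, unrelated to Merlin Models):

import tensorflow as tf

class ModelWithOrphanVariable(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1)
        # tracked as trainable but never used in call(), so it gets no gradient
        self.orphan = tf.Variable(tf.zeros([4]), trainable=True, name="orphan")

    def call(self, inputs):
        return self.dense(inputs)

model = ModelWithOrphanVariable()
model.compile(optimizer="adam", loss="mse")
# prints: WARNING:tensorflow:Gradients do not exist for variables ['orphan:0'] ...
model.fit(tf.random.normal([8, 4]), tf.random.normal([8, 1]), epochs=1, verbose=0)

This matches the diagnosis above: the variables built by the standalone block calls stay tracked by the model but disconnected from its forward pass.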

sararb avatar May 09 '22 17:05 sararb

I am thinking about three possible fixes (@marcromeyn, curious to hear your thoughts?):

  • Add a cell to the notebook that re-defines the blocks and builds the end-to-end model, something like this:
deep_continuous_block = continuous_block.connect(mm.MLPBlock([64]))
embedding_block = mm.EmbeddingFeatures.from_schema(sub_schema)
dlrm_input_block = mm.ParallelBlock(
    {"embeddings": embedding_block, "deep_continuous": deep_continuous_block}
)
dlrm_interaction = dlrm_input_block.connect_with_shortcut(
    DotProductInteractionBlock(),
    shortcut_filter=mm.Filter("deep_continuous"),
    aggregation="concat",
)
deep_dlrm_interaction = dlrm_interaction.connect(mm.MLPBlock([64, 128, 512]))
binary_task = mm.BinaryClassificationTask(
    sub_schema,
    metrics=[tf.keras.metrics.AUC()],  # metric instance rather than the class
    pre=LogitsTemperatureScaler(temperature=2),
)

model = deep_dlrm_interaction.connect(binary_task)

  • Can we re-initialize or delete the variables of a custom block after it has been built? (See the sketch after this list.)

  • Could we set generic variable names in EmbeddingFeatures so that the same names are used in both executions: a quick test of the block's output and the full model call?
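
Regarding the second option, here is a rough sketch of what re-initializing a built block could look like. It relies on private Keras attributes (layer.built, layer._trainable_weights), so whether this is actually safe for Merlin blocks is exactly the open question:

import tensorflow as tf

def reset_layer(layer: tf.keras.layers.Layer) -> None:
    # Sketch only, not a supported API: mark the layer and its sublayers
    # as unbuilt and drop their weight lists, so the next call rebuilds
    # the variables under the current name scope.
    for sub in [layer, *layer.submodules]:
        if isinstance(sub, tf.keras.layers.Layer):
            sub.built = False
            sub._trainable_weights = []
            sub._non_trainable_weights = []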

sararb avatar May 09 '22 18:05 sararb