[BUG] Ranking model predicts a constant

Open PaulSteffen-betclic opened this issue 2 years ago • 0 comments

Bug description

After a training run that appears to go well, the ranking model predicts a constant value.

Steps/Code to reproduce bug

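import nvtabular as nvt
import tensorflow as tf
import merlin.models.tf as mm
import merlin.io
from merlin.loader.tensorflow import Loader  # assumed source of Loader; not shown in the original snippet
from merlin.models.tf.transforms.negative_sampling import InBatchNegatives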

output_path = "data/processed"
processed_train = nvt.Dataset(f"{output_path}/interactions/train/*.parquet")
processed_valid = nvt.Dataset(f"{output_path}/interactions/valid/*.parquet")

n_per_positive = 12
add_negatives = InBatchNegatives(processed_train.schema, n_per_positive, seed=42, prep_features=True, run_when_testing=True)

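# schema, batch_size and learning_rate are assumed to be defined earlier in the notebook (not shown)
train_ranking_loader = Loader(processed_train, schema=schema, batch_size=batch_size, shuffle=True)
valid_ranking_loader = Loader(processed_valid, schema=schema, batch_size=batch_size, shuffle=True)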

model = mm.DLRMModel(
    processed_train.schema,
    embedding_dim=64,
    bottom_block=mm.MLPBlock([128, 64]),
    top_block=mm.MLPBlock([64, 128, 512]),
    prediction_tasks=mm.BinaryClassificationTask("Click"),
)

compile_args = {
    "optimizer": tf.keras.optimizers.legacy.Adam(learning_rate=learning_rate),
    "run_eagerly": False,
    "metrics": [mm.RecallAt(10), mm.NDCGAt(10)],
    "weighted_metrics": [tf.keras.metrics.BinaryAccuracy(), tf.keras.metrics.AUC()],
}

model.compile(**compile_args)
model.fit(
    train_ranking_loader.map(add_negatives),
    validation_data=valid_ranking_loader.map(add_negatives),
    class_weights={0: 1, 1: n_per_positive},
    epochs=5,
)

This code produces the following output:

However, when I predict with this model via ranking_scores = model.batch_predict(potential_interactions_loader, batch_size=1024), I get the following warning message:

and the prediction is constant:

I wonder whether this is caused by the second warning message shown during prediction.

N.B.: this is not caused by potential_interactions_loader, because I get the same kind of issue when predicting with valid_ranking_loader.
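
To make "constant" precise when comparing the two loaders, the spread of the scores can be summarized numerically. The following is only a minimal sketch: it assumes the predicted scores have already been extracted into a one-dimensional NumPy array (how to do that depends on what batch_predict returns and is not shown here), and the helper name summarize_scores and the tolerance atol are hypothetical.

```python
import numpy as np


def summarize_scores(scores, atol=1e-6):
    """Summarize the spread of predicted scores (hypothetical helper)."""
    scores = np.asarray(scores, dtype=np.float64).ravel()
    if scores.size == 0:
        raise ValueError("scores is empty")
    # Range of the scores; a (near-)zero range means every row got the same score.
    spread = float(scores.max() - scores.min())
    return {
        "min": float(scores.min()),
        "max": float(scores.max()),
        "std": float(scores.std()),
        # Count distinct values after rounding to 6 decimals to ignore float noise.
        "n_unique": int(np.unique(np.round(scores, 6)).size),
        "is_constant": spread <= atol,
    }


# Synthetic stand-ins for real predictions, used only to illustrate the helper.
constant = summarize_scores(np.full(1000, 0.0769))
varied = summarize_scores(np.random.default_rng(0).uniform(size=1000))
print(constant["is_constant"], varied["is_constant"])
```

Running the same summary on the scores from potential_interactions_loader and from valid_ranking_loader would show whether both collapse to the same single value or only to a very narrow range.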

Expected behavior

The model should return a click probability for each row. I obtained such probabilities in the past, but I can no longer reproduce that result and have not identified why.
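
As a reference point for judging a constant score, it may help to know what a model that ignores its inputs would output. The sketch below computes the constant probability that minimizes weighted binary cross-entropy, assuming exactly n_per_positive = 12 negatives per positive (in-batch sampling may yield fewer) and assuming the class weights {0: 1, 1: 12} are actually applied during training. The function name constant_baseline is hypothetical.

```python
def constant_baseline(n_neg_per_pos, w_neg=1.0, w_pos=1.0):
    """Constant probability q minimizing weighted binary cross-entropy.

    Minimizing -(w_pos * n_pos * log(q) + w_neg * n_neg * log(1 - q))
    gives q = w_pos * n_pos / (w_pos * n_pos + w_neg * n_neg).
    Here n_pos is normalized to 1 and n_neg to n_neg_per_pos.
    """
    pos_mass = w_pos * 1.0
    neg_mass = w_neg * n_neg_per_pos
    return pos_mass / (pos_mass + neg_mass)


# Without class weights: the raw positive rate, 1 / 13.
unweighted = constant_baseline(12)
# With weight 12 on the positive class: both classes carry equal mass.
weighted = constant_baseline(12, w_pos=12)
print(f"{unweighted:.4f} {weighted:.4f}")
```

If the constant prediction sits close to one of these two values, that would be consistent with the model having fallen back to a base rate rather than using the features, although it does not by itself identify the cause.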

Environment details

Merlin version: 23.8.0
Platform: macOS
Python version: 3.10.12
Tensorflow version (GPU?): 2.12.0+nv23.6

The notebook runs in a container built from the nightly image nvcr.io/nvidia/merlin/merlin-tensorflow:nightly, in which the latest version of merlin models is pulled.

Thanks.

Nov 14 '23 21:11 PaulSteffen-betclic