Getting lower top-k accuracy compared to other open-source recommendation systems
We compared TFRS with LightFM (an open-source recommendation library), and the results show that LightFM performs better than TFRS. We used the dataset available in the Amazon blog post on evaluating recommendation systems. The results are shown below. Is there any way we can improve the accuracy further?
Here I am attaching the train/test top-k visualisations.
Sample code:
import tensorflow as tf
import tensorflow_recommenders as tfrs

class UserModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        emb_dim = 32
        # Vocabulary lookup followed by a trainable embedding for each
        # query-side categorical feature.
        self.user_id_embedding = tf.keras.Sequential([
            tf.keras.layers.experimental.preprocessing.StringLookup(
                vocabulary=USER_ID_unique, mask_token=None),
            tf.keras.layers.Embedding(len(USER_ID_unique) + 1, emb_dim),
        ])
        self.cabin_type_embedding = tf.keras.Sequential([
            tf.keras.layers.experimental.preprocessing.StringLookup(
                vocabulary=CABIN_TYPE_unique, mask_token=None),
            tf.keras.layers.Embedding(len(CABIN_TYPE_unique) + 1, emb_dim),
        ])
        self.user_residence_embedding = tf.keras.Sequential([
            tf.keras.layers.experimental.preprocessing.StringLookup(
                vocabulary=USER_RESIDENCE_unique, mask_token=None),
            tf.keras.layers.Embedding(len(USER_RESIDENCE_unique) + 1, emb_dim),
        ])

    def call(self, user_interaction_data):
        # Concatenate the three feature embeddings into one query vector.
        return tf.concat([
            self.user_id_embedding(user_interaction_data["USER_ID"]),
            self.cabin_type_embedding(user_interaction_data["CABIN_TYPE"]),
            self.user_residence_embedding(user_interaction_data["USER_RESIDENCE"]),
        ], axis=1)

class ItemModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.item_embedding = tf.keras.Sequential([
            tf.keras.layers.experimental.preprocessing.StringLookup(
                vocabulary=item_unique, mask_token=None),
            tf.keras.layers.Embedding(len(item_unique) + 1, 32),
        ])

    def call(self, user_interaction_data):
        return self.item_embedding(user_interaction_data["ITEM_ID"])

class TFRSRetrievalModel(tfrs.models.Model):
    def __init__(self, user_model_cls, item_model_cls, item_ds):
        super().__init__()
        self.query_model = tf.keras.Sequential([
            user_model_cls(),
            tf.keras.layers.Dense(
                32, kernel_initializer=tf.keras.initializers.RandomNormal(seed=99)),
        ])
        self.candidate_model = tf.keras.Sequential([
            item_model_cls(),
            tf.keras.layers.Dense(
                32, kernel_initializer=tf.keras.initializers.RandomNormal(seed=1)),
        ])
        # item_ds is assumed to be a batched dataset of candidate features.
        self.task = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(
                item_ds.map(self.candidate_model),
                ks=(3, 5, 10, 15, 25)))

    def compute_loss(self, features, training=False):
        item_features = {"ITEM_ID": features.pop("ITEM_ID")}
        query_embeddings = self.query_model(features)
        item_embeddings = self.candidate_model(item_features)
        return self.task(
            query_embeddings,
            item_embeddings,
            compute_metrics=True)
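For completeness, a minimal training sketch for the model above. The dataset names (interaction_ds, test_ds), batch sizes, and optimizer settings are assumptions, not from the original post; item_ds is assumed to be an unbatched dataset of {"ITEM_ID": ...} dictionaries.

# Hypothetical training setup; names and hyperparameters are illustrative.
model = TFRSRetrievalModel(UserModel, ItemModel, item_ds.batch(128))
model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1))
model.fit(interaction_ds.batch(4096), epochs=5)
model.evaluate(test_ds.batch(4096), return_dict=True)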
Have you tried the WARP loss with TFRS?
WARP loss is implemented in LightFM, and I think that's probably the main reason driving the difference here.
@drtinumohan sorry to jump in on your thread!
@jasonzyx do you have a reference implementation for WARP loss in TF?
@ydennisy I found this: https://gist.github.com/vihari/c3c59bf2e4f18722a872499b0394986c
First define the WARP loss as in that gist, then pass it to the retrieval task like below:

self.task: tf.keras.layers.Layer = tfrs.tasks.Retrieval(loss=warp_loss, ...)
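For anyone who would rather not depend on the gist, here is a simplified, batch-wise WARP-style sketch compatible with the (y_true, y_pred) signature that tfrs.tasks.Retrieval passes to its loss. This is only an approximation of WARP, not the gist's exact sampling-based implementation.

# Simplified WARP-style loss (an approximation): weight each query's
# hinge loss by log(1 + rank), where rank is approximated by the number
# of in-batch negatives that violate the margin.
def warp_loss(y_true, y_pred, sample_weight=None):
    margin = 1.0
    # Score of the positive candidate for each query.
    positives = tf.reduce_sum(y_true * y_pred, axis=1, keepdims=True)
    # Hinge violations contributed by every in-batch negative.
    violations = tf.nn.relu(margin - positives + y_pred) * (1.0 - y_true)
    rank = tf.reduce_sum(tf.cast(violations > 0.0, tf.float32), axis=1)
    per_example = (
        tf.math.log1p(rank)
        * tf.reduce_sum(violations, axis=1)
        / tf.maximum(rank, 1.0))
    if sample_weight is not None:
        per_example *= tf.cast(sample_weight, tf.float32)
    return tf.reduce_sum(per_example)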
With the categorical cross-entropy loss, have you tried tuning the num_hard_negatives parameter? It shares a somewhat similar idea with WARP loss, and may help boost performance to the same level as WARP:
self.task: tf.keras.layers.Layer = tfrs.tasks.Retrieval(
    metrics=tfrs.metrics.FactorizedTopK(
        candidates=products.batch(256).map(self.product_model)
    ),
    num_hard_negatives=100,
)
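For intuition (my reading of the idea, not the library's exact source): with num_hard_negatives set, only the highest-scoring in-batch negatives are kept for each query before the cross-entropy is computed, so the model focuses on the negatives it currently confuses with the positive, which is similar in spirit to WARP's search for violating negatives. Conceptually:

# Conceptual sketch of hard negative mining (illustrative only).
def mine_hard_negatives(labels, scores, num_hard_negatives):
    # Push positives to the bottom so top_k selects only negatives.
    masked_scores = scores + labels * tf.float32.min
    _, hard_idx = tf.math.top_k(masked_scores, k=num_hard_negatives)
    hard_scores = tf.gather(scores, hard_idx, batch_dims=1)
    positive_scores = tf.reduce_sum(scores * labels, axis=1, keepdims=True)
    # Cross-entropy is then computed over [positive, hard negatives].
    new_scores = tf.concat([positive_scores, hard_scores], axis=1)
    new_labels = tf.concat(
        [tf.ones_like(positive_scores), tf.zeros_like(hard_scores)], axis=1)
    return new_labels, new_scores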
@jasonzyx thank you! I played around with this parameter but was unable to gain any improvement. Could you maybe point to a reference that explains the intuition behind num_hard_negatives?
@ydennisy @jasonzyx I also tried num_hard_negatives, but saw no improvement in the results. As far as I know, WARP loss and the top-k categorical cross-entropy loss used in TFRS work in a similar fashion. I am adding my GitHub repo for your reference: github link.
Lots of good comments above - thank you!
One further thing you could try is to correct for negative sampling probability. Unlike LightFM (which samples negatives from the candidate corpus in a close-to-uniform fashion), TFRS uses in-batch negatives. This means that popular items are over-represented as negatives and can be penalized in retrieval, leading to lower top-k metrics.
The simplest way to do this is to pre-compute the probability of each candidate occurring in your dataset, then pass it as the candidate_sampling_probability argument to the Retrieval task.
@maciejkula You're a lifesaver, thank you so much for your solution! I tried it out and the results were awesome: top-k accuracy increased significantly. I calculated the sampling probability as num_times_candidate_appears_in_interactions / num_all_interactions.
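For anyone following along, a sketch of that computation. It assumes the training data lives in a pandas DataFrame called interactions_df; both that name and the "ITEM_PROB" feature are made up for illustration, not taken from the original repo.

# Empirical frequency of each item across all interactions, attached to
# every interaction row as a hypothetical "ITEM_PROB" feature.
item_counts = interactions_df["ITEM_ID"].value_counts()
interactions_df["ITEM_PROB"] = (
    interactions_df["ITEM_ID"].map(item_counts) / len(interactions_df))

# Then forward it per batch inside compute_loss:
def compute_loss(self, features, training=False):
    sampling_prob = features.pop("ITEM_PROB")
    item_features = {"ITEM_ID": features.pop("ITEM_ID")}
    return self.task(
        self.query_model(features),
        self.candidate_model(item_features),
        candidate_sampling_probability=sampling_prob)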
@maciejkula Other than candidate_sampling_probability, there is another parameter, sample_weight. I played around with sample_weight and got some improvement in accuracy. It would be great if you could explain the difference between these two parameters; I presume they are different but convey the same idea.
@drtinumohan Curious, how did you set up the sample_weight?
@jasonzyx Please refer to the official scikit-learn documentation here.
@drtinumohan these two are unrelated.

- candidate_sampling_probability compensates for the bias introduced by the in-batch negative sampling strategy, where popular items are used as negatives too often.
- sample_weight is about deciding which samples you want the model to focus on. A very small sample weight will make the model ignore a given training example; a very large weight will make the model focus on that sample to the exclusion of other data.
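In code, sample_weight is forwarded to the task per batch in much the same way as the sampling probability above. A sketch, where the "WEIGHT" feature name is hypothetical:

# Inside compute_loss, assuming each interaction carries a
# hypothetical per-example "WEIGHT" feature.
def compute_loss(self, features, training=False):
    weights = features.pop("WEIGHT")
    item_features = {"ITEM_ID": features.pop("ITEM_ID")}
    return self.task(
        self.query_model(features),
        self.candidate_model(item_features),
        sample_weight=weights)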
@maciejkula Any references comparing the results of different negative sampling strategies, i.e. in-batch, cross-batch, random, etc.?
As far as I understand, sample_weight can be used to model the recency of data, right? So if I need to focus more on recent data, I can increase their weight somehow?
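One way such recency weighting might look, as a sketch: an exponential decay on interaction age. It assumes a pandas DataFrame interactions_df with a datetime "TIMESTAMP" column; both names are assumptions.

import numpy as np

# Hypothetical recency weighting: interactions one half-life old get
# half the weight of the newest ones.
half_life_days = 30.0
age_days = (interactions_df["TIMESTAMP"].max()
            - interactions_df["TIMESTAMP"]).dt.days
interactions_df["WEIGHT"] = np.exp(-np.log(2.0) * age_days / half_life_days)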
@drtinumohan Can you please explain a bit more about candidate_sampling_probability? Is it an array of probabilities or a dictionary mapping item IDs to their frequency in the data?
Also, you mentioned you calculated it as num_times_candidate_appears_in_interactions / num_all_interactions, which is equivalent to the popularity of each candidate item. So does that mean popular items will be more likely to be chosen as negative samples, or less likely? Based on what you described, I assume they will be chosen more often?
@maciejkula @drtinumohan What kind of optimizer is usually used for recommendation systems? Adagrad? Or Adam?