Getting lower top-k accuracy compared to other open-source recommendation systems
We compared TFRS with LightFM (an open-source recommendation library), and the results show that LightFM performs better than TFRS. We used the dataset available in the Amazon blog post on evaluating recommendation systems. The results are shown below. Is there any way we can improve the accuracy further?
Here I am attaching the train/test top-k visualisations.
Sample code:
import tensorflow as tf
import tensorflow_recommenders as tfrs

class UserModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        emb_dim = 32
        # Vocabulary lookup followed by a trainable embedding for each
        # query-side categorical feature.
        self.user_id_embedding = tf.keras.Sequential([
            tf.keras.layers.experimental.preprocessing.StringLookup(
                vocabulary=USER_ID_unique, mask_token=None),
            tf.keras.layers.Embedding(len(USER_ID_unique) + 1, emb_dim),
        ])
        self.cabin_type_embedding = tf.keras.Sequential([
            tf.keras.layers.experimental.preprocessing.StringLookup(
                vocabulary=CABIN_TYPE_unique, mask_token=None),
            tf.keras.layers.Embedding(len(CABIN_TYPE_unique) + 1, emb_dim),
        ])
        self.user_residence_embedding = tf.keras.Sequential([
            tf.keras.layers.experimental.preprocessing.StringLookup(
                vocabulary=USER_RESIDENCE_unique, mask_token=None),
            tf.keras.layers.Embedding(len(USER_RESIDENCE_unique) + 1, emb_dim),
        ])

    def call(self, user_interaction_data):
        # Concatenate the three feature embeddings into one query vector.
        return tf.concat([
            self.user_id_embedding(user_interaction_data["USER_ID"]),
            self.cabin_type_embedding(user_interaction_data["CABIN_TYPE"]),
            self.user_residence_embedding(user_interaction_data["USER_RESIDENCE"]),
        ], axis=1)

class ItemModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.item_embedding = tf.keras.Sequential([
            tf.keras.layers.experimental.preprocessing.StringLookup(
                vocabulary=item_unique, mask_token=None),
            tf.keras.layers.Embedding(len(item_unique) + 1, 32),
        ])

    def call(self, user_interaction_data):
        return self.item_embedding(user_interaction_data["ITEM_ID"])

class TFRSRetrievalModel(tfrs.models.Model):
    def __init__(self, user_model_cls, item_model_cls, item_ds):
        super().__init__()
        self.query_model = tf.keras.Sequential([
            user_model_cls(),
            tf.keras.layers.Dense(
                32, kernel_initializer=tf.keras.initializers.RandomNormal(seed=99)),
        ])
        self.candidate_model = tf.keras.Sequential([
            item_model_cls(),
            tf.keras.layers.Dense(
                32, kernel_initializer=tf.keras.initializers.RandomNormal(seed=1)),
        ])
        # item_ds is assumed to be a batched dataset of candidate features.
        self.task = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(
                item_ds.map(self.candidate_model),
                ks=(3, 5, 10, 15, 25)))

    def compute_loss(self, features, training=False):
        item_features = {"ITEM_ID": features.pop("ITEM_ID")}
        query_embeddings = self.query_model(features)
        item_embeddings = self.candidate_model(item_features)
        return self.task(
            query_embeddings,
            item_embeddings,
            compute_metrics=True)
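For completeness, a minimal training sketch for the model above. The dataset names (interaction_ds, test_ds), batch sizes, and optimizer settings are assumptions, not from the original post; item_ds is assumed to be an unbatched dataset of {"ITEM_ID": ...} dictionaries.

# Hypothetical training setup; names and hyperparameters are illustrative.
model = TFRSRetrievalModel(UserModel, ItemModel, item_ds.batch(128))
model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1))
model.fit(interaction_ds.batch(4096), epochs=5)
model.evaluate(test_ds.batch(4096), return_dict=True)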
Have you tried the WARP loss with TFRS?
WARP loss is implemented in LightFM, and I think that's probably the main reason driving the difference here.
@drtinumohan sorry to jump in on your thread!
@jasonzyx do you have a reference implementation for WARP loss in TF?
@ydennisy I found this: https://gist.github.com/vihari/c3c59bf2e4f18722a872499b0394986c
First define the WARP loss as in that gist, then pass it to the retrieval task like below:

self.task: tf.keras.layers.Layer = tfrs.tasks.Retrieval(loss=warp_loss, ...)
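For anyone who would rather not depend on the gist, here is a simplified, batch-wise WARP-style sketch compatible with the (y_true, y_pred) signature that tfrs.tasks.Retrieval passes to its loss. This is only an approximation of WARP, not the gist's exact sampling-based implementation.

# Simplified WARP-style loss (an approximation): weight each query's
# hinge loss by log(1 + rank), where rank is approximated by the number
# of in-batch negatives that violate the margin.
def warp_loss(y_true, y_pred, sample_weight=None):
    margin = 1.0
    # Score of the positive candidate for each query.
    positives = tf.reduce_sum(y_true * y_pred, axis=1, keepdims=True)
    # Hinge violations contributed by every in-batch negative.
    violations = tf.nn.relu(margin - positives + y_pred) * (1.0 - y_true)
    rank = tf.reduce_sum(tf.cast(violations > 0.0, tf.float32), axis=1)
    per_example = (
        tf.math.log1p(rank)
        * tf.reduce_sum(violations, axis=1)
        / tf.maximum(rank, 1.0))
    if sample_weight is not None:
        per_example *= tf.cast(sample_weight, tf.float32)
    return tf.reduce_sum(per_example)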
With the categorical cross-entropy loss, have you tried tuning the num_hard_negatives parameter? It shares a somewhat similar idea with WARP loss, and may help boost performance to the same level as WARP:
self.task: tf.keras.layers.Layer = tfrs.tasks.Retrieval(
    metrics=tfrs.metrics.FactorizedTopK(
        candidates=products.batch(256).map(self.product_model)
    ),
    num_hard_negatives=100,
)
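For intuition (my reading of the idea, not the library's exact source): with num_hard_negatives set, only the highest-scoring in-batch negatives are kept for each query before the cross-entropy is computed, so the model focuses on the negatives it currently confuses with the positive, which is similar in spirit to WARP's search for violating negatives. Conceptually:

# Conceptual sketch of hard negative mining (illustrative only).
def mine_hard_negatives(labels, scores, num_hard_negatives):
    # Push positives to the bottom so top_k selects only negatives.
    masked_scores = scores + labels * tf.float32.min
    _, hard_idx = tf.math.top_k(masked_scores, k=num_hard_negatives)
    hard_scores = tf.gather(scores, hard_idx, batch_dims=1)
    positive_scores = tf.reduce_sum(scores * labels, axis=1, keepdims=True)
    # Cross-entropy is then computed over [positive, hard negatives].
    new_scores = tf.concat([positive_scores, hard_scores], axis=1)
    new_labels = tf.concat(
        [tf.ones_like(positive_scores), tf.zeros_like(hard_scores)], axis=1)
    return new_labels, new_scores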
@jasonzyx thank you! I played around with this parameter but was unable to gain any improvement. Could you maybe point to a reference that explains the intuition behind num_hard_negatives?
@ydennisy @jasonzyx I also tried num_hard_negatives, but saw no improvement in the results. As far as I know, WARP loss and the top-k categorical cross-entropy loss used in TFRS work in a similar fashion. I am adding my GitHub repo for your reference: github link.
Lots of good comments above - thank you!
One further thing you could try is to correct for negative sampling probability. Unlike LightFM (which samples negatives from the candidate corpus in a close-to-uniform fashion), TFRS uses in-batch negatives. This means that popular items are over-represented as negatives and can be penalized in retrieval, leading to lower top-k metrics.
The simplest way to do this is to pre-compute the probability of each candidate occurring in your dataset, then pass it as the candidate_sampling_probability argument to the Retrieval task.
@maciejkula You're a lifesaver, thank you so much for your solution! I tried it out and the results were awesome: top-k accuracy increased significantly. I calculated the sampling probability as num_times_candidate_appears_in_interactions / num_all_interactions.
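For anyone following along, a sketch of that computation. It assumes the training data lives in a pandas DataFrame called interactions_df; both that name and the "ITEM_PROB" feature are made up for illustration, not taken from the original repo.

# Empirical frequency of each item across all interactions, attached to
# every interaction row as a hypothetical "ITEM_PROB" feature.
item_counts = interactions_df["ITEM_ID"].value_counts()
interactions_df["ITEM_PROB"] = (
    interactions_df["ITEM_ID"].map(item_counts) / len(interactions_df))

# Then forward it per batch inside compute_loss:
def compute_loss(self, features, training=False):
    sampling_prob = features.pop("ITEM_PROB")
    item_features = {"ITEM_ID": features.pop("ITEM_ID")}
    return self.task(
        self.query_model(features),
        self.candidate_model(item_features),
        candidate_sampling_probability=sampling_prob)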
@maciejkula Other than candidate_sampling_probability, there is another parameter, sample_weight. I played around with sample_weight and got some improvement in accuracy. It would be great if you could explain the difference between these two parameters; I presume they are different but convey the same idea.
@drtinumohan Curious, how did you set up the sample_weight?
@jasonzyx Please refer to the official scikit-learn documentation here.
@drtinumohan these two are unrelated.

- candidate_sampling_probability compensates for the bias introduced by the in-batch negative sampling strategy, where popular items are used as negatives too often.
- sample_weight is about deciding which samples you want the model to focus on. A very small sample weight will make the model ignore a given training example; a very large weight will make the model focus on that sample to the exclusion of other data.
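In code, sample_weight is forwarded to the task per batch in much the same way as the sampling probability above. A sketch, where the "WEIGHT" feature name is hypothetical:

# Inside compute_loss, assuming each interaction carries a
# hypothetical per-example "WEIGHT" feature.
def compute_loss(self, features, training=False):
    weights = features.pop("WEIGHT")
    item_features = {"ITEM_ID": features.pop("ITEM_ID")}
    return self.task(
        self.query_model(features),
        self.candidate_model(item_features),
        sample_weight=weights)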
@maciejkula Any references comparing the results of different negative sampling strategies, i.e. in-batch, cross-batch, random, etc.?
As far as I understand, sample_weight can be used to model the recency of data, right? So if I need to focus more on recent data, I can increase their weight somehow?
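One way such recency weighting might look, as a sketch: an exponential decay on interaction age. It assumes a pandas DataFrame interactions_df with a datetime "TIMESTAMP" column; both names are assumptions.

import numpy as np

# Hypothetical recency weighting: interactions one half-life old get
# half the weight of the newest ones.
half_life_days = 30.0
age_days = (interactions_df["TIMESTAMP"].max()
            - interactions_df["TIMESTAMP"]).dt.days
interactions_df["WEIGHT"] = np.exp(-np.log(2.0) * age_days / half_life_days)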
@drtinumohan Can you please explain a bit more about candidate_sampling_probability? Is it an array of probabilities or a dictionary mapping item IDs to their frequency in the data?
Also, you mentioned you calculated it as num_times_candidate_appears_in_interactions / num_all_interactions, which is equivalent to the popularity of each candidate item. So does that mean popular items will be more likely to be chosen as negative samples, or less likely? Based on what you described, I assume they will be chosen more often?
@maciejkula @drtinumohan What kind of optimizer is usually used for recommendation systems? Adagrad? Or Adam?