[Question] Item-to-Item recommendations with the candidate model
Hi!
On the retrieval tutorial page, item-to-item recommendations are mentioned at the end of the page. The paragraph suggests using items for both towers and then training the model on item pairs.
I was wondering, in the case where we don't have that kind of data available, whether it is possible to do item-to-item recommendations with a query tower trained on users, simply by using the trained candidate tower in the BruteForce layer (see the code below)? I tried this and the results seem correct.
```python
# Build a brute-force index over the item embeddings, using the trained
# candidate tower as the query model so that items can be queried against items.
index_items = tfrs.layers.factorized_top_k.BruteForce(model.candidate_model, k=200)
index_items.index_from_dataset(
    items.batch(100).map(lambda x: (
        x["id_house"],
        model.candidate_model({
            "id_house": x["id_house"],
            "type_logement": x["type_logement"],
            "confort": x["confort"],
            "capacity": x["capacity"],
        }),
    ))
)
```
I assume I can get recommended items from different categories, since different items can be of interest to the same users and therefore end up close together in the embedding space. To filter the results, I simply apply some filters to the output, based on the item alongside which the recommendations are displayed (see the sketch below). This is maybe not the most efficient or cleanest way to proceed, but it seems to work as a simple solution.
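For illustration, a minimal sketch of that post-filtering step; `query_item_features`, `query_item_id` and the `item_metadata` lookup are hypothetical names used only for this example:

```python
# Retrieve candidate items for the item alongside which recommendations are shown.
# `query_item_features` is a batched feature dict for that item (hypothetical name).
scores, ids = index_items(query_item_features)

# Post-filter: keep only retrieved items of the same "type_logement" as the query item.
# `item_metadata` is a hypothetical dict mapping item id -> its raw features.
allowed_type = item_metadata[query_item_id]["type_logement"]
filtered_ids = [
    item_id for item_id in ids[0].numpy()
    if item_metadata[item_id.decode("utf-8")]["type_logement"] == allowed_type
]
```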
So, my questions are: what is your opinion of this approach? Should the query and candidate towers always be used together, since they are trained together?
Thanks in advance for your time! Jérémy
Yes, this seems reasonable.
Because two-tower models use dot product (and a softmax loss) to connect the two towers, embeddings for both items and queries are semantically meaningful. Similar items will usually be close in the embedding space, making this kind of application possible.
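As a quick sanity check of that intuition, you can compare two candidate embeddings directly. A sketch reusing `model.candidate_model` from the question; `item_a` and `item_b` are hypothetical single-item feature dicts:

```python
import tensorflow as tf

# Embed two items with the trained candidate tower.
emb_a = model.candidate_model(item_a)
emb_b = model.candidate_model(item_b)

# Cosine similarity between the two item embeddings: similar items should score high.
cosine_similarity = tf.reduce_sum(
    tf.nn.l2_normalize(emb_a, axis=1) * tf.nn.l2_normalize(emb_b, axis=1),
    axis=1,
)
```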
Thanks @maciejkula
I'm also considering a similar structure for item-to-item recommendation, but when I use a single embedding layer the loss does not decrease much. When I build two towers, the loss decreases. What could be the problem here?
Hey @mustfkeskin
A very uninformed guess would be that query and candidate embeddings require different scales (vector norms),
particularly if you are using the candidate_sampling_probability parameter for sampling bias correction.
In order to reproduce popularity biases for frequent items, those items must generally produce a larger score than less popular items. However, when those same items are used as a query, they should not scale the scores for all other items. Essentially, the relationship is not symmetric.
You could train the model with two embedding layers and build the item-to-item index using only the candidate embeddings. Alternatively, I would anticipate that a linear transformation on top of the shared embeddings would resolve the problem, if the cause is what I described above.
Some pseudo-ish code
```python
model_dim = 64

# Shared embedding layer used by both towers.
embedding_layer = tf.keras.layers.Embedding(input_dim, model_dim)

# Query tower: the shared embeddings only.
query_model = tf.keras.Sequential([embedding_layer])

# Candidate tower: shared embeddings followed by a linear transformation,
# so candidate embeddings can take on a different scale than query embeddings.
candidate_model = tf.keras.Sequential([
    embedding_layer,
    tf.keras.layers.Dense(model_dim, activation=None)
])
```
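Either way, the item-to-item index would then be built from the candidate tower alone, something like the following sketch, assuming `items` is a tf.data.Dataset of raw item ids and the model has already been trained:

```python
# Index only the candidate embeddings; the same tower is used as the query model,
# so the index answers "items similar to this item" queries.
item_index = tfrs.layers.factorized_top_k.BruteForce(candidate_model, k=100)
item_index.index_from_dataset(
    items.batch(512).map(lambda item_id: (item_id, candidate_model(item_id)))
)

# Query with an item id to get its nearest items.
scores, similar_items = item_index(tf.constant([42]))
```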
If you do get round to solving it, let me know if this is the case or if I'm rambling nonsense :joy:
Here is my implementation. The TF adapt() step for building the vocabulary is too slow, so I assign the vocabulary manually.
```python
# Create the vocabulary, assigning it manually instead of calling adapt().
vocabulary = tf.keras.layers.experimental.preprocessing.StringLookup(mask_token=None)
vocabulary.set_vocabulary(embedding_df["id"].values)

# Embedding table size.
num_tokens = vocabulary.vocabulary_size()
embeddings_initializer = 'uniform'
embedding_dim = vector_size

embedding_layer = tf.keras.Sequential([
    vocabulary,
    tf.keras.layers.Embedding(
        num_tokens,
        embedding_dim,
        embeddings_initializer=embeddings_initializer,
    ),
])
```
Here is my single tower 😄
```python
class Item2ItemModel(tfrs.Model):

    def __init__(self, sku_model, task):
        super().__init__()
        self.sku_model: tf.keras.Model = sku_model
        self.task: tf.keras.layers.Layer = task

    def compute_loss(self, features: Dict[Text, tf.Tensor], training=True) -> tf.Tensor:
        sku_embeddings = self.sku_model(features["query_sku"])
        positive_sku_embeddings = self.sku_model(features["candidate_sku"])

        return self.task(
            sku_embeddings,
            positive_sku_embeddings,
            compute_metrics=not training,
        )
```
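A hedged sketch of how this model might be wired up and trained; `sku_ids` (a dataset of all sku ids) and `train_ds` (a dataset of dicts with "query_sku" and "candidate_sku") are assumed names:

```python
import tensorflow as tf
import tensorflow_recommenders as tfrs

# Retrieval task with factorized top-K metrics computed over all sku embeddings.
task = tfrs.tasks.Retrieval(
    metrics=tfrs.metrics.FactorizedTopK(
        candidates=sku_ids.batch(128).map(embedding_layer)
    )
)

model = Item2ItemModel(embedding_layer, task)
model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1))
model.fit(train_ds.batch(4096), epochs=5)
```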
How can we use the candidate sampling probability with two towers, i.e. for both towers? Because in my case both towers share their embeddings 👀
Use the candidate sampling probability in the same way, calculated over the target (candidate) distribution; don't worry about the query also being an item.
```python
class Item2ItemModel(tfrs.Model):

    def __init__(self, sku_model, task):
        super().__init__()
        self.sku_model: tf.keras.Model = sku_model
        self.task: tf.keras.layers.Layer = task

    def compute_loss(self, features: Dict[Text, tf.Tensor], training=True) -> tf.Tensor:
        sku_embeddings = self.sku_model(features["query_sku"])
        positive_sku_embeddings = self.sku_model(features["candidate_sku"])

        return self.task(
            sku_embeddings,
            positive_sku_embeddings,
            candidate_sampling_probability=features["candidate_sampling_probability"],
            compute_metrics=not training,
        )
```
Popular products generally sell more, which automatically affects hit-rate calculations. How do I boost popular products? As I understand it, candidate_sampling_probability does the opposite. When we evaluate in production with an A/B test, CR (conversion rate) becomes the primary metric.
Is the candidate sampling probability calculation correct?
df["candidate_count"] = df.groupby("candidate_sku").transform("count")
df["candidate_sampling_probabilities"] = df["candidate_count"] / df.shape[0]
I updated the method as follows, deriving a sample weight from how frequently each pair of items co-occurs.
```python
self.task(
    sku_embeddings,
    positive_sku_embeddings,
    compute_metrics=not training,
    candidate_sampling_probability=features['candidate_sampling_probabilities'],
    sample_weight=features['weight'],
)
```
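One way to derive such a pair-frequency weight (a sketch; the normalisation choice is an assumption, column names follow the earlier snippets):

```python
# Count how often each (query_sku, candidate_sku) pair co-occurs and normalise
# so the most frequent pair gets weight 1.0.
pair_counts = df.groupby(["query_sku", "candidate_sku"])["candidate_sku"].transform("count")
df["weight"] = pair_counts / pair_counts.max()
```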
Results:
- Base model: 0.038
- candidate_sampling_probabilities: 0.144
- candidate_sampling_probabilities + sample_weight: 0.242
Since the sample weight increases the loss contribution of certain examples, it has had a large impact on learning popular products.
What is your opinion on this topic? @patrickorlando
@mustfkeskin
> How do I boost popular products? As I understand it, candidate_sampling_probability does the opposite.
You are mistaken here, it does boost the scores for popular items. See https://github.com/tensorflow/recommenders/issues/440
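Roughly speaking (as I understand the implementation), the correction subtracts the log of the sampling probability from the training logits, so frequent candidates get a relative penalty during training and are therefore boosted at serving time, where the raw scores are used. A toy illustration with made-up numbers:

```python
import tensorflow as tf

# Toy numbers: two candidates with the same raw score, but the first is
# sampled 10x more often (more popular).
logits = tf.constant([[2.0, 2.0]])
candidate_sampling_probability = tf.constant([0.10, 0.01])

# logQ correction applied during training: the popular candidate gets a smaller
# bump, i.e. a relative penalty, so the model must learn higher raw scores for
# it. At inference the raw, uncorrected scores are used.
corrected = logits - tf.math.log(candidate_sampling_probability)
# corrected ≈ [[4.30, 6.61]]
```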
df["candidate_count"] = df.groupby("candidate_sku").transform("count") df["candidate_sampling_probabilities"] = df["candidate_count"] / df.shape[0]
This looks correct to me.
Sample weight is a useful parameter to experiment with. In general, your popular items should already be receiving far more updates than unpopular items, because they appear more often in your dataset, so in my opinion you shouldn't need to increase their sample weight. However, you are best placed to evaluate your model: if the resulting model's performance is better (and it isn't simply predicting popular items all the time), then that's fine.
For example, if your top 10 items are the target for 25% of your dataset, then simply recommending those 10 items every time will give you a top-10 accuracy of 0.25; if your model only matches that score, it is not performing well for your dataset.
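A quick way to compute that baseline figure for your own data (a sketch, assuming df has one row per interaction with a candidate_sku column, as in your snippet):

```python
# Share of interactions whose target is among the 10 most popular items:
# recommending only those 10 items would achieve this top-10 accuracy.
top_10 = df["candidate_sku"].value_counts().head(10)
baseline_top10_accuracy = top_10.sum() / len(df)
```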
> Popular items are given a boost at inference time by giving them a penalty during training.

This is how you explained it in the other issue. I had misunderstood candidate sampling probability; it's clearer now.
Which metric would you suggest I look at when evaluating the model? Because my data has a long-tail problem.
Top-K Categorical Accuracy along with MRR or NDCG are all valid metrics, but understanding the scores for a popularity baseline is important. Additionally, you might want to analyse these metrics for different groups of items (popular head, middle, long tail). Evaluating recommender systems is a challenging task, and it's very domain- and application-specific.
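As a rough sketch of that per-group breakdown (the popularity cut-offs are arbitrary, and `predict_top_k` is a hypothetical helper wrapping your retrieval index):

```python
import numpy as np
import pandas as pd

# Bucket items by popularity rank: head, middle, long tail (arbitrary cut-offs).
counts = df["candidate_sku"].value_counts()
ranks = counts.rank(method="first", ascending=False)
bucket = pd.cut(ranks, bins=[0, 100, 1000, np.inf], labels=["head", "middle", "long_tail"])

# Estimate top-k accuracy per bucket on a sample of interactions.
rows = []
for _, row in df.sample(10_000).iterrows():
    top_k_ids = predict_top_k(row["query_sku"])  # hypothetical helper
    rows.append({
        "bucket": bucket[row["candidate_sku"]],
        "hit": row["candidate_sku"] in top_k_ids,
    })

print(pd.DataFrame(rows).groupby("bucket")["hit"].mean())
```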
Thanks for the advice @patrickorlando