
index_from_dataset returns indices rather than movie names

Open abdollahpouri opened this issue 2 years ago • 3 comments

Hello

In my code I added another feature to the candidate tower in addition to the movie title: each movie has a pre-computed vector representation (produced by a separate algorithm) that I feed directly into the candidate tower.

interactions_dict = ...  # a dictionary of features

ratings = tf.data.Dataset.from_tensor_slices(interactions_dict)

movies = ratings.map(lambda x: {
    'movie_title': x['movie_title'],
    'movie_vector': x['movie_vector'],
})

index = tfrs.layers.factorized_top_k.BruteForce(model.query_model, k=CANDIDATE_POOL_SIZE)
index.index_from_dataset(movies.batch(100).map(lambda x: model.candidate_model(x)))

query_dict = {'user_id': tf.constant([user]),
              'user_vector': np.stack([user_vector])}

Any idea why, after running _, titles = index(query_dict), titles contains indices rather than the actual movie names?

Here is the call method in my candidate tower:


  def call(self, titles):
    return tf.concat([
        self.title_embedding(titles["movie_title"]),
        titles["movie_vector"]
    ], axis=1)

abdollahpouri avatar Mar 03 '23 15:03 abdollahpouri

Hi @abdollahpouri. For index_from_dataset to return the movie names instead of indices, you need to pass a dataset of (candidate identifier, candidate embedding) tuples.

https://www.tensorflow.org/recommenders/api_docs/python/tfrs/layers/factorized_top_k/BruteForce

It should look like this:

index.index_from_dataset(
  movies.batch(100).map(lambda x: (x['movie_title'], model.candidate_model(x)))
)
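
With the dataset indexed that way, the lookup from the original post should then return the title strings directly. A minimal sketch, reusing the index and query_dict defined above:

# scores are the top-k affinity scores; titles are now tf.string movie titles
scores, titles = index(query_dict)
print(titles[0, :5])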

fuchami avatar Mar 06 '23 06:03 fuchami

Hi, @abdollahpouri ,

How can you add the movie vector to the dataset? I would like to add a vector to the user model as well. Please advise, thank you.

zhifeng-huang avatar Mar 10 '23 04:03 zhifeng-huang

@zhifeng-huang You can feed it straight into your model as a feature. If the movie vector is already a pre-calculated feature that can be fed into the model as is, then you can do the following:

class QueryModel(tf.keras.Model):

  def __init__(self, layer_sizes):
    super().__init__()
    self.movie_embedding = tf.keras.Sequential([
        tf.keras.layers.StringLookup(
            vocabulary=unique_movie_ids, mask_token=None),
        tf.keras.layers.Embedding(len(unique_movie_ids) + 1, 32),
    ])

  def call(self, inputs):
    return tf.concat([
        inputs["movie_vector"],
        self.movie_embedding(inputs['movie_id']),
    ], axis=1)
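
On the dataset side, the pre-computed vectors just become another key in the feature dictionary passed to tf.data.Dataset.from_tensor_slices. A minimal sketch, where movie_ids and movie_vectors are placeholder names (not from the code above) for a string id array and a row-aligned float array of shape (num_movies, vector_dim):

import numpy as np
import tensorflow as tf

# Placeholder inputs: ids and their pre-computed vectors, row-aligned.
movie_ids = np.array(['1', '2', '3'])
movie_vectors = np.random.rand(3, 16).astype(np.float32)

interactions_dict = {
    'movie_id': movie_ids,          # shape (num_movies,), strings
    'movie_vector': movie_vectors,  # shape (num_movies, vector_dim), float32
}
movies = tf.data.Dataset.from_tensor_slices(interactions_dict)

# Each element is now a dict with both features, ready for QueryModel.call above.
for example in movies.take(1):
    print(example['movie_id'], example['movie_vector'].shape)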

abdollahpouri avatar Mar 10 '23 16:03 abdollahpouri