Index the retrieval model using multiple data

Open naarkhoo opened this issue 3 years ago • 3 comments

In the manual there is only:

# Create a model that takes in raw query features, and
index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
# recommends movies out of the entire movies dataset.
index.index_from_dataset(
  tf.data.Dataset.zip((movies.batch(100), movies.batch(100).map(model.movie_model)))
)

which indexes the retrieval model from a dataset. I know the index object also has an index method, so when I try

index.index(movies.batch(100).map(model.movie_model))

I get the following error

AttributeError: 'MapDataset' object has no attribute 'shape'

which mirrors what is expected in the code here

My input to index, movies.batch(100).map(model.movie_model), is a tensorflow.python.data.ops.dataset_ops.MapDataset, and I am using TF 2.8.0 in a Colab environment.

In fact, my question is how I can index my retrieval model using multiple inputs: 100 movies the user has clicked, 100 very recent movies on the market, 100 movies each user's friends have considered, and so on. It seems the input must be a list.

naarkhoo avatar May 24 '22 12:05 naarkhoo
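
One possible way to index several candidate sources at once, sketched under the assumption that each source is a `tf.data.Dataset` of movie titles with the same element type: concatenate the sources into a single dataset and drop duplicates. The three source datasets and their contents below are hypothetical, made up purely for illustration.

```python
import tensorflow as tf

# Hypothetical candidate sources; in practice each might hold ~100 titles.
clicked = tf.data.Dataset.from_tensor_slices(["Alien", "Heat", "Fargo"])
recent = tf.data.Dataset.from_tensor_slices(["Dune", "Heat"])
friends = tf.data.Dataset.from_tensor_slices(["Fargo", "Up"])

# Concatenate the sources into one dataset and drop repeated titles,
# so that each candidate appears only once.
candidates = clicked.concatenate(recent).concatenate(friends).unique()

# Materialize the titles as Python strings for inspection.
titles = [t.decode("utf-8") for t in candidates.as_numpy_iterator()]
print(titles)
```

The merged `candidates` dataset could then stand in for `movies` in the manual's `index_from_dataset` call, assuming `model.movie_model` accepts raw titles as it does there.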

Have you tried using the index_from_dataset method?

maciejkula avatar Jun 03 '22 21:06 maciejkula

Thanks, yes, and that works. But does that mean that in a production setup, where I have preselected 1000 candidates for each user, I should write them to a file, index them, and then rank?

naarkhoo avatar Jun 04 '22 06:06 naarkhoo

@naarkhoo if you have a bounded number of candidates that is different for each user, then you don't really want a retrieval index. You would just pass your candidates to your model together with the query input and do the matrix multiplication. You can essentially skip the retrieval stage and go straight to the ranking stage.

patrickorlando avatar Jun 05 '22 12:06 patrickorlando
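
The approach in the last comment can be sketched roughly as follows. This is a minimal illustration with NumPy and random embeddings, not TFRS code: `query_emb` and `cand_emb` are hypothetical stand-ins for the outputs of the user and movie towers, and the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

num_users, num_candidates, dim = 4, 1000, 32

# Hypothetical embeddings: one query vector per user and a separate,
# preselected candidate set per user.
query_emb = rng.normal(size=(num_users, dim))
cand_emb = rng.normal(size=(num_users, num_candidates, dim))

# Batched dot product: scores[u, c] = <query_emb[u], cand_emb[u, c]>.
scores = np.einsum("ud,ucd->uc", query_emb, cand_emb)

# Indices of the 10 best candidates per user, highest score first.
top_k = np.argsort(-scores, axis=1)[:, :10]
print(scores.shape, top_k.shape)
```

Because every user has their own small candidate set, no shared index is built here; the scores come directly from one batched multiplication.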