recommenders
candidates argument for FactorizedTopK
Hi,
metrics = tfrs.metrics.FactorizedTopK(candidates=movies.batch(128).map(movie_model))
I'm trying to figure out how the 'candidates' argument works for the FactorizedTopK metric in the retrieval tutorial.
The tutorial uses the 'movies' dataset, and I found that this dataset includes some duplicates.
I tested passing an array of unique movies for that argument instead, and I got a different accuracy than with the 'movies' dataset.
Can anyone help me understand how the candidates are used to calculate accuracy, and how I should build them from my own dataset (order of items and batch size)?
Top K categorical accuracy is the percentage of records for which the (non-zero) targets are in the top K predictions. So, if a user clicked or rated a movie positively, and that movie has the 11th highest score in the model's predictions for that user, then it wouldn't qualify for the top 10 categorical accuracy, but it would qualify for the top 25 categorical accuracy, for example.
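To make that definition concrete, here is a minimal pure-Python sketch of top K accuracy. It is an illustration only, not the TFRS implementation; the function name `top_k_accuracy` and all scores below are made up for the example.

```python
def top_k_accuracy(scores, targets, k):
    """Fraction of rows whose target index is among the k highest-scoring indices.

    scores:  list of per-user score lists, one score per candidate
    targets: the true candidate index for each user
    """
    hits = 0
    for row, target in zip(scores, targets):
        # Rank candidate indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(targets)


# Hypothetical scores for 2 users over 4 candidates.
scores = [
    [0.9, 0.1, 0.5, 0.3],  # user 0: ranking is 0, 2, 3, 1
    [0.2, 0.8, 0.6, 0.4],  # user 1: ranking is 1, 2, 3, 0
]
targets = [2, 0]  # user 0 interacted with candidate 2, user 1 with candidate 0

top1 = top_k_accuracy(scores, targets, k=1)  # neither target is ranked first
top2 = top_k_accuracy(scores, targets, k=2)  # only user 0's target is in the top 2
top4 = top_k_accuracy(scores, targets, k=4)  # every target is in the top 4
```

Raising K can only keep the accuracy the same or increase it, which is why the top 25 figure is never lower than the top 10 figure.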
Thanks for your answer, @rlcauvin. Could you also help me understand why I can't use unique movies for the 'candidates' argument? When I tried that, the top K accuracy came out different. If the candidates are used as implicit negatives, I don't see why I can't just use the unique items.
I use unique candidates in my retrieval models. I suppose specifying candidates with duplicates could result in some of the duplicates appearing more than once in the top K recommendations for a user, or in implicit negatives skewing the model. I haven't examined the MovieLens dataset, but I don't see any good reason that it should contain duplicates in the movies file.
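One plausible reason the accuracy changes: if every candidate row is scored and ranked, a duplicated item takes more than one of the K slots and can push the true item out. The sketch below illustrates that effect in pure Python. It is a simplified model, not the actual TFRS code, and the helper `hit_in_top_k`, the item ids, and the scores are all hypothetical.

```python
def hit_in_top_k(candidates, scores, positive, k):
    """True if `positive` is among the k highest-scoring candidate rows.

    candidates: list of item ids, possibly containing duplicates
    scores:     dict mapping item id -> score for one user (made-up values)
    """
    # Every row is scored, so a duplicated id yields several identical scores.
    ranked = sorted(candidates, key=lambda item: scores[item], reverse=True)
    return positive in ranked[:k]


scores = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.1}
positive = "C"  # the movie the user actually watched

with_duplicates = ["A", "A", "B", "B", "C", "D"]
# dict.fromkeys keeps the first occurrence of each id and preserves order.
unique = list(dict.fromkeys(with_duplicates))

hit_unique = hit_in_top_k(unique, scores, positive, k=3)               # top 3: A, B, C
hit_duplicates = hit_in_top_k(with_duplicates, scores, positive, k=3)  # top 3: A, A, B
```

Under this model the same user counts as a hit with unique candidates and as a miss with duplicated ones, so the two setups need not report the same top K accuracy.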