candidates argument for FactorizedTopK

Open datasciyj opened this issue 2 years ago • 3 comments

Hi,

metrics = tfrs.metrics.FactorizedTopK(candidates=movies.batch(128).map(movie_model))

I'm trying to figure out how the 'candidates' argument works for the FactorizedTopK metric in the retrieval tutorial. The tutorial uses the 'movies' dataset, and I found that this dataset includes some duplicates. I tested passing an array of unique movies for that argument and got a different accuracy than with the 'movies' dataset.

Can anyone help me understand how the candidates are used to calculate accuracy, and how I should create them from the dataset I have (order of items and batch size)?

datasciyj, Aug 16 '23 22:08

Top K categorical accuracy is the percentage of records for which the (non-zero) targets are in the top K predictions. For example, if a user clicked or rated a movie positively, and that movie has the 11th highest score in the model's predictions for that user, then it wouldn't count toward the top 10 categorical accuracy, but it would count toward the top 25 categorical accuracy.
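
As a rough illustration of the definition above, here is a minimal NumPy sketch. It is not the TFRS implementation: the function name `top_k_accuracy`, the toy scores, and the tie handling (a candidate only outranks the target if its score is strictly higher) are all assumptions made for the example.

```python
import numpy as np

def top_k_accuracy(scores, true_idx, k):
    """Fraction of rows whose true candidate is among the k highest scores.

    scores:   (num_queries, num_candidates) array of model scores (toy data here).
    true_idx: (num_queries,) index of the positive candidate for each query.
    """
    rows = np.arange(len(true_idx))
    true_scores = scores[rows, true_idx]
    # 0-based rank = number of candidates scoring strictly higher than the target.
    ranks = (scores > true_scores[:, None]).sum(axis=1)
    return float((ranks < k).mean())

# One user, 30 candidates with scores 30, 29, ..., 1.
scores = np.arange(30, 0, -1, dtype=float)[None, :]
# The positive movie sits at index 10, so exactly 10 candidates outscore it:
# it has the 11th highest score.
true_idx = np.array([10])

acc_at_10 = top_k_accuracy(scores, true_idx, k=10)  # target is outside the top 10
acc_at_25 = top_k_accuracy(scores, true_idx, k=25)  # target is inside the top 25
```

With these toy numbers the record misses the top 10 and lands in the top 25, matching the example in the comment.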

rlcauvin, Aug 21 '23 15:08

Thanks for your answer, @rlcauvin. Could you also help me understand why I can't use the unique values of movies for the 'candidates' argument? I tried using the unique movies for 'candidates', but the top K accuracy came out different. I don't understand why I can't just use unique items if 'candidates' is used as implicit negatives.

datasciyj, Aug 27 '23 21:08

I use unique candidates in my retrieval models. I suppose specifying candidates with duplicates could result in some of the duplicates appearing more than once in the top K recommendations for a user, or in implicit negatives skewing the model. I haven't examined the MovieLens dataset, but I don't see any good reason why the movies file should contain duplicates.
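
The first effect suggested above (duplicates taking up several top K slots) can be sketched with a toy example. This is only an illustration under assumed data, not a description of TFRS internals: the helper `hit_at_k`, the movie ids, and the scores are hypothetical.

```python
import numpy as np

def hit_at_k(candidate_ids, scores_by_id, true_id, k):
    """Return True if true_id is among the k highest-scoring candidates."""
    scores = np.array([scores_by_id[c] for c in candidate_ids])
    # Stable sort so equal scores keep their original order.
    top_k = np.argsort(-scores, kind="stable")[:k]
    return true_id in [candidate_ids[i] for i in top_k]

# Hypothetical scores for one user; "B" is the movie the user actually watched.
scores_by_id = {"A": 0.9, "B": 0.8, "C": 0.7}

with_dupes = ["A", "A", "A", "B", "C"]          # "A" is listed three times
unique = list(dict.fromkeys(with_dupes))        # ["A", "B", "C"], order preserved

# With duplicates, both top-2 slots are copies of "A", so "B" is pushed out.
hit_with_dupes = hit_at_k(with_dupes, scores_by_id, "B", k=2)
# With unique candidates, the top 2 are "A" and "B".
hit_unique = hit_at_k(unique, scores_by_id, "B", k=2)
```

In this toy case the same scores give a miss with the duplicated list and a hit with the unique list, which is one way the two candidate sets could produce different top K accuracy.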

rlcauvin, Aug 29 '23 00:08