recommenders
How can I know that the Ranking model improves the output of the Retrieval model in the case of the MovieLens dataset?
We all know that ranking should enhance the output of the retrieval model. But how can we see that in the case of the MovieLens dataset?
What do you mean by "enhance the output"? Do you want to compare the metrics (e.g. top K categorical accuracies) for the combined output of retrieval and ranking to that of the output of retrieval (without ranking)?
We can test the improvement from ranking for a few users. Say user 42 has 183 watched movies. We can let the Retrieval model recommend 1000 movies for this user, then let the Ranking model pick 200 movies from those 1000. Say the Retrieval model has 40 watched movies in its own top 200 of the 1000 movies. Can we expect the Ranking model to get more watched movies, say 80, in the 200 movies that it recommends? Thanks for your help!
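In code, the check I have in mind would look roughly like the sketch below. (`retrieval_index`, `ranking_model`, and `watched_titles` are placeholder names for a retrieval index built as in the TFRS tutorials, a ranking model that scores user/movie pairs, and user 42's watched titles; this is not from any existing example.)

```python
import tensorflow as tf

# Placeholders (not from any existing tutorial):
#   retrieval_index: a tfrs.layers.factorized_top_k.BruteForce index built from
#                    the retrieval model's user tower, with movie titles as identifiers
#   ranking_model:   a model mapping {"user_id", "movie_title"} to a predicted rating
#   watched_titles:  the set of movie title strings user "42" has actually watched

user_id = tf.constant(["42"])

# Step 1: retrieval proposes 1000 candidates, ordered by retrieval score.
_, retrieved_titles = retrieval_index(user_id, k=1000)
retrieved_titles = retrieved_titles[0]  # shape (1000,)

# Step 2: the ranking model re-scores those 1000 candidates; keep its top 200.
rank_scores = ranking_model({
    "user_id": tf.repeat(user_id, 1000),
    "movie_title": retrieved_titles,
})
rank_top_200 = tf.gather(
    retrieved_titles,
    tf.argsort(tf.squeeze(rank_scores), direction="DESCENDING")[:200])

# Step 3: count watched movies in retrieval's top 200 vs. ranking's top 200.
retrieval_hits = sum(
    t.numpy().decode("utf-8") in watched_titles for t in retrieved_titles[:200])
ranking_hits = sum(
    t.numpy().decode("utf-8") in watched_titles for t in rank_top_200)
print(f"Retrieval top-200 hits: {retrieval_hits}, Ranking top-200 hits: {ranking_hits}")
```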
We generally expect ranking models to be more predictive. However, it might depend on the metric. Using ROC-AUC as the evaluation metric, you will very likely see better predictions from the ranking model than from the retrieval model alone.
Thanks @rlcauvin, I agree with you. Is it possible to have a demonstration that uses the actual recommended movies to measure the improvement of the Ranking model over the output of the Retrieval model? We can consider the recommended movies as the final say on the improvement from ranking (although other metrics also matter).
The examples in tutorials such as Multi-task recommenders clearly show the comparison of metrics, but it seems to me the advantage of using Retrieval + Ranking is not clearly shown via the actual movie recommendations, which are what we care about most.
The tutorials split the input data into training and test samples. The test sample includes the "actual" or "ground truth" values.
After building the retrieval and ranking models, generate predictions on the test data as you've described by invoking the retrieval model first, then the ranking model on initial recommendations from the retrieval model. The output gives you the "predicted" values.
You may then compute the top K categorical accuracy and/or ROC-AUC on the actual and predicted values.
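As a rough sketch of that last step for a single test user (placeholder names only: `scores` is the ranking model's scores over the retrieved candidates, and `labels` is a 0/1 array marking which of those candidates appear in the user's test interactions):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def user_metrics(scores: np.ndarray, labels: np.ndarray, k: int = 10):
    """Hit@K and ROC-AUC for one user's retrieved-then-ranked candidates."""
    order = np.argsort(-scores)          # best-scored candidates first
    hit_at_k = labels[order][:k].max()   # 1 if any test movie is in the top K
    # ROC-AUC is only defined when both classes are present for this user.
    auc = roc_auc_score(labels, scores) if 0 < labels.sum() < len(labels) else np.nan
    return hit_at_k, auc
```

Averaging these over all test users gives the pipeline-level numbers; running the same function with the retrieval scores instead of the ranking scores gives you the retrieval-only baseline to compare against.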
Thanks @rlcauvin for your advice! I am curious to learn whether better evaluation metrics for Retrieval or Ranking translate into better actual movie recommendations. Do we have examples to show this?
If you do what I described above (compute evaluation metrics for combined retrieval and ranking), as well as the metrics for retrieval alone, you can compare them on the test sample. As for my own content recommendation models, I've found that ranking improves the ROC-AUC but not the top-K categorical accuracies.
Thanks @rlcauvin. For your own content recommendation models, do you see that ranking improves the output of retrieval in the actual recommendations (not just the evaluation metrics)?
I'm gathering that you want a sort of real-world before-and-after comparison: the positive engagement rate was X when using retrieval alone for recommendations, and it changed to Y after adding ranking to the recommendations? If that's what you're asking, no, I'm not aware of such an example.
@rlcauvin Yes, this is my concern. I am trying to avoid this case:
Say user 42 has 183 watched movies. We let the Retrieval model recommend 1000 movies for this user, then let the Ranking model pick 200 movies from those 1000. Say the Retrieval model has 40 watched movies in its own top 200 of the 1000 movies, but the Ranking model gets only 20 watched movies in the 200 movies that it recommends.
If you want to exclude watched movies from the recommendations, use query_with_exclusions for retrieval, then do the ranking.
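Roughly like this (a sketch only; `retrieval_index` and `watched_titles` are placeholders, and it assumes the index was built with movie titles as identifiers so the exclusions can be expressed as titles):

```python
import tensorflow as tf

# Exclusions must be a [batch_size, num_exclusions] tensor of candidate
# identifiers; here a single user and their already-watched titles.
user_id = tf.constant(["42"])
exclusions = tf.constant([sorted(watched_titles)])

# Retrieve 1000 candidates while skipping already-watched movies,
# then pass the surviving titles to the ranking model as before.
scores, titles = retrieval_index.query_with_exclusions(
    user_id, exclusions=exclusions, k=1000)
```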
Thanks @rlcauvin for this tip! Currently, my interest is in letting the recommendations include the watched movies, so that we have the "ground truth" needed to evaluate the quality of the recommendations.
Hi @houghtonweihu, @rlcauvin is correct. In the case of the MovieLens dataset, you would expect the ranking model to result in better performance on your chosen metrics, such as ROC-AUC, HitRate@K (top-K categorical accuracy), Precision@K, and Recall@K.
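For a single user, Precision@K and Recall@K can be computed from the ranked list along these lines (a minimal sketch; `ranked_titles` and `relevant_titles` are placeholder names, not from any tutorial):

```python
def precision_recall_at_k(ranked_titles, relevant_titles, k=10):
    """Precision@K and Recall@K for one user.

    ranked_titles: recommended titles, best first.
    relevant_titles: set of titles from that user's held-out test interactions.
    """
    hits = sum(title in relevant_titles for title in ranked_titles[:k])
    return hits / k, hits / max(len(relevant_titles), 1)
```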
Thank you @rlcauvin and @patrickorlando for advising me to use these metrics to evaluate the ranking model. I am curious whether the TensorFlow Recommenders tutorials could provide such examples so people can benefit from them.