
How to tell whether the Rank model improves the output of the Retrieval model in the case of the MovieLens dataset?

Open houghtonweihu opened this issue 2 years ago • 15 comments

We all know that ranking should enhance the output of the retrieval model. But how can we see that in the case of the MovieLens dataset?

houghtonweihu avatar Aug 18 '23 18:08 houghtonweihu

What do you mean by "enhance the output"? Do you want to compare the metrics (e.g. top K categorical accuracies) for the combined output of retrieval and ranking with those for the output of retrieval alone (without ranking)?

rlcauvin avatar Aug 21 '23 15:08 rlcauvin

We can test the improvement from the Rank for a few users. Say user 42 has 183 watched movies. We let the Retrieval recommend 1000 movies for this user, then let the Rank pick 200 movies from those 1000. Suppose the Retrieval's top 200 (out of the 1000) contain 40 watched movies. Can we expect the Rank to get more watched movies, say 80, among the 200 movies it recommends? Thanks for your help!
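
For what it's worth, here is a rough sketch of that comparison. All the names are hypothetical placeholders for your own objects: `retrieval_model` is a trained two-tower TFRS retrieval model with `user_model` and `movie_model` towers, `ranking_model` scores (user_id, movie_title) pairs, `movies` is a tf.data.Dataset of movie titles, and `watched` is the set of titles user 42 actually watched.

```python
import numpy as np
import tensorflow as tf
import tensorflow_recommenders as tfrs

# Build a brute-force index over all candidate movies.
index = tfrs.layers.factorized_top_k.BruteForce(retrieval_model.user_model)
index.index_from_dataset(
    movies.batch(128).map(lambda title: (title, retrieval_model.movie_model(title))))

# Step 1: the Retrieval proposes 1000 candidates for user 42.
_, titles = index(tf.constant(["42"]), k=1000)
candidates = np.array([t.decode("utf-8") for t in titles[0].numpy()])

# Step 2: the Rank re-scores the 1000 candidates and keeps the top 200.
scores = ranking_model({
    "user_id": tf.constant(["42"] * len(candidates)),
    "movie_title": tf.constant(candidates),
})
ranked_200 = candidates[np.argsort(-scores.numpy().flatten())][:200]

# Compare hit counts: watched movies in the Retrieval's top 200
# versus watched movies in the Rank's top 200.
retrieval_hits = len(set(candidates[:200]) & watched)
ranking_hits = len(set(ranked_200) & watched)
print(f"Retrieval top-200 hits: {retrieval_hits}, Rank top-200 hits: {ranking_hits}")
```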

houghtonweihu avatar Aug 21 '23 20:08 houghtonweihu

We generally expect ranking models to be more predictive. However, it might depend on the metric. Using ROC-AUC as the evaluation metric, you will very likely see better predictions from the ranking model than from the retrieval model alone.

rlcauvin avatar Aug 21 '23 22:08 rlcauvin

Thanks @rlcauvin, I agree with you. Is it possible to have a demonstration that uses the actual recommended movies to measure the improvement of the Rank over the output of the Retrieval? We can consider the recommended movies as the final say on the improvement of the Rank (although other metrics also matter).

houghtonweihu avatar Aug 21 '23 23:08 houghtonweihu

The examples in the tutorials, such as Multi-task recommenders, clearly show the metric comparison. But it seems to me the advantage of using Retrieval + Rank is not clearly shown via the actual movie recommendations, which are what we care about most.

houghtonweihu avatar Aug 22 '23 00:08 houghtonweihu

The tutorials split the input data into training and test samples. The test sample includes the "actual" or "ground truth" values.

After building the retrieval and ranking models, generate predictions on the test data as you've described by invoking the retrieval model first, then the ranking model on the initial recommendations from the retrieval model. The output gives you the "predicted" values.

You may then compute the top K categorical accuracy and/or ROC-AUC on the actual and predicted values.
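
As a rough sketch of that last step (here `y_true`, `y_score`, `test_pairs`, and `top_k_recs` are hypothetical placeholders built from your test split and the two-stage predictions):

```python
from sklearn.metrics import roc_auc_score

# y_true:  1 if the (user, movie) pair appears in the test data, else 0
# y_score: the ranking model's predicted score for that pair
auc = roc_auc_score(y_true, y_score)

# Top K categorical accuracy (hit rate): fraction of test interactions whose
# movie appears in that user's top K recommendations.
def hit_rate_at_k(test_pairs, top_k_recs, k=100):
    hits = sum(1 for user, movie in test_pairs if movie in top_k_recs[user][:k])
    return hits / len(test_pairs)
```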

rlcauvin avatar Aug 22 '23 13:08 rlcauvin

Thanks @rlcauvin for your advice! I am curious to learn whether better evaluation metrics for the Retrieval or Rank translate into better actual movie recommendations. Do we have examples that show this?

houghtonweihu avatar Aug 22 '23 13:08 houghtonweihu

If you do what I described above (compute evaluation metrics for combined retrieval and ranking), as well as the metrics for retrieval alone, you can compare them on the test sample. For my own content recommendation models, I've found that ranking improves the ROC-AUC but not the top K categorical accuracies.

rlcauvin avatar Aug 22 '23 16:08 rlcauvin

Thanks @rlcauvin. For your own content recommendation models, do you see that the Rank improves the output of the Retrieval in the actual recommendations (not the evaluation metrics)?

houghtonweihu avatar Aug 22 '23 16:08 houghtonweihu

I'm gathering that you want sort of a real-world before-and-after comparison? The positive engagement rate was X when using retrieval alone for recommendations, and it changed to Y after adding ranking to the recommendations? If that's what you're asking, no, I'm not aware of such an example.

rlcauvin avatar Aug 22 '23 16:08 rlcauvin

@rlcauvin Yes, this is my concern. I am trying to avoid this case:

Say user 42 has 183 watched movies. We let the Retrieval recommend 1000 movies for this user, then let the Rank pick 200 movies from those 1000. Say the Retrieval's top 200 (out of the 1000) contain 40 watched movies, but the Rank only gets 20 watched movies among the 200 movies it recommends.

houghtonweihu avatar Aug 22 '23 17:08 houghtonweihu

If you want to exclude watched movies from the recommendations, use query_with_exclusions for retrieval, then do the ranking.
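
A minimal sketch of how that might look, assuming `index` is a factorized top-K layer (e.g. BruteForce) built as in the earlier sketch and `watched_titles` is a hypothetical list of titles user 42 has already seen:

```python
import tensorflow as tf

# Exclusions are candidate identifiers, shaped [query_batch_size, num_to_exclude].
exclusions = tf.constant([watched_titles])
_, unseen = index.query_with_exclusions(
    tf.constant(["42"]), exclusions=exclusions, k=1000)
# `unseen` now holds candidates with the watched titles filtered out;
# pass these to the ranking model as before.
```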

rlcauvin avatar Aug 22 '23 17:08 rlcauvin

Thanks @rlcauvin for this tip! For now, my interest is in seeing whether the recommendations can include the watched movies, so we have the "ground truth" needed to evaluate the quality of the recommendations.

houghtonweihu avatar Aug 22 '23 17:08 houghtonweihu

Hi @houghtonweihu, @rlcauvin is correct. In the case of the MovieLens dataset, you would expect the ranking model to result in better performance on your chosen metrics, such as ROC-AUC, HitRate@K (top K categorical accuracy), Precision@K, and Recall@K.
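
For reference, Precision@K and Recall@K reduce to simple set operations (here `recommended_k` and `relevant` are hypothetical placeholders for a user's top K recommendation list and the set of movies that user interacted with in the test split):

```python
def precision_at_k(recommended_k, relevant):
    # Fraction of the K recommendations that were actually relevant.
    return len(set(recommended_k) & relevant) / len(recommended_k)

def recall_at_k(recommended_k, relevant):
    # Fraction of the relevant items that made it into the K recommendations.
    return len(set(recommended_k) & relevant) / len(relevant)
```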

patrickorlando avatar Sep 04 '23 00:09 patrickorlando

Thank you @rlcauvin and @patrickorlando for advising me to use these metrics to evaluate the Rank. I am curious whether the TensorFlow Recommenders tutorials could provide such examples so people can benefit from them.

houghtonweihu avatar Sep 05 '23 20:09 houghtonweihu