recommenders How to know better evaluation metrics for Retrieval or Rank will translate into better actual recommendations in the case of MovieLens dataset?

Open houghtonweihu opened this issue 2 years ago • 2 comments

In the TensorFlow Recommenders tutorials, top_k_categorical_accuracy is used to evaluate Retrieval, and MSE to evaluate Rank. Are there examples showing that better values of these evaluation metrics translate into better movie recommendations on the MovieLens dataset?
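
To make the two metrics concrete, here is a minimal NumPy sketch of what they compute. It is not the TensorFlow Recommenders implementation; the function names `top_k_accuracy` and `mse` and the toy scores are assumptions made up for illustration.

```python
import numpy as np

def top_k_accuracy(scores, true_items, k):
    """Fraction of rows whose true item is among the k highest-scoring items.

    scores: (n_users, n_items) array of model scores (hypothetical values).
    true_items: (n_users,) array with the index of the item each user chose.
    """
    # Indices of the k largest scores per row; order inside the top k is irrelevant.
    top_k = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    hits = (top_k == true_items[:, None]).any(axis=1)
    return float(hits.mean())

def mse(predicted, actual):
    """Mean squared error between predicted and observed ratings."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean((predicted - actual) ** 2))

# Toy data: 2 users, 3 candidate movies.
scores = np.array([[0.1, 0.9, 0.3],
                   [0.8, 0.2, 0.5]])
true_items = np.array([1, 2])

top1 = top_k_accuracy(scores, true_items, k=1)
top2 = top_k_accuracy(scores, true_items, k=2)
rating_error = mse([3.0, 4.0], [4.0, 4.0])
```

Both numbers summarize how well held-out interactions are reproduced, which is a proxy for recommendation quality rather than a direct measurement of it.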

Aug 22 '23 12:08 houghtonweihu

We know that Retrieval is trained with in-batch negative sampling, which takes other users' positive samples as the current user's negative samples, so it only approximates the true negative samples. Rank is trained with MSE to predict the ratings. Neither metric directly measures the quality of the movie recommendations, yet it is the recommendations that really matter. I am not sure whether it is possible for the training metrics to improve while the actual movie recommendations get worse.
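
As a rough illustration of in-batch negative sampling, the sketch below computes a softmax loss in which each user's positive item is scored against the other users' positive items in the same batch. It is written in plain NumPy under simplifying assumptions (dot-product scores, no sampling-bias correction); the function name `in_batch_softmax_loss` is hypothetical and this is not the library's own code.

```python
import numpy as np

def in_batch_softmax_loss(user_emb, item_emb):
    """Softmax cross-entropy with in-batch negatives.

    user_emb[i] and item_emb[i] form a positive pair. Every item_emb[j]
    with j != i is treated as a negative for user i, even though user i
    might actually like that item, which is why this is an approximation.
    """
    logits = user_emb @ item_emb.T                        # (batch, batch) score matrix
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The diagonal holds the log-probability assigned to each true pair.
    return float(-np.mean(np.diag(log_probs)))

# Uninformative embeddings: every item looks equally likely, loss = log(batch size).
uniform_loss = in_batch_softmax_loss(np.zeros((3, 4)), np.zeros((3, 4)))

# Well-separated embeddings: each user scores its own item far above the rest.
separated_loss = in_batch_softmax_loss(10.0 * np.eye(3), np.eye(3))
```

The loss only compares items that happen to share a batch, so a lower value does not by itself guarantee a better ranking over the full movie catalog.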

Aug 22 '23 17:08 houghtonweihu
