
How do we know that better evaluation metrics for retrieval or ranking translate into better actual recommendations on the MovieLens dataset?

Open houghtonweihu opened this issue 2 years ago • 2 comments

In the TensorFlow Recommenders tutorials, top_k_categorical_accuracy is used to evaluate the retrieval model, and MSE is used for the ranking model. Are there examples showing that better values of these metrics translate into better movie recommendations on the MovieLens dataset?

houghtonweihu, Aug 22 '23 12:08

The retrieval model is trained with in-batch negative sampling, which treats other users' positive examples in the same batch as negatives for the current user, so it only approximates the true negatives. The ranking model is trained with MSE to predict ratings. Neither metric directly measures the quality of the movie recommendations, yet it is the recommendations that really matter. I am not sure whether it is possible for the training metrics to improve while the actual movie recommendations get worse.
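
To make the concern about MSE concrete, here is a toy sketch in plain Python. The ratings and both "predictors" are made-up numbers for a single hypothetical user, not MovieLens data and not TensorFlow Recommenders code, and `mean_true_rating_at_k` is just an illustrative stand-in for recommendation quality. It shows that, at least in principle, a predictor with lower MSE can still put a worse movie at the top of the list.

```python
# Toy ratings for one hypothetical user over four movies (made-up numbers).
true_ratings = [5.0, 4.0, 2.0, 1.0]

# Two hypothetical rating predictors (assumptions for illustration only).
preds_a = [3.0, 2.5, 1.0, 0.5]  # poorly calibrated, but preserves the true order
preds_b = [4.4, 4.6, 2.0, 1.0]  # close in value, but swaps the top two movies


def mse(y_true, y_pred):
    """Mean squared error between true and predicted ratings."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def top_k_items(y_pred, k):
    """Indices of the k movies with the highest predicted rating."""
    order = sorted(range(len(y_pred)), key=lambda i: y_pred[i], reverse=True)
    return order[:k]


def mean_true_rating_at_k(y_true, y_pred, k):
    """Average true rating of the k movies that would be recommended."""
    picked = top_k_items(y_pred, k)
    return sum(y_true[i] for i in picked) / k


mse_a = mse(true_ratings, preds_a)
mse_b = mse(true_ratings, preds_b)
quality_a = mean_true_rating_at_k(true_ratings, preds_a, k=1)
quality_b = mean_true_rating_at_k(true_ratings, preds_b, k=1)

print(f"A: mse={mse_a:.3f}, true rating of top-1 pick={quality_a}")
print(f"B: mse={mse_b:.3f}, true rating of top-1 pick={quality_b}")
```

Here predictor B has the lower MSE, but its top-1 pick is the movie the user rated 4 rather than the one rated 5. Whether this actually happens with the tutorial models on MovieLens is exactly what I am asking; checking it would presumably need a ranking-style metric computed on held-out ratings alongside MSE.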

houghtonweihu, Aug 22 '23 17:08