
Fix `ranking_metrics_at_k()`

Open · ita9naiwa opened this pull request 3 years ago · 4 comments

This PR resolves a few issues:

  • #412: "precision" in ranking_metrics_at_k is actually "recall".
    I think it's fine to change precision and recall now, since this library just took a major breaking update (0.5.0).

  • #545: ranking_metrics_at_k raises a ValueError if K > num_items.

This PR also adds MRR and precision as new metrics.
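
For reference, here is a minimal sketch of the per-user definitions the fix aims at (illustrative only, not the library's implementation; the helper name and arguments are hypothetical):

# Illustrative sketch, not implicit's actual code: per-user precision@K and
# recall@K with the denominators argued for in #412, plus a guard for the
# K > num_items case from #545.
def precision_recall_at_k(recommended, liked, K, num_items):
    k = min(K, num_items)        # guard: never request more items than exist
    hits = len(set(recommended[:k]) & set(liked))
    precision = hits / k         # denominator: number of recommended items
    recall = hits / len(liked)   # denominator: the user's liked items
    return precision, recall

With K=100 and usually far fewer liked items per user, hits / len(liked) is much larger than hits / k, which is why the value previously reported as "precision" reappears as "recall" in the numbers below.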

ita9naiwa · Jul 11 '22

Hi @benfred, can you check and review this PR? It fixes the inaccurate NDCG and MRR values from the ranking_metrics_at_k function.

ita9naiwa · Aug 10 '22

Changes:

  • The precision and recall metrics have been swapped.
  • The MAP metric has been fixed accordingly (its definition now follows precision).
  • Added MRR (sketched below this comment), since it is one of the most widely used metrics in the RS community, e.g. in the RecSys Challenge 2022.

# `ratings` is a user-item sparse matrix from the surrounding context.
from implicit.als import AlternatingLeastSquares
from implicit.evaluation import ranking_metrics_at_k, train_test_split

tr, te = train_test_split(ratings, random_state=1541)
model = AlternatingLeastSquares(random_state=1541, factors=30, iterations=10)
model.fit(tr)
ranking_metrics_at_k(model, tr, te, K=100)

as-is:

{'precision': 0.3349958296821056,
 'map': 0.12534890653797998,
 'ndcg': 0.2686550155007732,
 'auc': 0.6093577862786992}

to be:

{'precision': 0.07930221607727832,
 'recall': 0.3349958296820349,
 'map': 0.06293165699220135,
 'ndcg': 0.2686550155007732,
 'auc': 0.6093577862785867,
 'mrr': 0.5348017396151994}

I guess the definition of MAP should follow precision.

ita9naiwa · Aug 11 '22
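
For context, the MRR@K being added typically follows the standard definition: the reciprocal rank of the first relevant item in the top-K list, averaged over users. A minimal sketch (hypothetical names, not the PR's actual code):

# Sketch of MRR@K: reciprocal rank of the first relevant item in the
# top-K recommendations, averaged over all users.
def mrr_at_k(recs_per_user, liked_per_user, K):
    total = 0.0
    for recs, liked in zip(recs_per_user, liked_per_user):
        relevant = set(liked)
        total += next((1.0 / rank for rank, item in enumerate(recs[:K], 1)
                       if item in relevant), 0.0)
    return total / len(recs_per_user)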

@benfred any ETA on getting this in and released? I was debugging a model yesterday that had weird evaluation results and came to the same conclusion as @ita9naiwa.

thomasjungblut · Aug 23 '22

Hi @ita9naiwa. I was checking the code of your fix to ranking_metrics_at_k, and I'm not sure about the way you define the denominator of precision. You're using the size of the user's liked items in the test set, but shouldn't it be K, the number of recommended items? K would include true positives + false positives, which is what I have normally seen in the definitions of precision I have read. Correct me if I'm wrong; I'd appreciate your opinion on the issue. Thanks!

malonsocortes · Aug 26 '22

And the divisor for recall is also wrong. It should always be likes.size(), not k when k is smaller; dividing by k would only inflate the score rather than return the true recall value. Or am I wrong?

Blo0dR0gue · Jul 06 '23
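
A toy example of the divisor question raised here, with hypothetical numbers:

# A user liked 10 items; K = 5; 3 of the top-5 recommendations are hits.
hits, k, num_liked = 3, 5, 10
true_recall = hits / num_liked       # 0.3: divide by all of the user's liked items
inflated = hits / min(k, num_liked)  # 0.6: dividing by k inflates the score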