
"precision" on ranking_metrics_at_k is actually "recall"

Open r-yanyo opened this issue 4 years ago • 10 comments

I used ranking_metrics_at_k to evaluate a recommender system.

I set the parameter K (number of items to test on) progressively larger and noticed that the precision also grew progressively larger. When I finally set K = 10000, precision became 1.0, which is weird.

https://github.com/benfred/implicit/blob/48457af7ae6117720ce13797bbabf3c96db96a72/implicit/evaluation.pyx#L262

I think it should be pr_div += K (but I'm not sure this is correct because I haven't tested it locally).
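As a toy sketch (the function and data below are hypothetical, not the code in evaluation.pyx), a denominator of min(K, number of test items) reproduces the saturation described above: once K is large enough that every test item can fit in the top K, the metric climbs to 1.0.

```python
# Hypothetical sketch: metric with denominator min(K, len(test_items)),
# mirroring the fmin(K, likes.size()) divisor discussed in this issue.
def metric_at_k(recommended, test_items, k):
    hits = len(set(recommended[:k]) & set(test_items))
    return hits / min(k, len(test_items))

test_items = [3, 7, 42]
# A ranking that eventually contains every test item somewhere in it:
recommended = [9, 3, 8, 7, 1, 42, 5, 6, 2, 0]

print(metric_at_k(recommended, test_items, 2))   # small K: behaves like precision -> 0.5
print(metric_at_k(recommended, test_items, 10))  # large K: all 3 test items hit -> 1.0
```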

r-yanyo avatar Oct 03 '20 09:10 r-yanyo

Yes, it is indeed recall. Recall for recsys usually uses a denominator of min(K, # of test items).

ita9naiwa avatar Oct 04 '20 10:10 ita9naiwa

You mean there is no need to change anything, right?

I think it is better to rename "precision" to "recall".

r-yanyo avatar Oct 04 '20 16:10 r-yanyo

I think it would be better to rename "precision" to "recall" (the existing metric) and add a new "recall" (the correct one I mentioned earlier), since it is quite cheap to add another ranking metric to the existing implementation.

However, backward compatibility is quite important, since there are already lots of performance benchmarks based on that implementation...

ita9naiwa avatar Oct 05 '20 07:10 ita9naiwa

The MAP (Mean Average Precision) @K here is also MAR (Mean Average Recall) @K, which is also quite confusing. I hope this will be resolved in the future.

wwwbbb8510 avatar Oct 28 '20 01:10 wwwbbb8510

If backward compatibility is a concern, it might be good to add another function called something like corrected_ranking_metrics_at_k and suggest that new users use it.

wwwbbb8510 avatar Oct 28 '20 01:10 wwwbbb8510

related #426

r-yanyo avatar Nov 09 '20 05:11 r-yanyo

Now I frequently use ranking_metrics_at_k, and I clearly understand fmin(K, likes.size()).

If K < likes.size(), it behaves like precision. If K > likes.size(), it behaves like recall.
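That point can be illustrated with a small sketch (the helper names and data here are made up, not the library's API; only the min(K, |likes|) denominator mirrors evaluation.pyx):

```python
def hits_at_k(recommended, likes, k):
    return len(set(recommended[:k]) & set(likes))

def precision_at_k(recommended, likes, k):
    # textbook precision@K: denominator is always K
    return hits_at_k(recommended, likes, k) / k

def recall_at_k(recommended, likes, k):
    # textbook recall@K: denominator is the number of relevant test items
    return hits_at_k(recommended, likes, k) / len(likes)

def implicit_style_metric(recommended, likes, k):
    # denominator mirrors fmin(K, likes.size()) discussed in this issue
    return hits_at_k(recommended, likes, k) / min(k, len(likes))

likes = [1, 4, 6, 8]               # four relevant test items
recommended = [1, 0, 4, 2, 6, 3]   # ranked recommendations

# K < len(likes): min(K, |likes|) == K, so the value equals precision@K
assert implicit_style_metric(recommended, likes, 2) == precision_at_k(recommended, likes, 2) == 0.5
# K > len(likes): min(K, |likes|) == |likes|, so the value equals recall@K
assert implicit_style_metric(recommended, likes, 6) == recall_at_k(recommended, likes, 6) == 0.75
```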

I said before:

I think it is better to rename "precision" to "recall".

That was wrong, sorry. However, this is still a debatable point.

r-yanyo avatar Dec 02 '20 12:12 r-yanyo

By the way,

I think MAP is wrong.

https://github.com/benfred/implicit/blob/80f0e4d372c9808f39e8d72af4aa8403a119e2a3/implicit/evaluation.pyx#L286

I think it should be mean_ap += ap / hit (with a guard against division by zero).
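A hedged sketch of the two normalizations being compared (the function and data are illustrative; the "current behavior" branch is my reading of the thread's description of evaluation.pyx, not a copy of it):

```python
def average_precision_at_k(recommended, likes, k, normalize_by_hits=False):
    likes = set(likes)
    hits = 0
    ap_sum = 0.0
    # at every rank where a relevant item appears, accumulate
    # precision-at-that-rank
    for rank, item in enumerate(recommended[:k], start=1):
        if item in likes:
            hits += 1
            ap_sum += hits / rank
    if normalize_by_hits:
        # the proposal above: divide by the hit count, guarding zero division
        return ap_sum / hits if hits else 0.0
    # behavior as described in this thread: divide by min(K, |likes|)
    return ap_sum / min(k, len(likes))

likes = [2, 5, 7]
recommended = [2, 9, 5, 1]

print(average_precision_at_k(recommended, likes, 4))                          # (1/1 + 2/3) / 3 ~ 0.556
print(average_precision_at_k(recommended, likes, 4, normalize_by_hits=True))  # (1/1 + 2/3) / 2 ~ 0.833
```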

r-yanyo avatar Dec 02 '20 12:12 r-yanyo

Now I frequently use ranking_metrics_at_k, and I clearly understand fmin(K, likes.size()).

If K < likes.size(), it behaves like precision. If K > likes.size(), it behaves like recall.

I said before:

I think it is better to rename "precision" to "recall".

That was wrong, sorry. However, this is still a debatable point.

Thanks for the explanation! But this is still a bit difficult to understand.

What does likes.size() represent here? Is it the number of items a user has liked in the test set (i.e. all non-zero indices in the matrix for one user)?

The reason I'm asking is that I want to calculate the average precision at K across all users, and I don't think the implemented precision-at-K metric returns that now.
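For reference, a sketch of what that would look like: plain precision@K (denominator always K) macro-averaged over users. All names and data here are made up for illustration; this is not the library's implementation.

```python
def precision_at_k(recommended, likes, k):
    # textbook precision@K: hits in the top K, divided by K
    return len(set(recommended[:k]) & set(likes)) / k

# per-user (top-K recommendations, test-set likes), hypothetical data
users = {
    "u1": ([1, 3, 5], [1, 5]),
    "u2": ([2, 4, 6], [4]),
}
k = 3
mean_p = sum(precision_at_k(rec, likes, k)
             for rec, likes in users.values()) / len(users)
print(mean_p)  # (2/3 + 1/3) / 2 = 0.5
```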

ltsaprounis avatar Jan 26 '21 22:01 ltsaprounis

Is it the number of items a user has liked in the test set (i.e. all non-zero indices in the matrix for one user)?

Yes.

This is the repository where I implement my custom metrics: https://github.com/r-yanyo/implicit/commits/master. I'm actually using these custom metrics for my research on recommender systems. In my opinion, precision and recall are ambiguous in recommender systems, so I use ranking metrics such as MAP and nDCG.
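For readers unfamiliar with the second metric mentioned here, a minimal sketch of binary-relevance nDCG@K (illustrative only; not the linked repository's code):

```python
import math

def ndcg_at_k(recommended, likes, k):
    likes = set(likes)
    # DCG: each hit at rank r contributes 1 / log2(r + 1)
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, item in enumerate(recommended[:k], start=1)
              if item in likes)
    # ideal DCG: all relevant items ranked first
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(k, len(likes)) + 1))
    return dcg / ideal if ideal else 0.0

print(ndcg_at_k([7, 3, 9], [7, 9], 3))  # hits at ranks 1 and 3, slightly below 1.0
print(ndcg_at_k([7, 9, 3], [7, 9], 3))  # perfect ordering -> 1.0
```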

r-yanyo avatar Jan 27 '21 05:01 r-yanyo