"precision" on ranking_metrics_at_k is actually "recall"
I used ranking_metrics_at_k to evaluate a recommender system.
As I set the parameter K (the number of items to test on) progressively larger, the reported precision also grew. When I finally set K=10000, precision became 1.0, which is weird.
https://github.com/benfred/implicit/blob/48457af7ae6117720ce13797bbabf3c96db96a72/implicit/evaluation.pyx#L262
I think it should be pr_div += K
(but I'm not sure this is correct, because I haven't tested it in my local environment).
Yes, it is indeed recall. Recall for recommender systems usually uses min(k, # test items) as the denominator.
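To make the difference concrete, here is a minimal Python sketch (my own toy example, not the library's Cython code; the item ids and the binary-relevance setup are made up):

```python
def hits_at_k(recommended, test_items, K):
    """Number of test items that appear in the top-K recommendations."""
    return len(set(recommended[:K]) & set(test_items))

def metrics_at_k(recommended, test_items, K):
    hits = hits_at_k(recommended, test_items, K)
    precision = hits / K                        # proposed denominator: K
    recall = hits / len(test_items)             # standard recall denominator
    reported = hits / min(K, len(test_items))   # current fmin(K, likes.size())
    return precision, recall, reported

# Toy user with 5 relevant test items; the recommender ranks 10000 candidates.
test_items = [3, 17, 42, 99, 256]
recommended = list(range(10000))

for K in (5, 100, 10000):
    print(K, metrics_at_k(recommended, test_items, K))
# Once K exceeds the number of test items, min(K, #test) stops growing,
# so the "precision" reported with that denominator tends toward 1.0.
```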
You mean there is no need to change anything, right?
I think it is better to return "recall" instead of "precision".
(edit) I meant "rename" rather than "return".
I think it would be better to rename "precision" to "recall" (the existing one) and add a new "recall" (the correct one I mentioned earlier), because it is quite cheap to add another ranking metric to the existing implementation.
However, backward compatibility is quite important, since there are already lots of performance benchmarks based on that implementation...
For the same reason, the reported MAP (Mean Average Precision) @K is effectively MAR (Mean Average Recall) @K, which is also quite confusing. I hope this will be resolved in the future.
If backward compatibility is a concern, it might be good to add another function called something like corrected_ranking_metrics_at_k
and suggest that new users use it instead.
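To sketch what that could look like (purely hypothetical; it assumes ranking_metrics_at_k returns a dict containing a "precision" key, and it only renames that key rather than fixing the denominator, which would still have to happen in evaluation.pyx):

```python
from implicit.evaluation import ranking_metrics_at_k

def corrected_ranking_metrics_at_k(model, train_user_items, test_user_items, K=10, **kwargs):
    """Hypothetical wrapper: report the existing metric under the name it
    actually computes ("recall") while leaving the original function untouched."""
    metrics = dict(ranking_metrics_at_k(model, train_user_items, test_user_items, K=K, **kwargs))
    metrics["recall"] = metrics.pop("precision")
    return metrics
```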
related #426
Now I use ranking_metrics_at_k frequently, and I clearly understand fmin(K, likes.size()).
If K < likes.size(), the metric behaves like precision.
If K > likes.size(), it behaves like recall.
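For concreteness, a tiny sketch of how that denominator switches (the numbers are made up):

```python
def denominator(K, n_test_items):
    # mirrors fmin(K, likes.size())
    return min(K, n_test_items)

n_test_items = 3                      # the user has 3 liked items in the test set
print(denominator(2, n_test_items))   # 2  -> hits / 2, i.e. precision@2
print(denominator(10, n_test_items))  # 3  -> hits / 3, i.e. recall
```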
I said before that I thought it would be better to rename "precision" to "recall".
That was wrong, sorry. However, this is still a problem worth discussing.
By the way, I think the MAP calculation is also wrong.
https://github.com/benfred/implicit/blob/80f0e4d372c9808f39e8d72af4aa8403a119e2a3/implicit/evaluation.pyx#L286
I think it should be mean_ap += ap / hit
(and we would need to guard against division by zero when there are no hits).
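To make the two normalizations concrete, here is a minimal sketch of average precision at K (my own simplification, not the code in evaluation.pyx; the item ids are made up):

```python
def average_precision_at_k(recommended, test_items, K, normalize_by_hits=False):
    """Sum precision@i over the ranks i where a relevant item appears, then
    normalize either by min(K, #relevant) or by the number of hits."""
    relevant = set(test_items)
    hits, ap = 0, 0.0
    for i, item in enumerate(recommended[:K], start=1):
        if item in relevant:
            hits += 1
            ap += hits / i  # precision at this rank
    if normalize_by_hits:
        return ap / hits if hits > 0 else 0.0  # guard against zero division
    return ap / min(K, len(test_items))

test_items = [3, 17, 42]
recommended = [3, 7, 17, 8, 9]
print(average_precision_at_k(recommended, test_items, K=5))                          # ≈ 0.556
print(average_precision_at_k(recommended, test_items, K=5, normalize_by_hits=True))  # ≈ 0.833
```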
Thanks for the explanation! But this is still a bit difficult to understand.
What does likes.size() represent here? Is it the number of items a user has liked in the test set (i.e. all non-zero indices in the matrix for one user)?
I'm asking because I want to calculate the average precision at K for all users, and I don't think the implemented precision-at-K metric returns that now.
Is it the number of items a user has liked in the test set (i.e. all non-zero indices in the matrix for one user)?
Yes.
This is the repository where I implement my custom metrics: https://github.com/r-yanyo/implicit/commits/master. I'm actually using these custom metrics for my research on recommender systems. In my opinion, precision and recall are tricky to interpret in recommender systems, so I rely on ranking metrics such as MAP and nDCG.
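For reference, here is a minimal nDCG@K sketch with binary relevance (my own illustration, not the library's or my repository's implementation):

```python
import math

def ndcg_at_k(recommended, test_items, K):
    """Binary-relevance nDCG@K: DCG over the top-K list divided by the
    ideal DCG (all relevant items ranked first)."""
    relevant = set(test_items)
    dcg = sum(1.0 / math.log2(i + 1)
              for i, item in enumerate(recommended[:K], start=1)
              if item in relevant)
    ideal_hits = min(K, len(test_items))
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg_at_k([3, 7, 17, 8, 9], [3, 17, 42], K=5))  # ≈ 0.70
```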