PolyFuzz Analyse precision recall curve

Analyse precision recall curve

Open KoenLoeffen opened this issue 1 year ago • 1 comments

I have two questions:

The precision-recall curve is a trade-off between the minimum similarity and the percentage matched, so ideally you want both precision and recall to be as high as possible. However, I found in my results that the model with the highest precision and recall isn't always the best. Am I missing something?
How would I set the optimal threshold for the similarity? Is this also based on the precision recall curve?

May 26 '23 04:05 KoenLoeffen

The precision-recall curve is a trade-off between the minimum similarity and the percentage matched, so ideally you want both precision and recall to be as high as possible. However, I found in my results that the model with the highest precision and recall isn't always the best. Am I missing something?

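The precision-recall curve is an approximation because no ground truth is available: the minimum similarity score stands in for precision, and the percentage of matches found at that score stands in for recall. Ideally we still want the curve to be as high as possible, but it remains an approximation, so a higher curve does not guarantee better matches.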
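To make the approximation concrete, here is a minimal sketch of how such a curve can be computed from similarity scores alone. It is not PolyFuzz's internal code: the function name `approximate_pr_curve`, the sample scores, and the choice of `>=` at the threshold are all assumptions for illustration.

```python
def approximate_pr_curve(similarities, thresholds):
    """Return (threshold, fraction_matched) pairs.

    The threshold plays the role of "precision" and the fraction of
    pairs scoring at or above it plays the role of "recall". No
    ground-truth labels are involved, hence "approximate".
    """
    n = len(similarities)
    curve = []
    for t in thresholds:
        # Count the pairs that would survive this minimum similarity.
        kept = sum(1 for s in similarities if s >= t)
        curve.append((t, kept / n))
    return curve


# Hypothetical similarity scores for five matched string pairs.
scores = [0.95, 0.90, 0.72, 0.55, 0.30]
curve = approximate_pr_curve(scores, [0.5, 0.7, 0.9])
print(curve)
```

Because the curve only counts how many pairs clear each threshold, a model that assigns uniformly high scores to wrong matches would still look good here, which is one way the "best" curve can disagree with the best model.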

How would I set the optimal threshold for the similarity? Is this also based on the precision recall curve?

Yes, that is the main purpose of the precision-recall curve as defined in PolyFuzz. It helps you understand what the threshold would be to get a certain amount of matches and the relative accuracy of the results.
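As a rough illustration of reading a threshold off the curve, the sketch below picks the highest minimum similarity that still keeps a target share of the pairs. The helper `threshold_for_coverage` and the sample scores are hypothetical; in practice the scores would presumably come from the `Similarity` column of the matches returned by PolyFuzz.

```python
import math


def threshold_for_coverage(similarities, target_fraction):
    """Highest threshold that still keeps at least target_fraction of pairs."""
    if not similarities:
        raise ValueError("similarities must not be empty")
    if not 0 < target_fraction <= 1:
        raise ValueError("target_fraction must be in (0, 1]")
    ranked = sorted(similarities, reverse=True)
    # Number of pairs that must survive the threshold.
    k = math.ceil(target_fraction * len(ranked))
    # The k-th best score is the highest cut-off that keeps k pairs.
    return ranked[k - 1]


# Hypothetical similarity scores for five matched string pairs.
scores = [0.95, 0.90, 0.72, 0.55, 0.30]
threshold = threshold_for_coverage(scores, 0.5)
kept = [s for s in scores if s >= threshold]
print(threshold, kept)
```

Since the curve is not based on ground truth, it seems sensible to manually inspect a sample of pairs just above and below the chosen threshold before settling on it.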

May 28 '23 04:05 MaartenGr
