Strange calculation of Recall in Factual Correctness

Open tinodj opened this issue 9 months ago • 0 comments

When factual correctness is calculated, the following can happen because of fuzzy matching:

4 claims in the response, 2 claims in the ground truth.

Let's say 2 claims in the response are good and 2 are not, so TP=2 and FP=2. Then, of the 2 claims in the ground truth, 1 is good and 1 is not. The code correctly calculates FN=1.

And recall = 2/(2+1) = 2/3

But if the total number of claims in the ground truth is 2, how can the recall fraction have a denominator of 3? With recall we ask how many claims in the ground truth were covered. In this case it is 1, so I would say recall should be 1/2. The 3 above comes from mixing in the TP that was calculated for precision.
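To make the discrepancy concrete, here is a minimal sketch of the two ways of computing recall for the example above. The function names and parameters are hypothetical and are not taken from the ragas code; they only reproduce the arithmetic described in this issue.

```python
from fractions import Fraction


def recall_mixed(tp_response: int, fn_reference: int) -> Fraction:
    """Recall as described above: TP is counted over response claims,
    FN is counted over ground-truth claims (hypothetical helper)."""
    return Fraction(tp_response, tp_response + fn_reference)


def recall_reference_only(covered_reference: int, total_reference: int) -> Fraction:
    """Recall counted only over ground-truth claims: the share of
    ground-truth claims that the response covers (hypothetical helper)."""
    return Fraction(covered_reference, total_reference)


# First example: 4 response claims (2 good, 2 not),
# 2 ground-truth claims (1 covered, 1 not).
mixed = recall_mixed(tp_response=2, fn_reference=1)
ref_only = recall_reference_only(covered_reference=1, total_reference=2)

print(mixed, ref_only)  # 2/3 1/2
```

With the mixed counts the denominator (3) is larger than the number of ground-truth claims (2), which is the inconsistency in question.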

If you take the example with the sun given in the answer correctness class: https://github.com/explodinggradients/ragas/blob/d5da272c94b2adb5ba60a8583b024027244f46b6/src/ragas/metrics/_answer_correctness.py#L59

We arrive at the same situation:

2 claims in the answer, 5 claims in the ground truth.

TP=1, FP=1, FN=5, so recall = 1/(1+5) = 1/6 ... but is it? I would argue it is 0/5 in this case, since FN=5 means none of the 5 ground-truth claims is covered.
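Written out, the two candidate readings of recall for this example are as follows. The subscripts are labels introduced here for illustration only ("resp" for counts taken over response claims, "gt" for counts taken over ground-truth claims); they are not names from the ragas code.

$$
\text{recall}_{\text{mixed}} = \frac{TP_{\text{resp}}}{TP_{\text{resp}} + FN_{\text{gt}}} = \frac{1}{1 + 5} = \frac{1}{6},
\qquad
\text{recall}_{\text{gt}} = \frac{\text{covered}_{\text{gt}}}{\text{total}_{\text{gt}}} = \frac{0}{5} = 0
$$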

I would love to hear some thoughts on this.

Mar 13 '25 13:03 tinodj