Strange calculation of Recall in Factual Correctness
When calculating factual correctness, because of fuzzy matching, the following can happen:
4 claims in the response, 2 claims in the ground truth.
Let's say 2 claims in the response are good and 2 are not, so TP=2, FP=2.
Then, of the 2 claims in the ground truth, 1 is good and 1 is not. The code correctly calculates FN=1.
And
recall = 2/(2+1) = 2/3
But if the total number of claims in the ground truth is 2, how can the recall fraction have a denominator of 3? With recall we ask how many claims in the ground truth were covered, and in this case that is 1, so I would say recall should be 1/2. The 3 above comes from mixing in the TP that was calculated for precision, which counts response claims rather than ground-truth claims.
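To make the arithmetic concrete, here is a minimal sketch of the two ways of computing recall for this example. The verdict lists and variable names are hypothetical and only mirror the numbers above; this is not the library's code.

```python
# Hypothetical verdicts for the example above (illustrative only).
# Each response claim checked against the ground truth:
response_verdicts = [True, True, False, False]
# Each ground-truth claim checked against the response:
reference_verdicts = [True, False]

tp = sum(response_verdicts)              # response claims that are supported
fp = len(response_verdicts) - tp         # response claims that are not supported
fn = reference_verdicts.count(False)     # ground-truth claims that are not covered

# Recall as described in this issue: TP from the response side, FN from the ground-truth side.
recall_mixed = tp / (tp + fn)

# Recall counted only over ground-truth claims, as proposed here.
recall_reference_only = sum(reference_verdicts) / len(reference_verdicts)

print(f"TP={tp}, FP={fp}, FN={fn}")
print(f"mixed recall = {recall_mixed:.3f}")
print(f"ground-truth-only recall = {recall_reference_only:.3f}")
```

The mixed denominator is TP + FN = 3, which is larger than the 2 claims that actually exist in the ground truth.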
If you take the example with the sun given in the answer correctness class https://github.com/explodinggradients/ragas/blob/d5da272c94b2adb5ba60a8583b024027244f46b6/src/ragas/metrics/_answer_correctness.py#L59
we would come to the same result:
2 claims in the answer, 5 claims in the ground truth.
TP=1, FP=1, FN=5, so recall = 1/(1+5) = 1/6 ... but is it? I would argue it is 0/5 in this case, since FN=5 means none of the 5 ground-truth claims are covered.
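The same comparison can be written in terms of counts only. This is a rough sketch with hypothetical helper names, assuming every ground-truth claim is either covered or counted as FN:

```python
def recall_mixed(tp: int, fn: int) -> float:
    """Recall with TP counted over response claims and FN over ground-truth claims."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def recall_reference_only(n_reference_claims: int, fn: int) -> float:
    """Recall counted only over ground-truth claims: covered / total."""
    if n_reference_claims == 0:
        return 0.0
    covered = n_reference_claims - fn  # assumes each ground-truth claim is covered or FN
    return covered / n_reference_claims


# Numbers from the sun example as counted in this issue.
mixed = recall_mixed(tp=1, fn=5)
reference_only = recall_reference_only(n_reference_claims=5, fn=5)
print(f"mixed = {mixed:.3f}, ground-truth-only = {reference_only:.3f}")
```

With these counts the mixed formula gives 1/6, while the ground-truth-only version gives 0/5 = 0.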
I would love to hear some thoughts on this.