
FactualCorrectness false negative calculation

Open alvaroalfaro612 opened this issue 9 months ago • 3 comments

Describe the bug

The calculation of the answer_correctness metric (0.75 * factual_correctness + 0.25 * semantic_similarity) does not use the same implementation of factual correctness as when FactualCorrectness is computed directly. This discrepancy leads to inconsistent results in which answer_correctness can be lower than both factual_correctness and semantic_similarity.

Code to Reproduce

score = evaluate(
    dataset,
    metrics=[answer_correctness, answer_similarity, FactualCorrectness()],
    llm=self.azure_model,
    embeddings=self.azure_embeddings,
    run_config=self.run_config,
)

Error trace

No error given.

Expected behavior

I would expect factual correctness to be the same inside answer_correctness and when it is calculated with FactualCorrectness directly.
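For what it's worth, a minimal sanity check (plain Python, not ragas internals) of why the reported scores look inconsistent: a weighted mean with the default 0.75/0.25 weights is always bounded by its two components, so an answer_correctness value below both factual_correctness and semantic_similarity can only happen if the factuality score used internally differs from the standalone FactualCorrectness result.

```python
# Minimal sketch with hypothetical component scores (not values produced by ragas).
# A weighted mean is bounded by min(f, s) and max(f, s), so a combined score
# below both components implies a different factuality value was used internally.

def answer_correctness_from_components(factuality: float, similarity: float) -> float:
    """Recombine the two components with the documented default weights."""
    return 0.75 * factuality + 0.25 * similarity


factual_correctness = 0.80   # hypothetical standalone FactualCorrectness score
semantic_similarity = 0.90   # hypothetical answer_similarity score

combined = answer_correctness_from_components(factual_correctness, semantic_similarity)
assert min(factual_correctness, semantic_similarity) <= combined <= max(
    factual_correctness, semantic_similarity
)
print(combined)  # 0.825 -- never lower than both inputs
```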

alvaroalfaro612 avatar Mar 10 '25 15:03 alvaroalfaro612

How is fp = fn? fp is the sum of ~reference_response and fn is the sum of ~response_reference. They are sums of the negation of different arrays.
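A small illustration (plain NumPy with made-up verdicts, not the actual ragas arrays) of that distinction: fp is derived from the verdicts of response claims checked against the reference, while fn comes from the verdicts of reference claims checked against the response, so the two counts generally differ.

```python
import numpy as np

# Hypothetical verdict arrays (1 = claim supported, 0 = not supported).
# reference_response: response claims checked against the reference.
# response_reference: reference claims checked against the response.
reference_response = np.array([1, 1, 0, 1])
response_reference = np.array([1, 0, 0])

tp = reference_response.sum()          # response claims supported by the reference
fp = (1 - reference_response).sum()    # response claims not supported -> false positives
fn = (1 - response_reference).sum()    # reference claims missing from the response -> false negatives

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(fp, fn, round(f1, 3))  # fp=1, fn=2 -- different arrays, so generally fp != fn
```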

tinodj avatar Mar 13 '25 12:03 tinodj

> How is fp = fn? fp is the sum of ~reference_response and fn is the sum of ~response_reference. They are sums of the negation of different arrays.

Yes, that is true. I have edited the issue. Anyway, in some cases I get a different value from FactualCorrectness and from the factual correctness calculated inside answer_correctness.

alvaroalfaro612 avatar Mar 13 '25 14:03 alvaroalfaro612

I confirm the same situation. The reason seems to be that the factual_correctness used inside answer_correctness has a different prompt from the one used by factual_correctness on its own. It would be really nice to have this version of factual correctness available as well.
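A rough way to check this (a sketch assuming a recent ragas version where metrics expose their prompts via get_prompts(); names and availability may differ across releases) is to instantiate both metrics and list the prompts each one carries:

```python
from ragas.metrics import AnswerCorrectness, FactualCorrectness

# Compare the prompts each metric actually uses (assumes the PromptMixin-style
# get_prompts() API available in recent ragas releases; adjust for your version).
standalone = FactualCorrectness()
combined = AnswerCorrectness()

print("FactualCorrectness prompts:")
for name, prompt in standalone.get_prompts().items():
    print(" ", name, type(prompt).__name__)

print("AnswerCorrectness prompts:")
for name, prompt in combined.get_prompts().items():
    print(" ", name, type(prompt).__name__)
```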

SilviaTamb avatar Aug 21 '25 15:08 SilviaTamb