Factual Correctness vs Answer Correctness
[x] I checked the documentation and related resources and couldn't find an answer to my question.
Your Question
In my project I aim to use both Factual Correctness and Answer Correctness for evaluating RAG. In the documentation of the latter I found the following:
Let's calculate the answer correctness for the answer with low answer correctness. It is computed as the sum of factual correctness and the semantic similarity between the given answer and the ground truth.
I understand this to mean that AC is derived from FC. Because evaluation is quite time-consuming, my initial idea was to reuse the precomputed FC scores and calculate AC myself as a weighted average of FC and Semantic Similarity. However, the figures I obtained were not consistent with such a procedure. I looked through the code of both metrics and found that AC does not reuse any code from FC; the logic is similar only at a very high level. For instance, during the decomposition phase AC is conditioned on the query, whereas FC does not look at it.
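To make the attempted shortcut concrete, here is a minimal sketch of the weighted average I had in mind. The function name `combine_scores` and the default weights are illustrative assumptions on my side, not the library's API; the sketch only shows the arithmetic, not how either input score is produced.

```python
def combine_scores(factual: float, similarity: float,
                   weights: tuple[float, float] = (0.75, 0.25)) -> float:
    """Weighted average of a factual score and a semantic similarity score.

    The default weights are an assumption used for illustration only.
    Both inputs are expected to lie in [0, 1].
    """
    w_factual, w_similarity = weights
    total = w_factual + w_similarity
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalise by the weight sum so the result stays in [0, 1].
    return (w_factual * factual + w_similarity * similarity) / total


# Hypothetical precomputed scores for a single sample.
fc_score = 0.5
similarity_score = 0.9
print(combine_scores(fc_score, similarity_score))
```

Feeding precomputed FC scores into this formula did not reproduce the AC values reported by the library, which is what prompted the question.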
What is the reason behind this disparity? Is one metric superior to the other?