Recalculation of Answers and Contexts in evaluate Function
Context: I am calling the evaluate function to compute faithfulness and answer correctness. The dataset, which contains answer, context, question, and ground_truth columns, is passed as input. The evaluate function is defined in llama_index.py (path to file -> ragas/integrations).
Concern: I noticed that inside the evaluate function, answers and contexts are recalculated for each user query, and these recalculated values are what gets passed to evaluation. My main question is: why are the answers and contexts recalculated, and why are the recalculated values used for evaluation instead of the original answers and contexts generated by our model?
In my opinion, recalculating the answers and contexts and using them in the evaluation could lead to incorrect faithfulness and answer correctness scores, since the metrics would then score the regenerated outputs rather than the ones our model originally produced (and generation is not deterministic, so the two can differ). This might be a bug. Please clarify whether this behavior is intentional or provide the reasoning behind it.
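To make the concern concrete, here is a minimal, self-contained sketch of the pattern I am describing. All names here (FakeQueryEngine, evaluate_sketch, etc.) are illustrative stand-ins, not the actual ragas or LlamaIndex source: the point is only that the dataset's own answer/contexts columns are never read, and freshly regenerated values are scored instead.

```python
class FakeResponse:
    """Stand-in for a LlamaIndex query response (illustrative only)."""
    def __init__(self, text, source_texts):
        self.text = text
        self.source_texts = source_texts


class FakeQueryEngine:
    """Stand-in for a LlamaIndex query engine (illustrative only)."""
    def query(self, question):
        # A real engine would retrieve and generate; this just fabricates
        # a clearly different answer so the effect is visible.
        return FakeResponse(
            f"regenerated answer to: {question}",
            [f"regenerated context for: {question}"],
        )


def evaluate_sketch(query_engine, dataset):
    """Mimics the pattern in question: answers and contexts are recomputed
    by calling the query engine, while the dataset's existing 'answer' and
    'contexts' columns are ignored."""
    answers, contexts = [], []
    for question in dataset["question"]:
        response = query_engine.query(question)  # recomputation happens here
        answers.append(response.text)
        contexts.append(response.source_texts)
    # These recomputed values, not dataset["answer"] / dataset["contexts"],
    # are what the metrics would end up scoring.
    return {"answer": answers, "contexts": contexts}


dataset = {
    "question": ["What is RAG?"],
    "answer": ["original model answer"],           # never read by the sketch
    "contexts": [["original retrieved context"]],  # never read by the sketch
    "ground_truth": ["reference answer"],
}

result = evaluate_sketch(FakeQueryEngine(), dataset)
print(result["answer"][0])  # differs from dataset["answer"][0]
```

If the real integration follows this shape, the faithfulness and answer correctness scores describe whatever the engine produces at evaluation time, not the precomputed answers stored in the dataset.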
Below is a snapshot of the code in llama_index.py where the answers and contexts are recalculated: