giskard Raget: Possible miscalculation of all Ragas metrics, in particular Precision and Recall

Open Chabert-Liddell opened this issue 2 months ago • 2 comments

Issue Type

Bug

Source

source

Giskard Library Version

2.11

Giskard Hub Version

OS Platform and Distribution

No response

Python version

No response

Installed python packages

No response

Current Behaviour?

Giskard RAGET uses the reference context, rather than the retrieved context, when calling Ragas.

https://github.com/Giskard-AI/giskard/blob/main/giskard/rag/metrics/ragas_metrics.py

        ragas_sample = {
            "question": question_sample["question"],
            "answer": answer,
            "contexts": question_sample["reference_context"].split("\n\n"),
            "ground_truth": question_sample["reference_answer"],
        }

According to the Ragas documentation, the retrieved context (the one used for answer generation) should be passed as "contexts".
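A minimal sketch of what the sample could look like if it were built from the retrieved documents. `AgentOutput` and `build_ragas_sample` are hypothetical names used for illustration only, not Giskard's actual API; the sketch assumes the RAG agent can return the documents it retrieved alongside its answer.

```python
from dataclasses import dataclass, field


@dataclass
class AgentOutput:
    """Hypothetical container for what the RAG agent returned (not a Giskard class)."""
    message: str
    documents: list[str] = field(default_factory=list)


def build_ragas_sample(question_sample: dict, output: AgentOutput) -> dict:
    """Build a Ragas-style sample whose contexts are the retrieved documents."""
    if not output.documents:
        # Context-based metrics cannot be computed without the retrieved documents.
        raise ValueError("retrieved documents are required for context-based metrics")
    return {
        "question": question_sample["question"],
        "answer": output.message,
        # Retrieved context, not question_sample["reference_context"].
        "contexts": list(output.documents),
        "ground_truth": question_sample["reference_answer"],
    }
```

Raising when no documents are available is one possible design choice; it avoids silently falling back to the reference context.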

For example, Precision and Recall both use {"question", "contexts", "ground_truth"}. If you pass the reference context as "contexts", you are evaluating your test set generation pipeline and not your RAG pipeline.
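To illustrate the point, here is a toy sketch. `toy_context_recall` is a hypothetical word-overlap stand-in, not the Ragas metric (which relies on an LLM), and the strings are invented example data. It shows that scoring against the reference context yields the same result whatever the retriever returned.

```python
def toy_context_recall(contexts: list[str], ground_truth: str) -> float:
    """Fraction of ground-truth words found in the contexts (toy stand-in, not Ragas)."""
    truth_words = set(ground_truth.lower().split())
    context_words = set(" ".join(contexts).lower().split())
    if not truth_words:
        return 0.0
    return len(truth_words & context_words) / len(truth_words)


# Invented example data: the reference answer was generated from the reference context.
reference_context = ["paris is the capital of france"]
ground_truth = "paris is the capital"
# What a poorly performing retriever might have returned instead.
bad_retrieval = ["berlin has many museums"]

# Scoring the reference context says nothing about the retriever.
score_with_reference = toy_context_recall(reference_context, ground_truth)
# Scoring the retrieved context reflects the retriever's actual output.
score_with_retrieved = toy_context_recall(bad_retrieval, ground_truth)
```

In this toy setup the first score stays high however bad the retrieval is, which is the miscalculation described above.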

Standalone code OR list down the steps to reproduce the issue

Relevant log output

No response

May 03 '24 09:05 Chabert-Liddell
