
Update the eval tutorial

Open · TuanaCelik opened this issue 1 year ago · 17 comments

TuanaCelik avatar Apr 29 '24 19:04 TuanaCelik



bilgeyucel commented on 2024-04-30T09:14:45Z ----------------------------------------------------------------

...model-based evaluation frameworks integerated -> integrated

Also, in the Goal field, you can add more information about the evaluation methods we used in the tutorial. As I understand, we use both model-based evaluators and statistical evaluators



bilgeyucel commented on 2024-04-30T09:14:46Z ----------------------------------------------------------------

You will run your RAG pipeline and evaluated -> evaluate

We will use some of the available evalution -> evaluation

Why does the evaluation docs link have a v2.1-unstable/ path?


TuanaCelik commented on 2024-04-30T11:16:10Z ----------------------------------------------------------------

Because that's the only live one right now. Will change when released.


bilgeyucel commented on 2024-04-30T09:14:46Z ----------------------------------------------------------------

You can remove /2.0/ paths from these links too



bilgeyucel commented on 2024-04-30T09:14:47Z ----------------------------------------------------------------

Line #3.    pip install git+https://github.com/deepset-ai/haystack.git@main

Do we need to install it from main?


TuanaCelik commented on 2024-04-30T11:16:23Z ----------------------------------------------------------------

right now yes, will change when released


bilgeyucel commented on 2024-04-30T09:14:48Z ----------------------------------------------------------------

First, let's actually tun -> run

You will notice that this is why we provide a list od -> of


TuanaCelik commented on 2024-04-30T11:16:45Z ----------------------------------------------------------------

thanks πŸ™


bilgeyucel commented on 2024-04-30T09:14:49Z ----------------------------------------------------------------

Line #7.    eval_pipeline.add_component("groundness_evaluator", FaithfulnessEvaluator())

The name is "groundness_evaluator" but the component is "FaithfulnessEvaluator". Are they the same thing? Or maybe I'm missing something


julian-risch commented on 2024-04-30T09:53:58Z ----------------------------------------------------------------

We can rename the component to AnswerGroundednessEvaluator if that is more intuitive?

julian-risch commented on 2024-04-30T09:55:02Z ----------------------------------------------------------------

AnswerFaithfulnessEvaluator or AnswerHallucinationEvaluator are alternatives.


julian-risch commented on 2024-04-30T09:27:28Z ----------------------------------------------------------------

Let's rename the components as their names will be used as column names:

from haystack.components.evaluators.document_mrr import DocumentMRREvaluator
from haystack.components.evaluators.faithfulness import FaithfulnessEvaluator
from haystack.components.evaluators.sas_evaluator import SASEvaluator


eval_pipeline = Pipeline()
eval_pipeline.add_component("mean_reciprocal_rank", DocumentMRREvaluator())
eval_pipeline.add_component("faithfulness", FaithfulnessEvaluator())
eval_pipeline.add_component("semantic_answer_similarity", SASEvaluator(model="sentence-transformers/all-MiniLM-L6-v2"))


results = eval_pipeline.run({
    "mean_reciprocal_rank": {"ground_truth_documents": [[d] for d in ground_truth_docs], "retrieved_documents": retrieved_docs},
    "faithfulness": {"questions": list(questions), "contexts": [[d.content] for d in ground_truth_docs], "responses": rag_answers},
    "semantic_answer_similarity": {"predicted_answers": rag_answers, "ground_truth_answers": list(ground_truth_answers)}
})
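For intuition, DocumentMRREvaluator computes mean reciprocal rank: each query is scored by 1/rank of the first relevant document in the retrieved ranking (0 if none is retrieved), averaged over all queries. A minimal standalone sketch of that metric (plain Python illustration, not the Haystack component):

```python
def mean_reciprocal_rank(ground_truth, retrieved):
    """ground_truth: list of sets of relevant doc ids, one per query.
    retrieved: list of ranked doc-id lists, one per query."""
    total = 0.0
    for relevant, ranking in zip(ground_truth, retrieved):
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ground_truth)

# Query 1 hits at rank 1 (score 1.0), query 2 at rank 3 (score 1/3),
# so the mean is (1.0 + 1/3) / 2
print(mean_reciprocal_rank([{"a"}, {"b"}], [["a", "x"], ["x", "y", "b"]]))  # ≈ 0.667
```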

View / edit / reply to this conversation on ReviewNB

julian-risch commented on 2024-04-30T09:27:28Z ----------------------------------------------------------------

This cell can be simplified to

from haystack.evaluation.eval_run_result import EvaluationRunResult

inputs = {
    "question": list(questions),
    "contexts": [[d.content] for d in ground_truth_docs],
    "answer": list(ground_truth_answers),
    "predicted_answer": rag_answers,
}
evaluation_result = EvaluationRunResult(run_name="pubmed_rag_pipeline", inputs=inputs, results=results)
evaluation_result.score_report()
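score_report() condenses the per-example metric scores into one aggregate table. As a rough illustration of what that aggregation amounts to (a hand-rolled sketch with made-up scores, not the EvaluationRunResult API):

```python
def score_report(results):
    """results: mapping of metric name -> list of per-example scores.
    Returns metric name -> mean score, the kind of summary a score report shows."""
    return {metric: sum(scores) / len(scores) for metric, scores in results.items()}

# Hypothetical per-example scores for three queries
report = score_report({
    "mean_reciprocal_rank": [1.0, 0.5, 1.0],
    "semantic_answer_similarity": [0.9, 0.7, 0.8],
})
print(report)  # means ≈ 0.83 and 0.80
```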


@TuanaCelik The only change needed in this tutorial that is caused by the PR I just merged is that FaithfulnessEvaluator's input parameter responses was renamed to predicted_answers. And if you think we should rename the component we can still do it, just let me know.

julian-risch avatar Apr 30 '24 14:04 julian-risch
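The rename only touches the inputs handed to the faithfulness component: the `responses` key becomes `predicted_answers`, everything else is unchanged. A sketch of the migration at a call site (the dict values here are made-up placeholders):

```python
# Inputs as written before the rename (per the review thread)
faithfulness_inputs = {
    "questions": ["What does the retrieved context say?"],   # placeholder
    "contexts": [["Some retrieved passage."]],               # placeholder
    "responses": ["A generated answer."],                    # placeholder
}

# After the merged PR, FaithfulnessEvaluator expects `predicted_answers`
# instead of `responses`, so call sites just migrate the key.
faithfulness_inputs["predicted_answers"] = faithfulness_inputs.pop("responses")

print(sorted(faithfulness_inputs))  # ['contexts', 'predicted_answers', 'questions']
```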

> @TuanaCelik The only change needed in this tutorial that is caused by the PR I just merged is that FaithfulnessEvaluator's input parameter responses was renamed to predicted_answers. And if you think we should rename the component we can still do it, just let me know.

Hey @julian-risch - thanks for the info. I don't have strong opinions on the component naming. Imo, 'faithfulness' is widely used at this point. I'll defer to you guys to make the final call here. I'm ok with either

TuanaCelik avatar May 01 '24 15:05 TuanaCelik

Comments are resolved. For whoever is merging:

  • ~~Update the installation to haystack-ai after release~~

  • Check that the image is rendered correctly on the website and, if not, update it to the raw GitHub URL

TuanaCelik avatar May 01 '24 15:05 TuanaCelik