
Update the eval tutorial

Open · TuanaCelik opened this issue 1 year ago · 17 comments

TuanaCelik avatar Apr 29 '24 19:04 TuanaCelik



bilgeyucel commented on 2024-04-30T09:14:45Z ----------------------------------------------------------------

...model-based evaluation frameworks integerated -> integrated

Also, in the Goal field, you can add more information about the evaluation methods we used in the tutorial. As I understand, we use both model-based evaluators and statistical evaluators



bilgeyucel commented on 2024-04-30T09:14:46Z ----------------------------------------------------------------

You will run your RAG pipeline and evaluated -> evaluate

We will use some of the available evalution -> evaluation

Why does the evaluation docs link have a v2.1-unstable/ path?


TuanaCelik commented on 2024-04-30T11:16:10Z ----------------------------------------------------------------

Because that's the only live one right now. Will change when released.


bilgeyucel commented on 2024-04-30T09:14:46Z ----------------------------------------------------------------

You can remove /2.0/ paths from these links too



bilgeyucel commented on 2024-04-30T09:14:47Z ----------------------------------------------------------------

Line #3.    pip install git+https://github.com/deepset-ai/haystack.git@main

Do we need to install it from main?


TuanaCelik commented on 2024-04-30T11:16:23Z ----------------------------------------------------------------

right now yes, will change when released


bilgeyucel commented on 2024-04-30T09:14:48Z ----------------------------------------------------------------

First, let's actually tun -> run

You will notice that this is why we provide a list od -> of


TuanaCelik commented on 2024-04-30T11:16:45Z ----------------------------------------------------------------

thanks πŸ™


bilgeyucel commented on 2024-04-30T09:14:49Z ----------------------------------------------------------------

Line #7.    eval_pipeline.add_component("groundness_evaluator", FaithfulnessEvaluator())

The name is "groundness_evaluator" but the component is "FaithfulnessEvaluator". Are they the same thing? Or maybe I'm missing something


julian-risch commented on 2024-04-30T09:53:58Z ----------------------------------------------------------------

We can rename the component to AnswerGroundednessEvaluator if that is more intuitive?

julian-risch commented on 2024-04-30T09:55:02Z ----------------------------------------------------------------

AnswerFaithfulnessEvaluator or AnswerHallucinationEvaluator are alternatives.


julian-risch commented on 2024-04-30T09:27:28Z ----------------------------------------------------------------

Let's rename the components as their names will be used as column names:

from haystack.components.evaluators.document_mrr import DocumentMRREvaluator
from haystack.components.evaluators.faithfulness import FaithfulnessEvaluator
from haystack.components.evaluators.sas_evaluator import SASEvaluator


eval_pipeline = Pipeline()
eval_pipeline.add_component("mean_reciprocal_rank", DocumentMRREvaluator())
eval_pipeline.add_component("faithfulness", FaithfulnessEvaluator())
eval_pipeline.add_component("semantic_answer_similarity", SASEvaluator(model="sentence-transformers/all-MiniLM-L6-v2"))


results = eval_pipeline.run({
    "mean_reciprocal_rank": {"ground_truth_documents": [[d] for d in ground_truth_docs], "retrieved_documents": retrieved_docs},
    "faithfulness": {"questions": list(questions), "contexts": [[d.content] for d in ground_truth_docs], "responses": rag_answers},
    "semantic_answer_similarity": {"predicted_answers": rag_answers, "ground_truth_answers": list(ground_truth_answers)}
})
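For intuition, DocumentMRREvaluator computes mean reciprocal rank: each query is scored by 1/rank of the first relevant document in the retrieved ranking (0 if none is retrieved), averaged over all queries. A minimal standalone sketch of that metric (plain Python illustration, not the Haystack component):

```python
def mean_reciprocal_rank(ground_truth, retrieved):
    """ground_truth: list of sets of relevant doc ids, one per query.
    retrieved: list of ranked doc-id lists, one per query."""
    total = 0.0
    for relevant, ranking in zip(ground_truth, retrieved):
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ground_truth)

# Query 1 hits at rank 1 (score 1.0), query 2 at rank 3 (score 1/3),
# so the mean is (1.0 + 1/3) / 2
print(mean_reciprocal_rank([{"a"}, {"b"}], [["a", "x"], ["x", "y", "b"]]))  # ≈ 0.667
```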

View / edit / reply to this conversation on ReviewNB

julian-risch commented on 2024-04-30T09:27:28Z ----------------------------------------------------------------

This cell can be simplified to

from haystack.evaluation.eval_run_result import EvaluationRunResult

inputs = {
    "question": list(questions),
    "contexts": [[d.content] for d in ground_truth_docs],
    "answer": list(ground_truth_answers),
    "predicted_answer": rag_answers,
}
evaluation_result = EvaluationRunResult(run_name="pubmed_rag_pipeline", inputs=inputs, results=results)
evaluation_result.score_report()
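score_report() condenses the per-example metric scores into one aggregate table. As a rough illustration of what that aggregation amounts to (a hand-rolled sketch with made-up scores, not the EvaluationRunResult API):

```python
def score_report(results):
    """results: mapping of metric name -> list of per-example scores.
    Returns metric name -> mean score, the kind of summary a score report shows."""
    return {metric: sum(scores) / len(scores) for metric, scores in results.items()}

# Hypothetical per-example scores for three queries
report = score_report({
    "mean_reciprocal_rank": [1.0, 0.5, 1.0],
    "semantic_answer_similarity": [0.9, 0.7, 0.8],
})
print(report)  # means ≈ 0.83 and 0.80
```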


@TuanaCelik The only change needed in this tutorial that is caused by the PR I just merged is that FaithfulnessEvaluator's input parameter responses was renamed to predicted_answers. And if you think we should rename the component we can still do it, just let me know.

julian-risch avatar Apr 30 '24 14:04 julian-risch
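The rename only touches the inputs handed to the faithfulness component: the `responses` key becomes `predicted_answers`, everything else is unchanged. A sketch of the migration at a call site (the dict values here are made-up placeholders):

```python
# Inputs as written before the rename (per the review thread)
faithfulness_inputs = {
    "questions": ["What does the retrieved context say?"],   # placeholder
    "contexts": [["Some retrieved passage."]],               # placeholder
    "responses": ["A generated answer."],                    # placeholder
}

# After the merged PR, FaithfulnessEvaluator expects `predicted_answers`
# instead of `responses`, so call sites just migrate the key.
faithfulness_inputs["predicted_answers"] = faithfulness_inputs.pop("responses")

print(sorted(faithfulness_inputs))  # ['contexts', 'predicted_answers', 'questions']
```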

> @TuanaCelik The only change needed in this tutorial that is caused by the PR I just merged is that FaithfulnessEvaluator's input parameter responses was renamed to predicted_answers. And if you think we should rename the component we can still do it, just let me know.

Hey @julian-risch - thanks for the info. I don't have strong opinions on the component naming. Imo, 'faithfulness' is widely used at this point. I'll defer to you guys to make the final call here. I'm ok with either

TuanaCelik avatar May 01 '24 15:05 TuanaCelik

Comments are resolved. For whoever is merging:

  • ~~Update the installation to haystack-ai after release~~

  • Check that the image is rendered correctly on the website and, if not, update it to the raw GitHub URL

TuanaCelik avatar May 01 '24 15:05 TuanaCelik