haystack-tutorials
Update the eval tutorial
bilgeyucel commented on 2024-04-30T09:14:45Z ----------------------------------------------------------------
...model-based evaluation frameworks integerated -> integrated
Also, in the Goal field, you can add more information about the evaluation methods we used in the tutorial. As I understand, we use both model-based evaluators and statistical evaluators
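For illustration, the two evaluator families referenced here map onto separate Haystack components; a minimal sketch using the import paths quoted later in this thread (the category comments are the editor's reading, not wording from the tutorial):

# Model-based: FaithfulnessEvaluator asks an LLM to judge whether the generated answers
# are supported by the contexts (it expects an OpenAI API key, as in the tutorial).
from haystack.components.evaluators.faithfulness import FaithfulnessEvaluator

# Statistical: DocumentMRREvaluator computes Mean Reciprocal Rank over the retrieved
# documents, with no LLM involved.
from haystack.components.evaluators.document_mrr import DocumentMRREvaluator

mrr = DocumentMRREvaluator()  # purely rank-based, no model needed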
bilgeyucel commented on 2024-04-30T09:14:46Z ----------------------------------------------------------------
You will run your RAG pipeline and evaluated -> evaluate
We will use some of the available evalution -> evaluation
Why do the evaluation docs links have a v2.1-unstable/ path?
TuanaCelik commented on 2024-04-30T11:16:10Z ----------------------------------------------------------------
Because that's the only live one right now. Will change when released.
bilgeyucel commented on 2024-04-30T09:14:46Z ----------------------------------------------------------------
You can remove /2.0/ paths from these links too
bilgeyucel commented on 2024-04-30T09:14:47Z ----------------------------------------------------------------
Line #3. pip install git+https://github.com/deepset-ai/haystack.git@main
Do we need to install it from main?
TuanaCelik commented on 2024-04-30T11:16:23Z ----------------------------------------------------------------
right now yes, will change when released
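For reference (and as noted in the merge checklist at the bottom of this thread), after the release the install line would presumably go back to the published package, something like:

pip install haystack-ai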
bilgeyucel commented on 2024-04-30T09:14:48Z ----------------------------------------------------------------
First, let's actually tun -> run
You will notice that this is why we provide a list od -> of
TuanaCelik commented on 2024-04-30T11:16:45Z ----------------------------------------------------------------
thanks!
bilgeyucel commented on 2024-04-30T09:14:49Z ----------------------------------------------------------------
Line #7. eval_pipeline.add_component("groundness_evaluator", FaithfulnessEvaluator())
The name is "groundness_evaluator" but the component is "FaithfulnessEvaluator". Are they the same thing? Or maybe I'm missing something
julian-risch commented on 2024-04-30T09:53:58Z ----------------------------------------------------------------
We can rename the component to AnswerGroundednessEvaluator if that is more intuitive?
julian-risch commented on 2024-04-30T09:55:02Z ----------------------------------------------------------------
AnswerFaithfulnessEvaluator or AnswerHallucinationEvaluator are alternatives.
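For illustration only, the mismatch bilgeyucel points out could also be resolved without renaming the class, by making the pipeline component name match it (a sketch, not a decision from this thread):

eval_pipeline.add_component("faithfulness_evaluator", FaithfulnessEvaluator())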
julian-risch commented on 2024-04-30T09:27:28Z ----------------------------------------------------------------
Let's rename the components as their names will be used as column names:
from haystack import Pipeline
from haystack.components.evaluators.document_mrr import DocumentMRREvaluator
from haystack.components.evaluators.faithfulness import FaithfulnessEvaluator
from haystack.components.evaluators.sas_evaluator import SASEvaluator

eval_pipeline = Pipeline()
eval_pipeline.add_component("mean_reciprocal_rank", DocumentMRREvaluator())
eval_pipeline.add_component("faithfulness", FaithfulnessEvaluator())
eval_pipeline.add_component("semantic_answer_similarity", SASEvaluator(model="sentence-transformers/all-MiniLM-L6-v2"))

results = eval_pipeline.run({
    "mean_reciprocal_rank": {"ground_truth_documents": list([d] for d in ground_truth_docs), "retrieved_documents": retrieved_docs},
    "faithfulness": {"questions": list(questions), "contexts": list([d.content] for d in ground_truth_docs), "responses": rag_answers},
    "semantic_answer_similarity": {"predicted_answers": rag_answers, "ground_truth_answers": list(ground_truth_answers)},
})
julian-risch commented on 2024-04-30T09:27:28Z ----------------------------------------------------------------
This cell can be simplified to
from haystack.evaluation.eval_run_result import EvaluationRunResult

inputs = {
    "question": list(questions),
    "contexts": list([d.content] for d in ground_truth_docs),
    "answer": list(ground_truth_answers),
    "predicted_answer": rag_answers,
}

evaluation_result = EvaluationRunResult(run_name="pubmed_rag_pipeline", inputs=inputs, results=results)
evaluation_result.score_report()
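A possible follow-up to the aggregate report, assuming the EvaluationRunResult API in Haystack 2.1 also exposes a per-question export (the method name here is an assumption, not quoted from the suggestion above):

# Per-question breakdown alongside the aggregate score_report();
# to_pandas() is assumed from the 2.1 API.
results_df = evaluation_result.to_pandas()
results_df.head()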
@TuanaCelik The only change needed in this tutorial that is caused by the PR I just merged is that FaithfulnessEvaluator's input parameter responses was renamed to predicted_answers. And if you think we should rename the component we can still do it, just let me know.
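For illustration, with that rename the faithfulness inputs in the evaluation run from the earlier suggestion would presumably become (a sketch, not the final tutorial code):

results = eval_pipeline.run({
    "mean_reciprocal_rank": {"ground_truth_documents": list([d] for d in ground_truth_docs), "retrieved_documents": retrieved_docs},
    "faithfulness": {"questions": list(questions), "contexts": list([d.content] for d in ground_truth_docs), "predicted_answers": rag_answers},  # was "responses"
    "semantic_answer_similarity": {"predicted_answers": rag_answers, "ground_truth_answers": list(ground_truth_answers)},
})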
Hey @julian-risch - thanks for the info. I don't have strong opinions on the component naming. Imo, 'faithfulness' is widely used at this point. I'll defer to you guys to make the final call here. I'm ok with either
Comments are resolved. For whoever is merging:
- ~~Update the installation to haystack-ai after release~~
- Check that the image is rendered correctly on the website and, if not, update it to the raw GitHub URL