Update tutorial on model-based evaluation with Haystack

Open julian-risch opened this issue 1 year ago • 0 comments

Goal: Showcase the Haystack evaluation metrics on an example that is close to what our users are trying to do, i.e. to improve a retriever in a RAG app.

The tutorial should:

- Use a dataset from https://github.com/deepset-ai/haystack/issues/7438.
- Use the metrics that were added to Haystack core: the LLM-based ones (context relevance, faithfulness) and/or SAS (semantic answer similarity).
- Tell this story: I am building a RAG pipeline and manage to improve its performance by using Haystack's evaluation metrics to guide changes to the retriever (chunk size and/or top_k and/or embedding model).
- Optionally be split into sections: evaluating the retriever with context relevance, evaluating the generator with faithfulness, and evaluating the whole pipeline with SAS.
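
To make the intended story concrete, here is a rough sketch of the experiment loop the tutorial could walk through: sweep chunk size and top_k, score each retriever configuration, and pick the best one. Everything in it is an assumption for illustration only. The corpus and questions are made up, the retriever is a toy word-overlap ranker, and `proxy_context_relevance` is a hypothetical stand-in (Jaccard overlap) for the LLM-based context relevance metric. None of it is Haystack API; in the tutorial these pieces would be replaced by real Haystack components and the dataset linked above.

```python
from itertools import product

# Made-up corpus and questions, for illustration only.
CORPUS = (
    "Haystack is an open source framework for building search and RAG applications. "
    "A retriever selects the documents that are passed to the generator. "
    "Chunk size controls how many words each indexed document contains. "
    "The top_k parameter controls how many documents the retriever returns."
)
QUESTIONS = [
    "What does the top_k parameter control?",
    "What does chunk size control?",
]


def tokenize(text):
    """Lowercase the text and return its set of words without punctuation."""
    return {word.strip(".,?!").lower() for word in text.split()}


def split_into_chunks(text, chunk_size):
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def retrieve(question, chunks, top_k):
    """Toy retriever: rank chunks by word overlap with the question."""
    question_tokens = tokenize(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: len(question_tokens & tokenize(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


def proxy_context_relevance(question, contexts):
    """Hypothetical stand-in metric: mean Jaccard overlap between the
    question and each retrieved chunk. Always between 0 and 1."""
    if not contexts:
        return 0.0
    question_tokens = tokenize(question)
    scores = []
    for context in contexts:
        context_tokens = tokenize(context)
        union = question_tokens | context_tokens
        scores.append(len(question_tokens & context_tokens) / len(union))
    return sum(scores) / len(scores)


def evaluate_configs(chunk_sizes, top_ks):
    """Return the mean proxy score for every (chunk_size, top_k) pair."""
    results = {}
    for chunk_size, top_k in product(chunk_sizes, top_ks):
        chunks = split_into_chunks(CORPUS, chunk_size)
        per_question = [
            proxy_context_relevance(q, retrieve(q, chunks, top_k))
            for q in QUESTIONS
        ]
        results[(chunk_size, top_k)] = sum(per_question) / len(per_question)
    return results


results = evaluate_configs(chunk_sizes=[8, 16], top_ks=[1, 3])
best = max(results, key=results.get)
for config, score in sorted(results.items()):
    print(f"chunk_size={config[0]:>2} top_k={config[1]} score={score:.3f}")
print("best configuration:", best)
```

The same loop shape would carry over to the other two sections: swap the scoring function for faithfulness when evaluating the generator, and for SAS when comparing the pipeline's final answers to ground-truth answers.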

Jan 19 '24 14:01 julian-risch