Update tutorial on model-based evaluation with Haystack
Goal: Showcase the Haystack evaluation metrics on an example that is close to what our users are actually trying to do, namely improving the retriever in a RAG app.
The tutorial should:
- use a dataset from https://github.com/deepset-ai/haystack/issues/7438
- use metrics that were added to Haystack core: the LLM-based ones (context relevance, faithfulness) and/or SAS (semantic answer similarity)
- the story should be: I am building a RAG pipeline and I improve its performance by using Haystack's evaluation metrics to guide changes to the retriever (changing chunk size and/or top_k and/or embedding model)
- the tutorial could be split into sections: evaluating the retriever with context relevance, evaluating the generator with faithfulness, and evaluating the whole pipeline with SAS.
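
To make the intended story concrete, here is a rough, library-agnostic sketch of the tuning loop: sweep chunk size and top_k, score the retrieved context for each configuration, and pick the best one. Everything in it is an assumption for illustration. The helper names (`chunk_words`, `overlap`, `retrieve`, `sweep`, `best_config`) are hypothetical, and the word-overlap scorer is only a toy stand-in for the context relevance metric; the actual tutorial would use the evaluators from Haystack core and a real retriever instead.

```python
from itertools import product


def chunk_words(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def overlap(query: str, chunk: str) -> float:
    """Toy relevance score: fraction of query words that appear in the chunk.

    Stand-in only; the tutorial would use an LLM-based context relevance metric.
    """
    query_words = set(query.lower().split())
    chunk_words_set = set(chunk.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & chunk_words_set) / len(query_words)


def retrieve(query: str, chunks: list[str], top_k: int) -> list[str]:
    """Toy retriever: return the `top_k` chunks with the highest overlap score."""
    return sorted(chunks, key=lambda chunk: overlap(query, chunk), reverse=True)[:top_k]


def mean_context_score(questions: list[str], chunks: list[str], top_k: int) -> float:
    """Average the toy relevance score over all retrieved chunks for all questions."""
    scores = [
        overlap(question, chunk)
        for question in questions
        for chunk in retrieve(question, chunks, top_k)
    ]
    return sum(scores) / len(scores) if scores else 0.0


def sweep(corpus: list[str], questions: list[str],
          chunk_sizes: list[int], top_ks: list[int]) -> dict[tuple[int, int], float]:
    """Evaluate every (chunk_size, top_k) combination and return its mean score."""
    results = {}
    for chunk_size, top_k in product(chunk_sizes, top_ks):
        chunks = [c for doc in corpus for c in chunk_words(doc, chunk_size)]
        results[(chunk_size, top_k)] = mean_context_score(questions, chunks, top_k)
    return results


def best_config(results: dict[tuple[int, int], float]) -> tuple[int, int]:
    """Return the (chunk_size, top_k) pair with the highest mean score."""
    return max(results, key=results.get)
```

The same loop shape would carry over to the other two sections: swap the scorer for faithfulness when evaluating the generator, or for SAS when evaluating the whole pipeline, and add the embedding model as a third swept parameter.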