Port evaluation from 1.x and extend to LLMs

Open julian-risch opened this issue 1 year ago • 0 comments

We need to extend the evaluation features, in particular for RAG pipelines, so that users can answer questions like:

  • Is this pipeline good enough?
  • What should I focus on for optimization?
  • Is pipeline A better than B? (performance, costs, latency)

This includes the following components typically appearing in RAG pipelines:

Retrievers, Rankers, DocumentJoiners
  a) labels available => statistical metrics
  b) no labels available => model-based heuristics / pseudo-label generator

Generators
  a) labels available => model-based (SAS, answer correctness ...)
  b) no labels available => model-based (groundedness score)
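
To make the "labels available => statistical metrics" case for Retrievers and Rankers concrete, here is a rough sketch of two common metrics, recall@k and mean reciprocal rank. The function names and the use of plain document ID strings are assumptions for illustration only; this is not the Haystack API and not a proposal for the final interface.

```python
from typing import Iterable, Sequence, Set, Tuple


def recall_at_k(retrieved_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the labelled relevant documents that appear in the top-k results."""
    if not relevant_ids:
        # No labels for this query: define recall as 0.0 rather than dividing by zero.
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def reciprocal_rank(retrieved_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (retrieved_ids, relevant_ids) pairs, one per query."""
    scores = [reciprocal_rank(retrieved, relevant) for retrieved, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical document IDs, made up for this example.
    retrieved = ["d3", "d1", "d7"]
    relevant = {"d1", "d2"}
    print(recall_at_k(retrieved, relevant, k=2))
    print(reciprocal_rank(retrieved, relevant))
```

Metrics like these only need the ranked document IDs and the labelled relevant IDs per query, so they could be computed for pipeline A and pipeline B on the same labelled set to compare them.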

### Tasks
- [ ] https://github.com/deepset-ai/haystack/issues/6061
- [ ] https://github.com/deepset-ai/haystack/issues/6786

julian-risch, Jan 02 '24 11:01