haystack
Multi-query retrieval/query decomposition
Context
Naive RAG starts to fail on more complex types of queries, where information may be spread across different chunks or document sources. One type of query it typically does not handle well is the "comparison" query. For example, in the ARAGOG dataset*, there are questions that could benefit from query decomposition, like
"Describe RoBERTa's approach to training with large mini-batches and its effect on model optimization and performance":
==> “how does RoBERTa train with large mini-batches?”
==> “what is the impact of large mini-batches on model optimisation?”
==> “what is the impact of large mini-batches on performance?”
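The flow above can be sketched as: decompose the complex query into sub-queries, run the retriever once per sub-query, then merge the hits (deduplicating by document id, keeping the best score). This is a minimal stand-alone sketch — `decompose_query` and `retrieve` are illustrative stand-ins (in practice the decomposition would be an LLM call and the retrieval a real Haystack retriever), not actual Haystack components:

```python
def decompose_query(query: str) -> list[str]:
    """Stand-in for an LLM call that splits a complex query into
    simpler sub-queries. Hard-coded here for illustration."""
    return [
        "how does RoBERTa train with large mini-batches?",
        "what is the impact of large mini-batches on model optimisation?",
        "what is the impact of large mini-batches on performance?",
    ]

def retrieve(sub_query: str, top_k: int = 3) -> list[dict]:
    """Stand-in retriever: returns (doc id, score) hits for a sub-query
    from a toy keyword index."""
    toy_index = {
        "train": [{"id": "d1", "score": 0.9}, {"id": "d2", "score": 0.7}],
        "optimisation": [{"id": "d2", "score": 0.8}, {"id": "d3", "score": 0.6}],
        "performance": [{"id": "d3", "score": 0.85}, {"id": "d4", "score": 0.5}],
    }
    for keyword, hits in toy_index.items():
        if keyword in sub_query:
            return hits[:top_k]
    return []

def multi_query_retrieve(query: str, top_k: int = 5) -> list[dict]:
    """Run the retriever once per sub-query, then merge results,
    keeping the best score seen for each document id."""
    best: dict[str, float] = {}
    for sub_query in decompose_query(query):
        for hit in retrieve(sub_query):
            best[hit["id"]] = max(best.get(hit["id"], 0.0), hit["score"])
    merged = [{"id": doc_id, "score": score} for doc_id, score in best.items()]
    merged.sort(key=lambda h: -h["score"])
    return merged[:top_k]

results = multi_query_retrieve(
    "Describe RoBERTa's approach to training with large mini-batches "
    "and its effect on model optimization and performance"
)
```

Note that a document surfaced by several sub-queries (here `d2` and `d3`) is returned only once; a rank-fusion strategy such as reciprocal rank fusion could be used instead of max-score merging.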
Outcome
- New architecture in the evaluation repository.
- New evaluation script in the evaluation repository.
*Caveat on the ARAGOG dataset: it might not be the best dataset for testing this type of retriever, given that its question-answer pairs were generated synthetically from single document chunks.