Multi-query retrieval/query decomposition

Open mrm1001 opened this issue 7 months ago • 0 comments

Context

Naive RAG starts to fail on more complex types of queries, where the relevant information may be spread across different chunks or document sources. "Comparison" queries are a typical case where it does not perform well. For example, the ARAGOG dataset* contains questions that could benefit from query decomposition, such as:

"Describe RoBERTa's approach to training with large mini-batches and its effect on model optimization and performance":
==> "How does RoBERTa train with large mini-batches?"
==> "What is the impact of large mini-batches on model optimization?"
==> "What is the impact of large mini-batches on performance?"
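
To make the idea concrete, here is a minimal sketch of multi-query retrieval over decomposed sub-queries. Everything in it is an assumption for illustration: `keyword_retriever` is a toy word-overlap ranker standing in for a real retriever, `decompose` is a hypothetical callable (in practice probably an LLM prompt) that turns one query into sub-queries, and none of the names correspond to existing Haystack components.

```python
from typing import Callable, List


def keyword_retriever(query: str, docs: List[str], top_k: int = 2) -> List[str]:
    """Toy retriever (assumption): rank docs by the number of words shared with the query."""
    query_terms = set(query.lower().split())
    scored = [(len(query_terms & set(doc.lower().split())), doc) for doc in docs]
    # Keep only docs with at least one shared word, highest overlap first.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]


def multi_query_retrieve(sub_queries: List[str], docs: List[str], top_k: int = 2) -> List[str]:
    """Run the retriever once per sub-query and merge results, dropping duplicates."""
    seen = set()
    merged: List[str] = []
    for sub_query in sub_queries:
        for doc in keyword_retriever(sub_query, docs, top_k):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


def retrieve_with_decomposition(
    query: str,
    docs: List[str],
    decompose: Callable[[str], List[str]],  # hypothetical, e.g. an LLM call
    top_k: int = 2,
) -> List[str]:
    """Decompose the query, then retrieve for each part; fall back to the original query."""
    sub_queries = decompose(query) or [query]
    return multi_query_retrieve(sub_queries, docs, top_k=top_k)


if __name__ == "__main__":
    docs = [
        "roberta trains with large mini-batches",
        "large mini-batches improve model optimization",
        "bert uses next sentence prediction",
    ]
    # Hard-coded stand-in for the decomposition step.
    fixed_decompose = lambda q: [
        "how does roberta train with large mini-batches",
        "what is the impact of large mini-batches on model optimization",
    ]
    print(retrieve_with_decomposition("describe roberta and large mini-batches", docs, fixed_decompose))
```

Merging here is a simple order-preserving de-duplication; a real architecture would likely need a ranking or fusion step across the per-sub-query results, which this sketch leaves out.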

Outcome

  • New architecture in the evaluation repository.
  • New evaluation script in the evaluation repository.

*Caveat on the ARAGOG dataset: it might not be the best dataset for testing this type of retriever, since its question-answer pairs were generated synthetically from single document chunks.

mrm1001, Jul 10 '24 08:07