[RFC] Design for Incorporating Reciprocal Rank Fusion into Neural Search
Introduction
This document outlines the design for implementing Reciprocal Rank Fusion (RRF) within the neural search plugin. RRF, first described in a 2009 paper by Cormack, Clarke, and Büttcher, is a method for combining rankings from several subqueries, and the paper shows it can provide better search results than other rank-combination methods. The paper also analyzes risk-reward tradeoffs for different fusion techniques. The general formula for RRF, where k is the rank constant and query_j_rank is the rank of a document in the results of subquery j of the hybrid query, is as follows:
rankScore(document_i) = sum(1/(k + query_1_rank), 1/(k + query_2_rank), ..., 1/(k + query_j_rank))
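To make the formula concrete, here is a minimal sketch in Java (the method name and signature are illustrative only, not part of the plugin):

// Rank score for one document, given its 1-based rank in each subquery result list
// (0 means the document was not returned by that subquery).
static double rrfRankScore(int k, int... ranksPerSubquery) {
    double rankScore = 0.0;
    for (int rank : ranksPerSubquery) {
        if (rank >= 1) {
            rankScore += 1.0 / (k + rank);
        }
    }
    return rankScore;
}

For example, rrfRankScore(60, 1, 3) = 1/61 + 1/63 ≈ 0.0323.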
Background
The OpenSearch neural search plugin provides semantic search capability by using ML models to embed text. It supports hybrid queries, which combine text-based search with neural vector search. This design document describes the implementation of the RRF technique to combine results from the different parts of a hybrid query and improve search relevance.
One advantage of RRF is that it is simple and provides a standard way of processing results. It is scalable and doesn’t require tuning (other than adjusting the rank constant). Its simplicity makes it intuitive for users. The algorithm has also been used successfully by several other organizations. Moreover, incorporating RRF into hybrid query will ease customer migration to OpenSearch from platforms where they already use RRF, and some customers have already requested it as a technique. For these reasons, we believe RRF will benefit users running hybrid queries.
Functional Requirements
- Reciprocal Rank Fusion (RRF) Implementation: Develop and integrate the RRF technique, which uses a document’s rank in each individual subquery result to calculate a score and sums those scores across all subqueries to produce the “rank score” used to combine hybrid search results
- Hybrid Search Integration: Ensure the RRF technique can be seamlessly integrated into the existing hybrid search framework and does not interfere with sorting (by different fields), concurrent segment search, or aggregations.
- Normalization Adjustment: Modify or enhance the current normalization process to utilize ranks of documents in subquery result lists, reducing potential biases from score-based normalization.
- Configuration Options: Allow users to control the RRF rank constant parameter (the rank constant must not be less than 1). Prevent invalid combinations of configuration values (e.g., mixing current normalization options with RRF options)
Non Functional Requirements
- Smooth integration into the existing neural search interface, with clear instructions for customers on how to use it
- Keep latency the same, within the margin of error
Document Scope
In this document we propose a solution to the questions below:
- Incorporating RRF into hybrid query under a new processor
- Using 1 parameter, rank constant, with a default value of 60
- Only change to API is for search pipeline configuration
Out of Document Scope
- Providing the ability to chain more than one processor together
- Adding weights as a parameter for combining document results from different subqueries
Solution Overview
Implement RRF within the neural search plugin by adding two classes, RRFProcessor and RRFProcessorFactory, and calling them from the NeuralSearch class. The rank constant can be set by the user at query time but has a default value. A value of 60 for the rank constant is common and was used in the original paper describing the technique. It must be greater than or equal to 1. We can include guidance for customers showing how the rank scores shrink as the rank constant grows, and vice versa, to help inform their choice.
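For example, with the default rank constant of 60, a document ranked 1 in a subquery contributes 1/(60 + 1) ≈ 0.0164 and a document ranked 10 contributes 1/(60 + 10) ≈ 0.0143, so top and lower ranks contribute fairly similar amounts. With a rank constant of 1, the same documents would contribute 1/2 = 0.5 and 1/11 ≈ 0.0909, heavily favoring top-ranked documents. In other words, a larger rank constant flattens the effect of rank differences, while a smaller one emphasizes the top of each subquery’s result list.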
Solution HLD: Architectural and Component Design
Proposed Solution
- Modify the hybrid query execution flow to capture the individual result sets from each subquery
- Implement the RRF algorithm to calculate a combined score for each document based on its ranks across the different result sets.
- Merge and sort the final result set based on the calculated RRF Rank Scores.
- Return the blended RRF ranks in the form of document scores.
Pros:
- Two-way door: if the implementation has issues, it can easily be rolled back
- Can customize to fit RRF requirements if existing architecture isn’t ideal
Cons:
- Some duplication of workflows to set up as a new processor
- Might cause confusion, since RRF can be conceptualized as a normalization/combination technique but is set up outside of NormalizationProcessor
The following diagram shows the high-level approach for implementing RRF in a hybrid query.
Solution LLD
For more information about hybrid search, see https://opensearch.org/blog/hybrid-search/. The following example uses the data from this tutorial: https://opensearch.org/docs/latest/search-plugins/neural-search-tutorial/. I’ve copied the scores from the match query and neural query into the table below and used the resulting numbers to calculate the ranks and rank scores.
Using RRF in a hybrid query, setting up the search pipeline:
PUT /_search/pipeline/nlp-search-pipeline
{
  "description": "Post processor for hybrid search",
  "phase_results_processors": [
    {
      "score-ranker-processor": {
        "combination": {
          "technique": "rrf",
          "parameters": {
            "rank_constant": 60
          }
        }
      }
    }
  ]
}
Running the hybrid query with the search pipeline (no changes from the current setup):
GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline
{
  "_source": {
    "exclude": [
      "passage_embedding"
    ]
  },
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "text": {
              "query": "cowboy rodeo bronco"
            }
          }
        },
        {
          "neural": {
            "passage_embedding": {
              "query_text": "wild west",
              "model_id": "aVeif4oB5Vm0Tdw8zYO2",
              "k": 5
            }
          }
        }
      ]
    }
  }
}
The scores and ranks are based on the example data:
Implementation Details
- Query portion of the hybrid query can be reused without any changes
- RRF algorithm parameters can be set in the pipeline processor configuration
  - Input: rank constant. Optional; we can use a default of 60 for the rank constant (cited in the original paper and used successfully in existing implementations). The rank constant must be >= 1 to avoid a divide-by-zero error
- The first part of the RRF algorithm is implemented by scoreRanker (see the sketch after this list)
  - Input: list of query result sets from each shard; each shard result contains the results of each individual subquery
  - Performs initial processing of documents to calculate a rank score for each document in each subquery
  - Output: sorted list of query result sets with calculated RRF rank scores for each subquery
- The summation part of the RRF algorithm is implemented in the same class by rankResults:
  - Input: sorted list of query result sets with calculated RRF rank scores for each subquery
  - Sums rank scores for each document across subqueries, combines the lists of document rank scores from all subqueries, and sorts them to produce the final sorted list of documents sent to the fetch phase
  - Output: sorted list of combined query result sets (sorted by combined rank score)
Benefits
- Possible improved search relevance by combining results from multiple queries effectively.
- Increased flexibility in hybrid query construction and result fusion.
- Seamless integration with the existing neural search plugin and hybrid query infrastructure.
- An additional “normalization and combination” method that customers may find helpful
Alternatives Considered
Alternative 1:
Implement RRF as a NormalizationTechnique and CombinationTechnique (RRFNormalizationTechnique and RRFCombinationTechnique classes) called by NormalizationProcessor, the same way it currently works when configured with, for example, L2NormalizationTechnique and ArithmeticMeanScoreCombinationTechnique. The RRFNormalizationTechnique class would perform the subquery-level reciprocal rank score calculation part of the algorithm and pass the rank scores of all documents in each subquery to the RRFCombinationTechnique class, which would combine each document’s rank scores, then merge and sort the results, returning a list of documents sorted by combined RRF rank score.
Pros:
- Easy to copy structure of existing classes and add new classes into workflow
- Easier adoption by users, setup similar to current setup
Cons:
- Potential for creating problems that are hard to contain within just the RRF implementation
- Would require intervention to prevent users from mixing and matching normalization and combination techniques
- More complicated logic for parameter parsing/validation in the factory: RRF and the existing processors don’t share the same parameters, so branching logic would be required
Example pipeline configuration call
PUT /_search/pipeline/nlp-search-pipeline
{
  "description": "Post processor for hybrid search",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "rrf",
          "parameters": {
            "rank_constant": 60
          }
        },
        "combination": {
          "technique": "rrf"
        }
      }
    }
  ]
}
Alternative 2:
Implement RRF as a processor, RRFProcessor, at the same level as NormalizationProcessor, registered by an RRFProcessorFactory, with the RRFProcessor calling an RRFNormalizationTechnique and RRFCombinationTechnique as described above.
Pros:
- Similar to Alternative 1, easy to copy structure of existing classes and add new classes into workflow
- Adding the processor at the NormalizationProcessor level would avoid the need to intervene to prevent users from mixing and matching normalization and combination techniques
Cons:
- Potential for creating problems that are hard to contain within just the RRF implementation
Example search pipeline configuration call
PUT /_search/pipeline/nlp-search-pipeline
{
  "description": "Post processor for hybrid search",
  "phase_results_processors": [
    {
      "score-ranker-processor": {
        "normalization": {
          "technique": "rrf",
          "parameters": {
            "rank_constant": 60
          }
        },
        "combination": {
          "technique": "rrf"
        }
      }
    }
  ]
}
Potential Issues
Risks / Known limitations and Future extensions
- Performance impact due to additional computations for RRF scoring and result merging.
- Compatibility with existing processors: we need to identify potential areas of interference between processors. For the normalization processor the risk is low; the processor configured first will publish results, and the next one should be ignored and have no effect.
- Potential edge cases or corner cases that need to be handled gracefully.
Backward Compatibility
The change will be backward compatible; we will ensure that omitting RRF-specific parameters in request configurations does not cause problems.
Testability
I will be writing unit tests and integration tests for this implementation
Benchmarking
Benchmarking will involve testing across several datasets used in the initial normalization implementation testing (NFCorpus, Trec-Covid, ArguAna, FiQA, Scifact, DBPedia, Quora, Scidocs, CQADupStack, Amazon ESCI) and comparing the average nDCG@10 score against that of the L2 and Min-Max normalization techniques combined with the Arithmetic, Geometric, and Harmonic combination techniques, using BM25 nDCG@10 scores on these datasets as the baseline. We will also update the nightly benchmark runs to capture performance.
Feedback Required
Should we consider adding weights for combining rank scores from different subqueries? If so, how would weights be determined? Are there concerns about or objections to incorporating Reciprocal Rank Fusion (RRF) as a new processor, instead of integrating it into the existing NormalizationProcessor? If there are foreseeable problems with this approach, what would a better alternative look like?