Reciprocal Rank Fusion (RRF) in TopDocs
Description
Hello community,
Hank and I followed the discussion thread and implemented an RRF function for TopDocs. By the way, we know that RRF is still under debate in Solr (FYR); however, we think this new feature could still be a good one.
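For readers new to the technique: RRF scores each document by summing 1 / (k + rank) over every result list that contains it, then sorts by that sum. Below is a minimal sketch of the idea, assuming plain lists of doc IDs rather than Lucene's TopDocs. The class name RrfSketch, the method fuse, and the choice k = 60 are illustrative assumptions, not the API proposed in this PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RrfSketch {
  /** Fuses ranked lists of doc IDs; returns IDs sorted by descending RRF score. */
  public static List<Integer> fuse(List<List<Integer>> rankings, int k) {
    Map<Integer, Double> scores = new HashMap<>();
    for (List<Integer> ranking : rankings) {
      for (int i = 0; i < ranking.size(); i++) {
        int rank = i + 1; // ranks are 1-based
        scores.merge(ranking.get(i), 1.0 / (k + rank), Double::sum);
      }
    }
    List<Integer> fused = new ArrayList<>(scores.keySet());
    // Descending score; ties broken by ascending doc ID so the order is deterministic.
    fused.sort((a, b) -> {
      int cmp = Double.compare(scores.get(b), scores.get(a));
      return cmp != 0 ? cmp : Integer.compare(a, b);
    });
    return fused;
  }

  public static void main(String[] args) {
    List<Integer> lexical = List.of(1, 2, 3); // e.g. hits from a keyword query
    List<Integer> vector = List.of(3, 1, 4);  // e.g. hits from a vector query
    System.out.println(fuse(List.of(lexical, vector), 60)); // prints [1, 3, 2, 4]
  }
}
```

Doc 1 wins because it sits at ranks 1 and 2 (1/61 + 1/62), which narrowly beats doc 3 at ranks 3 and 1 (1/63 + 1/61). Docs that appear in only one list trail behind.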
I'm not sure 'rrf' should be a direct method in TopDocs: Reciprocal Rank Fusion is just one way of combining result sets, and if we want to add other algorithms in the future, having 'rrf' there may encourage people to keep adding methods to TopDocs. What about having a "combine" method there that takes the combining strategy as input? We could then abstract the combining strategy as an interface or abstract class and implement Reciprocal Rank Fusion as the first available strategy. That should ease the process of adding more strategies and prevent TopDocs from becoming too cluttered in the future.
N.B. I am generally in favour of the "You Aren't Gonna Need It" approach, but Lucene has many contributors, and future contributors may get involved, so if we defer this abstraction work until a second strategy gets implemented, it may never happen.
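To make the suggestion concrete, here is a rough sketch of what a strategy-based "combine" could look like. It assumes plain lists of doc IDs instead of TopDocs, and every name in it (CombineSketch, CombineStrategy, ReciprocalRankFusion, combine) is hypothetical, chosen only for illustration. It is not existing Lucene API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CombineSketch {
  /** Hypothetical strategy: turns several ranked lists of doc IDs into one fused list. */
  public interface CombineStrategy {
    List<Integer> combine(List<List<Integer>> rankings);
  }

  /** First (and so far only) strategy in this sketch: Reciprocal Rank Fusion. */
  public static class ReciprocalRankFusion implements CombineStrategy {
    private final int k;

    public ReciprocalRankFusion(int k) {
      this.k = k;
    }

    @Override
    public List<Integer> combine(List<List<Integer>> rankings) {
      Map<Integer, Double> scores = new HashMap<>();
      for (List<Integer> ranking : rankings) {
        for (int i = 0; i < ranking.size(); i++) {
          // i + 1 is the 1-based rank of the doc in this list
          scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
        }
      }
      List<Integer> fused = new ArrayList<>(scores.keySet());
      // Descending score; ties broken by ascending doc ID.
      fused.sort((a, b) -> {
        int cmp = Double.compare(scores.get(b), scores.get(a));
        return cmp != 0 ? cmp : Integer.compare(a, b);
      });
      return fused;
    }
  }

  /** Single entry point: callers pick the strategy, the container class stays small. */
  public static List<Integer> combine(CombineStrategy strategy, List<List<Integer>> rankings) {
    return strategy.combine(rankings);
  }

  public static void main(String[] args) {
    List<List<Integer>> rankings = List.of(List.of(10, 20), List.of(20, 30));
    System.out.println(combine(new ReciprocalRankFusion(60), rankings)); // prints [20, 10, 30]
  }
}
```

With this shape, a second fusion algorithm would only need a new CombineStrategy implementation, and the entry point would not change.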
I'm not worried about this. If we feel like we should expose it differently in the future, we'll do it, deprecate this function, and remove it in Lucene 11.
@harenlin I took the liberty of applying my feedback and pushing it to your branch. Would you like to take a look and check if it makes sense?
I plan on merging this PR soon if there are no objections.
This looks good to me. Perhaps we could mark the new static method experimental, especially if we think we are going to want to support more ways of combining TopDocs soon enough. I don't have a strong opinion though; it would also be OK to introduce a more flexible way to do RRF while keeping this one around until the next major.
Thanks for taking a look. I have a bias for the latter, as I was planning on improving the docs of the oal.search package as a follow-up to provide guidance wrt how to do hybrid search by linking to this RRF helper.
I opened #14310.