matchmaker
How to integrate pre-trained ColBERT as retriever into retriever-reranker IR system
Hi!
tl;dr: If it's a good idea at all, which matchmaker modules should I use to index and rank documents with a ColBERT-like retriever that uses a token-level vector index?
We are trying to evaluate the benefits of different retrievers in the retriever-reranker framework.
We found that neural dense retrievers bring big qualitative benefits on our data set (100k questions + approx. 1 million answers from MathStackExchange). I'd like to evaluate the potential benefits of token-level neural retrieval, but I am struggling to fit the contextual token-level index into memory. It seems, though, that this is something matchmaker can deal with.
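To make the memory concern concrete, here is a rough back-of-the-envelope estimate of the raw (uncompressed) index size. Only the collection size comes from our data; the vector dimensions (768 for a single-vector encoder, 128 per token for a ColBERT-like model), the average of 150 tokens per answer, and fp16 storage are assumptions for illustration, not measurements.

```python
def index_size_gib(num_docs, vectors_per_doc, dim, bytes_per_value):
    """Raw size of an uncompressed vector index, in GiB."""
    return num_docs * vectors_per_doc * dim * bytes_per_value / 2**30


NUM_DOCS = 1_000_000     # approx. number of answers in our collection
FP16_BYTES = 2           # assumption: vectors stored as 16-bit floats

SINGLE_VECTOR_DIM = 768  # assumption: one BERT-base-sized vector per document
TOKEN_DIM = 128          # assumption: reduced per-token dimension
AVG_TOKENS = 150         # assumption: average tokens per answer (not measured)

single_vector = index_size_gib(NUM_DOCS, 1, SINGLE_VECTOR_DIM, FP16_BYTES)
token_level = index_size_gib(NUM_DOCS, AVG_TOKENS, TOKEN_DIM, FP16_BYTES)

print(f"single-vector index: {single_vector:.1f} GiB")
print(f"token-level index:   {token_level:.1f} GiB")
print(f"ratio:               {token_level / single_vector:.0f}x")
```

Under these assumptions the token-level index is larger by a factor of AVG_TOKENS * TOKEN_DIM / SINGLE_VECTOR_DIM, which is why it no longer fits next to everything else in memory.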
I've looked into the dense_retrieval_evaluate README, but I assume the CLI alone will not suffice if I also want to pass the retrieved results on to a reranker.
That led me to try reusing dense_retrieval.py, but, with my limited understanding of the code, I believe this script still expects the model to produce a single dense representation (vector) per document.
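For clarity, this is the difference I mean, as a minimal pure-Python sketch. It is not matchmaker code: the function names and toy vectors are made up for illustration. A single-vector retriever scores a document with one dot product, whereas a ColBERT-like retriever keeps one vector per token and sums, over the query tokens, the maximum similarity to any document token (late interaction, often called MaxSim).

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def single_vector_score(query_vec, doc_vec):
    """Single-vector retrieval: one vector per query and per document."""
    return dot(query_vec, doc_vec)


def maxsim_score(query_tokens, doc_tokens):
    """Token-level late interaction: for every query token, take the best
    matching document token, then sum these maxima over the query."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy 2-dimensional token embeddings (hypothetical values).
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # contains both query "tokens"
doc_b = [[0.5, 0.5]]                           # only a blended vector

print(maxsim_score(query_tokens, doc_a))
print(maxsim_score(query_tokens, doc_b))
```

The index for the second scoring function has to store every row of doc_a and doc_b rather than one vector per document, which is the part I do not see supported in dense_retrieval.py.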
I am wondering whether dense_retrieval.py is a good place to start, or whether there are easier ways to integrate (and possibly swap between) different neural retrievers in a retriever-reranker IR system.
If it helps anyone, our evaluation framework can be found in this notebook.