
Request for Retrieval Code

Open hzby opened this issue 1 year ago • 1 comments

Thank you for your excellent work. It seems that the current repository does not contain the code to retrieve relevant documents using a query. Could you please provide this portion of the code to complete it?

hzby avatar Sep 10 '24 14:09 hzby

Thank you. I would also like to ask about this.

wangpuzhou123 avatar Oct 09 '24 03:10 wangpuzhou123

Thank you for your interest in our work. We have provided retrieved documents along with the queries for all datasets used in this work to facilitate easier reproduction. You can find them in our dataset folder.

To perform retrieval on your own corpus, the easiest way is to use Pyserini with prebuilt indexes. Below are some code snippets for sparse retrieval (e.g., BM25) and dense retrieval (e.g., DPR) for your reference.

  • Sparse Retrieval
# Sparse Retriever (BM25)
from pyserini.search.lucene import LuceneSearcher

# Use Wikipedia dump as the retrieval source
searcher = LuceneSearcher.from_prebuilt_index('wikipedia-dpr') 
# Retrieve documents relevant to the given query
hits = searcher.search('who got the first nobel prize in physics')
# Present retrieved document and relevance score
print(f'doc: {searcher.doc(hits[0].docid).raw()}\nscore: {hits[0].score}')
  • Dense Retrieval
# Dense Retriever (DPR)
from pyserini.search.faiss import FaissSearcher, DprQueryEncoder

# Load query encoder
encoder = DprQueryEncoder("facebook/dpr-question_encoder-single-nq-base")
# Use Wikipedia dump as the retrieval source
searcher = FaissSearcher.from_prebuilt_index('wikipedia-dpr-100w.dpr-single-nq', encoder)
# Retrieve documents relevant to the given query
hits = searcher.search('who got the first nobel prize in physics')
# Present retrieved document and relevance score
print(f'doc: {searcher.doc(hits[0].docid).raw()}\nscore: {hits[0].score}')
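
For readers who want to see what the sparse retriever's scoring does without downloading a prebuilt index, here is a minimal, self-contained sketch of textbook BM25 ranking over a toy in-memory corpus. It is an illustration only, not Pyserini's or Lucene's implementation: the `tokenize` and `bm25_rank` helpers, the whitespace tokenizer, the `k1`/`b` values, and the three example documents are all assumptions made for this sketch.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption; Lucene uses real analyzers).
    return text.lower().split()


def bm25_rank(query, docs, k1=0.9, b=0.4, top_k=3):
    """Return up to top_k (doc_index, score) pairs, highest BM25 score first."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))

    scored = []
    for idx, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed inverse document frequency (always positive).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((idx, score))

    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy corpus (hypothetical documents, for illustration only).
docs = [
    "Wilhelm Roentgen received the first Nobel Prize in Physics in 1901",
    "The Nobel Prize in Literature is awarded annually",
    "Paris is the capital of France",
]
ranked = bm25_rank("who got the first nobel prize in physics", docs)
for idx, score in ranked:
    print(f"{score:.3f}  {docs[idx]}")
```

On a real corpus, the Pyserini searchers above are the practical choice, since they handle indexing, text analysis, and efficient top-k search; the sketch only shows how term overlap, term rarity, and document length combine into a relevance score.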

Please let me know if you have further questions!

weizhepei avatar Oct 18 '24 06:10 weizhepei