anserini
Anserini is a Lucene toolkit for reproducible information retrieval research
We should develop a generic mechanism to store and use Waterloo spam scores, PageRank, HITS, and other static priors. @iorixxx Do you have some code to contribute along these lines?
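One way such a mechanism could look, as a minimal sketch only: a store keyed by prior name and docid, plus a linear combination with the retrieval score. The names `StaticPriorStore` and `apply_priors`, the additive weighting, and the default value for missing priors are all assumptions made for illustration; none of them come from Anserini.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (docid, retrieval score)


class StaticPriorStore:
    """Hypothetical in-memory store: prior name -> docid -> value."""

    def __init__(self) -> None:
        self._priors: Dict[str, Dict[str, float]] = {}

    def put(self, name: str, docid: str, value: float) -> None:
        self._priors.setdefault(name, {})[docid] = value

    def get(self, name: str, docid: str, default: float = 0.0) -> float:
        # Documents without a stored prior fall back to `default` (an assumption).
        return self._priors.get(name, {}).get(docid, default)


def apply_priors(hits: List[Hit], store: StaticPriorStore,
                 weights: Dict[str, float]) -> List[Hit]:
    """Add a weighted sum of static priors to each score, then re-sort descending."""
    rescored = [
        (docid, score + sum(w * store.get(name, docid) for name, w in weights.items()))
        for docid, score in hits
    ]
    return sorted(rescored, key=lambda h: h[1], reverse=True)
```

A spam score might instead be applied as a hard filter rather than an additive term; the same store would serve either use.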
@mpetri, @amallia, and I have come across a weird bug where an input JsonVectorCollection will have its weights broken by long terms, possibly impacting downstream ranking. The specific bug is...
Currently, we're extending MultifieldSourceDocument, which probably shouldn't be the case.
Command used:
```
sh target/appassembler/bin/IndexCollection \
  -collection BibtexCollection -generator BibtexGenerator \
  -threads 8 -input {/path/to/bib_files/} \
  -index {/path/to/bibtex_indexes} \
  -storePositions -storeDocvectors -storeContents -storeRaw
```
Full error message
```
2022-03-01 02:28:28,573...
```
Hello, I have some questions regarding Anserini's implementation of BM25 + RM3. Disclaimer: I've never used the package; I'm asking for comparison purposes. I could not track the...
Currently, Anserini is used to generate CIFF files with the [CIFF](https://github.com/osirrc/ciff) repo. A number of other systems like Terrier, PISA, JASSv2, OldDog can read/index CIFF files. However, Anserini doesn't currently...
Does anyone know if there's a batching method for IndexReader.compute_query_document_score in Anserini? The original method seems to run very slowly, roughly 6 hrs for ~500k query-doc pairs on an Intel(R)...
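Absent a built-in batch call, one generic workaround is to fan the pairs out over a thread pool. The sketch below assumes a per-pair scorer with the shape `score_fn(docid, query)`; `score_pairs`, `toy_score`, and `TOY_DOCS` are hypothetical names used only to keep the example self-contained. Whether threads actually help depends on whether the real scorer releases the interpreter lock while it runs, which is not verified here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (docid, query)


def score_pairs(score_fn: Callable[[str, str], float],
                pairs: Iterable[Pair],
                max_workers: int = 8) -> List[float]:
    """Score every (docid, query) pair concurrently, preserving input order."""
    pairs = list(pairs)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the order the inputs were submitted.
        return list(pool.map(lambda p: score_fn(p[0], p[1]), pairs))


# Toy stand-in for a real scorer: counts query terms that occur in the document.
TOY_DOCS = {"d1": "lucene toolkit", "d2": "spam scores"}


def toy_score(docid: str, query: str) -> float:
    doc_terms = set(TOY_DOCS[docid].split())
    return float(sum(1 for term in query.split() if term in doc_terms))


if __name__ == "__main__":
    print(score_pairs(toy_score, [("d1", "lucene toolkit"), ("d2", "lucene")]))
```

In practice `toy_score` would be replaced by a small wrapper around the real per-pair scoring call.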
We don't have SDM regression tested. We should fix this.
Currently, when searching, users can specify a base ranking model such as BM25 or QL and a number of rerankers. Logically, they consist of one ranking model where only the final results...
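The arrangement described here, a base ranker followed by a chain of rerankers, can be pictured with a short sketch. Everything in it (`run_pipeline`, the callable signatures, the hit representation) is a hypothetical illustration and not Anserini's actual interface.

```python
from typing import Callable, List, Sequence, Tuple

Hit = Tuple[str, float]  # (docid, score)
BaseRanker = Callable[[str], List[Hit]]
Reranker = Callable[[str, List[Hit]], List[Hit]]


def run_pipeline(query: str, base_ranker: BaseRanker,
                 rerankers: Sequence[Reranker], k: int = 10) -> List[Hit]:
    """Run the base ranker, pass its hits through each reranker in order, keep the top k."""
    hits = base_ranker(query)
    for rerank in rerankers:
        # Each stage sees only the previous stage's output.
        hits = rerank(query, hits)
    return hits[:k]
```

Intermediate hit lists are discarded in this form; a variant that returned every stage's output would need to collect `hits` inside the loop.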