Jimmy Lin
Test failure on my iMac Pro, macOS Monterey 12.1... any ideas? ``` % python -m unittest integrations.sparse.test_ltr_msmarco_document.TestLtrMsmarcoDocument Attempting to initialize pre-built index msmarco-doc-per-passage-ltr. /Users/jimmylin/.cache/pyserini/indexes/index-msmarco-doc-per-passage-ltr-20211031-33e4151.bd60e89041b4ebbabc4bf0cfac608a87 already exists, skipping download. Initializing msmarco-doc-per-passage-ltr......
@MXueguang @crystina-z I just noticed we have two doc pages that have a lot of overlap... https://github.com/castorini/pyserini/blob/master/docs/usage-encode.md https://github.com/castorini/pyserini/blob/master/docs/usage-dense-indexes.md Why do we have both? Should we collapse? @crystina-z iirc, you were...
The `search` method in `LuceneSearcher` currently returns `List[JLuceneSearcherResult]`, which is a list of Java objects. This was fine when Lucene was the only "backend". But with the addition of PyJASS,...
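One possible direction, sketched here only as an illustration: copy the fields callers need out of the backend objects into a plain Python type, so the return value no longer depends on which backend produced it. `SearchHit` and `to_hits` are hypothetical names, not part of Pyserini, and the sketch assumes each backend result exposes `docid` and `score` attributes.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, List


@dataclass(frozen=True)
class SearchHit:
    """Hypothetical backend-neutral hit; not an existing Pyserini class."""
    docid: str
    score: float


def to_hits(raw_results: List[Any]) -> List[SearchHit]:
    """Copy `docid` and `score` off backend objects into plain Python values.

    Assumption: every backend result exposes `docid` and `score` attributes.
    """
    return [SearchHit(docid=str(r.docid), score=float(r.score)) for r in raw_results]


# Stand-in objects playing the role of backend results (Java or otherwise).
raw = [SimpleNamespace(docid="D1", score=12.5), SimpleNamespace(docid="D2", score=9)]
hits = to_hits(raw)
```

With something like this, callers would only ever see `List[SearchHit]`, whatever backend is behind the searcher.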
+ Add script for munging raw XML corpus into Pyserini JSON format.
+ Add repro instructions
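A minimal sketch of what such a munging step could look like, not the script from this PR. It assumes a hypothetical XML layout (`<doc id="...">` elements with a `<text>` child), since the real corpus schema is not shown here. The output uses the `id` and `contents` fields of Pyserini's JSON collection format.

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict, Iterator


def xml_to_pyserini_docs(xml_text: str) -> Iterator[Dict[str, str]]:
    """Yield one {"id", "contents"} dict per <doc> element.

    Assumption: the element and attribute names (`doc`, `id`, `text`) are
    hypothetical; adjust them to the actual corpus schema.
    """
    root = ET.fromstring(xml_text)
    for doc in root.iter("doc"):
        text = doc.findtext("text", default="")
        # Collapse runs of whitespace so each document fits on one JSONL line.
        yield {"id": doc.get("id", ""), "contents": " ".join(text.split())}


sample = (
    '<corpus>'
    '<doc id="d1"><text>Hello\n   world</text></doc>'
    '<doc id="d2"><text>Second doc</text></doc>'
    '</corpus>'
)
jsonl_lines = [json.dumps(d) for d in xml_to_pyserini_docs(sample)]
```

Each entry of `jsonl_lines` can be written as one line of a `.jsonl` file for indexing.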
Currently, we do something like: ``` SimpleDenseSearcher.from_prebuilt_index(entry, queries) ``` Or alternatively, we have to specify a `query_encoder`. But every prebuilt index already "knows" which query encoder we have to use,...
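A rough sketch of the idea, assuming a lookup table keyed by prebuilt index name. `PREBUILT_INDEX_ENCODERS` and `resolve_query_encoder` are hypothetical names, and the table entry is a placeholder rather than a real index or encoder.

```python
from typing import Dict, Optional

# Hypothetical registry mapping a prebuilt index name to its query encoder.
# The entry below is a placeholder, not a real Pyserini index or model.
PREBUILT_INDEX_ENCODERS: Dict[str, str] = {
    "example-dense-index": "example-org/example-query-encoder",
}


def resolve_query_encoder(index_name: str, query_encoder: Optional[str] = None) -> str:
    """Return the explicit encoder if given, else the one registered for the index."""
    if query_encoder is not None:
        return query_encoder
    try:
        return PREBUILT_INDEX_ENCODERS[index_name]
    except KeyError:
        raise ValueError(
            f"No default query encoder registered for index {index_name!r}"
        ) from None


default_encoder = resolve_query_encoder("example-dense-index")
override_encoder = resolve_query_encoder("example-dense-index", "my/custom-encoder")
```

An explicit `query_encoder` still takes precedence, so existing call sites would keep working.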
When the paper officially comes out, change the reference: https://github.com/castorini/pyserini/blob/master/docs/experiments-dpr-compression.md We should also move the data from Dropbox into S3.
https://github.com/princeton-nlp/DensePhrases Possible URA project?
This seems like a useful feature that we might want to provide bindings for: https://github.com/castorini/anserini/pull/1521
Let's make sure we can replicate all results in the EMNLP paper. I think we're missing some datasets? Assigning to @gauravbaruah
Basically, what the title says...