Jimmy Lin
Capturing discussion with @AileenLin - For `cw12b13`, for whatever reason, even with `-storeDocvector` during indexing, there is at least one document that doesn't have a doc vector. This means that...
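One defensive way to deal with an index where some documents lack a doc vector is to scan for them up front. The sketch below is illustrative only: `find_docs_without_vectors` and the toy docids are hypothetical, and `get_vector` stands in for whatever lookup is in use. In Pyserini that would presumably be something like `IndexReader.get_document_vector`, assuming it returns `None` for a document with no stored vector.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical type alias: a doc vector maps term -> term frequency.
DocVector = Dict[str, int]


def find_docs_without_vectors(
    docids: Iterable[str],
    get_vector: Callable[[str], Optional[DocVector]],
) -> List[str]:
    """Return the docids for which the lookup yields no doc vector.

    Assumption: `get_vector` returns None when a document has no stored vector.
    """
    return [docid for docid in docids if get_vector(docid) is None]


# Toy stand-in for an index: "doc-b" was indexed without a doc vector.
toy_index: Dict[str, Optional[DocVector]] = {
    "doc-a": {"lucene": 2, "index": 1},
    "doc-b": None,
    "doc-c": {"vector": 3},
}

missing = find_docs_without_vectors(toy_index.keys(), toy_index.get)
print(missing)
```

Code that consumes doc vectors can then skip or log the reported docids instead of failing partway through a run.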
As a result, on the Pyserini end, the `LuceneSearcher` and `IndexReader` are completely disconnected.
@justram I believe this was introduced by #1828

```
TypeError: init_query_encoder() missing 1 required positional argument: 'multimodal'
Traceback (most recent call last):
  File "/home/jimmylin/.conda/envs/pyserini-dev3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code,...
```
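This kind of `TypeError` typically appears when a new required positional parameter is added to a function while some callers still use the old call. The sketch below reproduces the pattern with hypothetical stand-in functions. They are not Pyserini's actual `init_query_encoder`, whose real signature is not shown here. It also shows one common mitigation, giving the new parameter a default.

```python
def init_query_encoder_strict(encoder, multimodal):
    """Hypothetical stand-in: 'multimodal' is a new required positional argument."""
    return {"encoder": encoder, "multimodal": multimodal}


def init_query_encoder_compat(encoder, multimodal=False):
    """Hypothetical stand-in: the new parameter has a default, so old calls still work."""
    return {"encoder": encoder, "multimodal": multimodal}


def call_like_old_code(fn):
    """Call fn the way a caller written before the new parameter existed would."""
    try:
        return fn("some-encoder")
    except TypeError as exc:
        # Return the message so the failure mode is visible without crashing.
        return str(exc)


strict_result = call_like_old_code(init_query_encoder_strict)
compat_result = call_like_old_code(init_query_encoder_compat)

print(strict_result)
print(compat_result)
```

Whether a default is appropriate depends on the intended behavior. The alternative is to update every call site to pass the new argument explicitly.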
Here: https://castorini.github.io/pyserini/2cr/msmarco-v2-doc.html We're missing uniCOIL (noexp) and uniCOIL (w/ doc2query-T5) for TREC 2023. @MXueguang can you please add this?
We want to add the latest SPLADE++ ED BEIR regressions here: https://castorini.github.io/pyserini/2cr/beir.html
From @sahel-sh - we can improve the onboarding docs by more accurately characterizing how long things take, on what hardware, RAM/CPU requirements, etc.
@yilinjz @UShivani3 et al. recently had issues getting Pyserini installed... I think we should refactor the installation instructions? + I had a start here: https://github.com/castorini/pyserini/pull/1609 but I don't think it's...
It'd be great to have a version of this: https://github.com/castorini/pyserini/blob/master/docs/experiments-nfcorpus.md but using OpenAI embeddings.
#1572 needs a test case.
I'd like to have documentation that has a complete end-to-end example worked out from indexing to retrieval:
+ indexing BM25
+ retrieval using BM25
+ indexing using dense vectors...