pygaggle
reproduced results and updated pygaggle/docs/experiments-msmarco-passage-subset.md
Successfully reproduced the same numerical results for pygaggle/docs/experiments-msmarco-passage-subset.md
on a Colab env with a T4 GPU.
Encountered a small issue with the Python dependencies needed to evaluate using monoBERT:
python -um pygaggle.run.evaluate_passage_ranker --split dev \
--method seq_class_transformer \
--model castorini/monobert-large-msmarco \
--dataset data/msmarco_ans_small/ \
--index-dir indexes/index-msmarco-passage-20191117-0ed488 \
--task msmarco \
--output-file runs/run.monobert.ans_small.dev.tsv
The error log was:
2022-12-26 02:37:05.453924: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-12-26 02:37:08 [INFO] utils: NumExpr defaulting to 2 threads.
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/content/pygaggle/pygaggle/run/evaluate_passage_ranker.py", line 13, in <module>
from pygaggle.rerank.base import Reranker
File "/content/pygaggle/pygaggle/rerank/base.py", line 5, in <module>
from pyserini.search import JLuceneSearcherResult
File "/usr/local/lib/python3.8/dist-packages/pyserini/search/__init__.py", line 19, in <module>
from .lucene import JLuceneSearcherResult, LuceneSimilarities, LuceneFusionSearcher, LuceneSearcher
File "/usr/local/lib/python3.8/dist-packages/pyserini/search/lucene/__init__.py", line 18, in <module>
from ._impact_searcher import JImpactSearcherResult, LuceneImpactSearcher
File "/usr/local/lib/python3.8/dist-packages/pyserini/search/lucene/_impact_searcher.py", line 28, in <module>
from pyserini.encode import QueryEncoder, TokFreqQueryEncoder, UniCoilQueryEncoder, \
File "/usr/local/lib/python3.8/dist-packages/pyserini/encode/__init__.py", line 17, in <module>
from ._base import DocumentEncoder, QueryEncoder, JsonlCollectionIterator,\
File "/usr/local/lib/python3.8/dist-packages/pyserini/encode/_base.py", line 19, in <module>
import faiss
ModuleNotFoundError: No module named 'faiss'
Fix:
pip install faiss-cpu
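A quick way to confirm the fix before re-running the evaluation is to check that the module is importable. This is only a minimal sketch using the standard library; `missing_modules` is a hypothetical helper name, not part of pygaggle or pyserini.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # 'faiss' is the module named in the traceback above.
    missing = missing_modules(["faiss"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If `faiss` is reported as missing, `pip install faiss-cpu` should resolve it, as described above.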
Added "What's going on?" toggle blocks to illustrate the effect of re-ranking on the top hit's relevancy to a certain qid.
For each "What's going on?" toggle block:
- Show the head of each generated run file
- Choose the first line of the run file
- Grep the qid and docid to show the actual corresponding text of the query and the passage
- Check the factual relevancy by retrieving the qrel files and checking if qid and docid appear as a match.
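The last two steps can also be sketched in Python instead of grep. This is an illustration, not what the toggle blocks themselves run. It assumes the run file uses the MS MARCO layout (`qid`, `docid`, `rank`, whitespace-separated) and that the qrels file uses the TREC-style layout (`qid`, `0`, `docid`, `relevance`). The function names are hypothetical.

```python
def first_hit(run_lines):
    """Return (qid, docid) from the first non-empty line of a run file.

    Assumes each line starts with: qid <whitespace> docid.
    """
    for line in run_lines:
        fields = line.split()
        if len(fields) >= 2:
            return fields[0], fields[1]
    raise ValueError("run file contains no usable lines")


def is_relevant(qrels_lines, qid, docid):
    """Return True if (qid, docid) is judged relevant in the qrels.

    Assumes each line is: qid <unused> docid relevance.
    """
    for line in qrels_lines:
        fields = line.split()
        if len(fields) >= 4 and fields[0] == qid and fields[2] == docid:
            return int(fields[3]) > 0
    return False
```

Both functions accept any iterable of lines, so an open file handle works directly. For example, pass `open("runs/run.monobert.ans_small.dev.tsv")` to `first_hit`, then pass the open qrels file for the same split to `is_relevant`.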
Thanks for doing this! Could you please also add pip install faiss-cpu in the instructions?
Added faiss-cpu installation!