reproduced results and updated pygaggle/docs/experiments-msmarco-passage-subset.md

Open farazkh80 opened this issue 2 years ago • 3 comments

Successfully reproduced the numerical results in pygaggle/docs/experiments-msmarco-passage-subset.md in a Colab environment with a T4 GPU.

Encountered a small issue with the Python dependencies needed to evaluate with monoBERT. The command was:

python -um pygaggle.run.evaluate_passage_ranker --split dev \
                                                --method seq_class_transformer \
                                                --model castorini/monobert-large-msmarco \
                                                --dataset data/msmarco_ans_small/ \
                                                --index-dir indexes/index-msmarco-passage-20191117-0ed488 \
                                                --task msmarco \
                                                --output-file runs/run.monobert.ans_small.dev.tsv

The error log was:

2022-12-26 02:37:05.453924: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-12-26 02:37:08 [INFO] utils: NumExpr defaulting to 2 threads.
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/content/pygaggle/pygaggle/run/evaluate_passage_ranker.py", line 13, in <module>
    from pygaggle.rerank.base import Reranker
  File "/content/pygaggle/pygaggle/rerank/base.py", line 5, in <module>
    from pyserini.search import JLuceneSearcherResult
  File "/usr/local/lib/python3.8/dist-packages/pyserini/search/__init__.py", line 19, in <module>
    from .lucene import JLuceneSearcherResult, LuceneSimilarities, LuceneFusionSearcher, LuceneSearcher
  File "/usr/local/lib/python3.8/dist-packages/pyserini/search/lucene/__init__.py", line 18, in <module>
    from ._impact_searcher import JImpactSearcherResult, LuceneImpactSearcher
  File "/usr/local/lib/python3.8/dist-packages/pyserini/search/lucene/_impact_searcher.py", line 28, in <module>
    from pyserini.encode import QueryEncoder, TokFreqQueryEncoder, UniCoilQueryEncoder, \
  File "/usr/local/lib/python3.8/dist-packages/pyserini/encode/__init__.py", line 17, in <module>
    from ._base import DocumentEncoder, QueryEncoder, JsonlCollectionIterator,\
  File "/usr/local/lib/python3.8/dist-packages/pyserini/encode/_base.py", line 19, in <module>
    import faiss
ModuleNotFoundError: No module named 'faiss'

Fix:

pip install faiss-cpu
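
To confirm the install took effect before rerunning the evaluation, a quick import check along these lines may help. This is only a sketch: the helper name `missing_modules` is made up here and is not part of pygaggle or pyserini. It relies on the `faiss-cpu` package exposing the `faiss` module, which is the name the traceback above failed on.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means `import faiss` should now succeed.
print(missing_modules(["faiss"]))
```

The check only looks the module up without importing it, so it is cheap to run in a Colab cell right after the `pip install`.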

farazkh80 avatar Dec 26 '22 02:12 farazkh80

Added "What's going on?" toggle blocks to illustrate how re-ranking affects the relevance of the top hit for a given qid.

For each "What's going on?" toggle block:

  1. Show the head of each generated run file.
  2. Take the first line of the run file.
  3. Grep the qid and docid to show the corresponding text of the query and the passage.
  4. Check relevance by looking up the qrels file and checking whether the qid and docid appear as a match.
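
The steps above can be sketched in Python roughly as follows. This is an illustration, not the code in the docs. It assumes the run file has whitespace-separated `qid docid rank` lines, the query and passage files are `id<TAB>text`, and the qrels lines are `qid 0 docid label`. The helper names and the toy data are made up for the example.

```python
def first_run_entry(run_lines):
    """Return (qid, docid) from the first non-empty line of a run file."""
    for line in run_lines:
        fields = line.split()
        if fields:
            return fields[0], fields[1]
    raise ValueError("run file is empty")


def lookup_text(tsv_lines, key):
    """Return the text for `key` in an id<TAB>text file, or None if absent."""
    for line in tsv_lines:
        ident, sep, text = line.rstrip("\n").partition("\t")
        if sep and ident == key:
            return text
    return None


def is_relevant(qrels_lines, qid, docid):
    """True if the qrels list (qid, docid) with a positive label."""
    for line in qrels_lines:
        fields = line.split()
        if len(fields) == 4 and fields[0] == qid and fields[2] == docid:
            return int(fields[3]) > 0
    return False


# Toy, made-up data standing in for the real run, query, passage and qrels files.
run = ["q1\td7\t1\n", "q1\td3\t2\n"]
queries = ["q1\texample query text\n"]
passages = ["d3\tanother passage\n", "d7\texample passage text\n"]
qrels = ["q1\t0\td7\t1\n"]

qid, docid = first_run_entry(run)
print(qid, lookup_text(queries, qid))
print(docid, lookup_text(passages, docid))
print("relevant:", is_relevant(qrels, qid, docid))
```

Each helper takes any iterable of lines, so an open file handle (`with open(path) as f: ...`) can be passed in place of the toy lists.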

farazkh80 avatar Dec 29 '22 06:12 farazkh80

Thanks for doing this! Could you please also add pip install faiss-cpu to the instructions?

rodrigonogueira4 avatar Dec 29 '22 10:12 rodrigonogueira4

Added faiss-cpu installation!

farazkh80 avatar Jan 02 '23 23:01 farazkh80