
Splade Encoding and Evaluation not working

Open srikanthmalla opened this issue 1 year ago • 2 comments

Hi @MXueguang, following your suggestion in https://github.com/texttron/tevatron/issues/149, I tried the SPLADE example.

It has a few issues:

  • [x] 1. encode_splade.py is outdated with respect to the argument names and module paths in the latest library updates.
  • [x] 2. The README instructions for encoding are outdated and do not pass the correct argument names.
  • [ ] 3. The evaluation scripts need to be verified so we can replicate the reported results on at least one SPLADE variant.

I have already fixed the first two in the current pull request. For the last item, I want to check whether we should add a function to the searcher Python file for sparse retriever output (it might be more consistent with the repo), or keep the original index -> retrieve -> evaluate flow using pyserini from the README instructions. Please share your thoughts.
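For context, pyserini's impact indexing consumes a JsonVectorCollection: one JSON document per line with an `id` and a `vector` mapping tokens to integer weights. A minimal sketch of converting SPLADE float weights into that format (the quantization factor of 100, the pruning of non-positive weights, and the field names are assumptions based on common pyserini usage, not taken from this repo):

```python
import json

def splade_to_jsonl(docs, out_path, quant=100):
    """Write SPLADE sparse vectors as pyserini JsonVectorCollection JSONL.

    docs:  iterable of (doc_id, {token: float_weight}) pairs.
    quant: scale factor applied before rounding to int (assumed;
           pyserini impact indexes expect integer term weights).
    """
    with open(out_path, "w") as f:
        for doc_id, vector in docs:
            # Drop non-positive weights and quantize the rest to ints.
            quantized = {t: int(round(w * quant))
                         for t, w in vector.items() if w > 0}
            f.write(json.dumps({"id": doc_id, "vector": quantized}) + "\n")
```

The resulting directory of JSONL files would then be handed to pyserini's indexer with the impact/vector collection options.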

Thanks, Srikanth

srikanthmalla avatar Sep 04 '24 17:09 srikanthmalla

Hi @srikanthmalla, thank you again for helping us improve the codebase. I think the hard part is the indexing step for the sparse representations: we need pyserini to index them properly. For search, we can create a more consistent Python script that uses pyserini internally. I'm not entirely sure here; do you think that would make usage easier?

MXueguang avatar Sep 06 '24 19:09 MXueguang

Hi @MXueguang, using the evaluation script from the beir repo on the arguana dataset, I get ndcg@10: 0.525 (close to the number reported in the paper), plus metrics the paper does not report (map@10: 0.435 and recall@10: 0.813).
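As a sanity check on numbers like these, nDCG@k is simple enough to compute directly. A minimal pure-Python sketch (standard DCG with linear gain; this is not the beir implementation, just an independent cross-check):

```python
import math

def ndcg_at_k(qrels, run, k=10):
    """Mean nDCG@k over queries, with binary or graded relevance.

    qrels: {qid: {docid: relevance_grade}}
    run:   {qid: [docid, ...]} ranked lists, best first.
    """
    scores = []
    for qid, ranking in run.items():
        rels = qrels.get(qid, {})
        # DCG over the top-k retrieved documents.
        dcg = sum(rels.get(d, 0) / math.log2(i + 2)
                  for i, d in enumerate(ranking[:k]))
        # Ideal DCG: grades sorted descending, truncated at k.
        ideal = sorted(rels.values(), reverse=True)[:k]
        idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
        scores.append(dcg / idcg if idcg > 0 else 0.0)
    return sum(scores) / len(scores) if scores else 0.0
```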

For now, it might be fine to use pyserini, or even the evaluation scripts from beir. But the current README instructions using pyserini fail at the last evaluation step, both for `python -m pyserini.eval.msmarco_passage_eval msmarco-passage-dev-subset splade_results.tsv` and for `python -m pyserini.eval.msmarco_passage_eval beir-v1.0.0-arguana-test splade_results.tsv` (the script assumes integer MS MARCO query ids, while arguana uses string ids):

Running command: ['python', '/home/user/.cache/pyserini/eval/msmarco_passage_eval.py', '/home/user/.cache/anserini/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt', 'splade_results.tsv']
Traceback (most recent call last):
  File "/home/user/.cache/pyserini/eval/msmarco_passage_eval.py", line 27, in load_reference_from_stream
    qid = int(l[0])
ValueError: invalid literal for int() with base 10: 'test-environment-aeghhgwpe-pro02a'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.cache/pyserini/eval/msmarco_passage_eval.py", line 184, in <module>
    main()
  File "/home/user/.cache/pyserini/eval/msmarco_passage_eval.py", line 173, in main
    metrics = compute_metrics_from_files(path_to_reference, path_to_candidate)
  File "/home/user/.cache/pyserini/eval/msmarco_passage_eval.py", line 157, in compute_metrics_from_files
    qids_to_relevant_passageids = load_reference(path_to_reference)
  File "/home/user/.cache/pyserini/eval/msmarco_passage_eval.py", line 43, in load_reference
    qids_to_relevant_passageids = load_reference_from_stream(f)
  File "/home/user/.cache/pyserini/eval/msmarco_passage_eval.py", line 34, in load_reference_from_stream
    raise IOError('\"%s\" is not valid format' % l)
OSError: "['test-environment-aeghhgwpe-pro02a', '0', 'test-environment-aeghhgwpe-pro02b', '1']" is not valid format

I also tried converting the tsv run to TREC format using this command, and evaluating with pyserini's trec_eval command. This gives ndcg close to 0 at almost all cutoffs.
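For reference, the tsv-to-TREC conversion can be sketched in a few lines (a hypothetical converter, assuming the MS MARCO run format `qid\tdocid\trank` per line; the tsv carries no score column, so `1/rank` is synthesized as a descending stand-in score, which is an assumption):

```python
def msmarco_tsv_to_trec(in_path, out_path, tag="splade"):
    """Convert an MS MARCO-style run (qid\\tdocid\\trank) to TREC format.

    TREC run lines are: qid Q0 docid rank score tag.
    String query ids (as in BEIR arguana) pass through unchanged.
    """
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            qid, docid, rank = line.rstrip("\n").split("\t")
            score = 1.0 / int(rank)  # synthesized: tsv has no score column
            fout.write(f"{qid} Q0 {docid} {rank} {score:.6f} {tag}\n")
```

A near-zero ndcg after conversion often points at an id mismatch between the run and the qrels rather than at the retrieval itself, so comparing a few qids from both files by hand is a quick check.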

It would probably be a good idea to fix the README instructions; whether we use beir or pyserini for evaluation matters less than getting replicable results.

Finally, having a self-contained repo would be amazing! For example, if we are adapting functionality from beir or pyserini for evaluation, we could either vendor a specific version of it in a folder or use a git submodule. The only problem with a git submodule (or a plain pip dependency) is that the dependency repo could be removed in the future. Either approach keeps the current repo from breaking when dependencies change, in both the instructions and the code. Please let me know your thoughts: should I add a SPLADE evaluation script using beir in the examples/splade folder, or will you look into the pyserini instructions in the README of the examples/splade subfolder?

Thank you, Srikanth

srikanthmalla avatar Sep 09 '24 07:09 srikanthmalla

Thank you for the contribution. I can take over from here.

ArvinZhuang avatar Apr 01 '25 05:04 ArvinZhuang