
How to load a pretrained model?

nirmal2k opened this issue 3 years ago • 8 comments

nirmal2k avatar Jan 11 '22 13:01 nirmal2k

Hi -- can you clarify whether you're interested in using a different model initialisation (e.g., changing bert-base-uncased to something else), or in using a model that's already been fully tuned for ranking?

seanmacavaney avatar Jan 11 '22 15:01 seanmacavaney

I want to load antique-vbert-pair.p, the already fine-tuned one.

nirmal2k avatar Jan 11 '22 15:01 nirmal2k

I want to validate a pretrained model (antique-vbert-pair.p). How do I do that?

nirmal2k avatar Jan 12 '22 05:01 nirmal2k

Hi @nirmal2k -- sorry for the delay.

If you're looking to reproduce the results in Training Curricula for Open Domain Answer Re-Ranking, I recommend you train from scratch. Instructions are here. While it's possible to load the weight files into the CLI-based OpenNIR pipelines, it's a bit hacky and tricky to get working.

If, instead, you're looking to conduct further experiments with the models, inspect outputs, etc., by far the easiest way is to use the OpenNIR-PyTerrier integration. You can load the model like so:

import pyterrier as pt
if not pt.started():
  pt.init()
import onir_pt # OpenNIR-PyTerrier integration -- part of OpenNIR
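# load the fine-tuned weights into a Vanilla BERT re-ranker; the ranker/vocab config needs to match the settings the checkpoint was trained with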
reranker_pair = onir_pt.reranker('vanilla_transformer', 'bert', weights='antique-vbert-pair.p', ranker_config={'outputs': 2}, vocab_config={'train': True})

Then you can use the model in a variety of ways. E.g., if you wanted to conduct a similar experiment on ANTIQUE to the one in the paper, you could do:

import pyterrier as pt
if not pt.started():
  pt.init()
import onir_pt
from pyterrier.measures import *

# Dataset and indexing
dataset = pt.get_dataset('irds:antique/test')
indexer = pt.IterDictIndexer('./antique.terrier')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

# Models
bm25 = pt.BatchRetrieve(index_ref, wmodel='BM25') % 100 # BM25 with cutoff of 100
reranker_pair = onir_pt.reranker('vanilla_transformer', 'bert', weights='antique-vbert-pair.p', ranker_config={'outputs': 2}, vocab_config={'train': True})
reranker_pair_recip = onir_pt.reranker('vanilla_transformer', 'bert', weights='antique-vbert-pair_recip.p', ranker_config={'outputs': 2}, vocab_config={'train': True})

# Experiment
pt.Experiment(
  [
    bm25,
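    # re-ranking pipelines: BM25 candidates >> fetch document text >> neural re-ranker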
    bm25 >> pt.text.get_text(dataset, 'text') >> reranker_pair,
    bm25 >> pt.text.get_text(dataset, 'text') >> reranker_pair_recip,
  ],
  dataset.get_topics(),
  dataset.get_qrels(),
  [MRR(rel=3), P(rel=3)@1]
)

Which gives the following results:

                  name  RR(rel=3)  P(rel=3)@1
0                 bm25   0.506052       0.345
1        reranker_pair   0.733746       0.630
2  reranker_pair_recip   0.761444       0.670

(Curiously, a bit better than what was reported in the paper -- probably due to using a different system for first-stage retrieval.)

Hope this helps!

seanmacavaney avatar Jan 12 '22 11:01 seanmacavaney

Thanks for that @seanmacavaney !! I was able to reproduce those results. Reranking 1000 documents gives a lower MRR. Is there a reason for the drop? Also, I'd appreciate it if you could provide a code snippet showing how to do a forward pass with the loaded pretrained model, given a query and document text.

nirmal2k avatar Jan 12 '22 16:01 nirmal2k

Reranking 1000 documents gives a lower MRR. Is there a reason for the drop?

I don't know definitively, but I suspect:

  • There could be a bias because the training documents were sampled from BM25's top 100, so going out to 1000 takes the model out of the distribution it was trained on.
  • It could be pulling up relevant documents that do not have relevance assessments. It can sometimes be helpful to report judgment rates (e.g., by including a measure like Judged@10) to suss out such cases.
  • There's also some work suggesting that tuning the re-ranking threshold can be super helpful, so it may be that 100 isn't ideal either and that the results could be improved just by finding a better cutoff. (A sketch of how to check the last two points follows this list.)
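
For what it's worth, here's a rough sketch of how one might check those last two points with the setup above -- it assumes dataset, index_ref, and reranker_pair from the earlier snippet are still in scope, and the cutoff values are only illustrative:

for cutoff in [50, 100, 500, 1000]:
  bm25_k = pt.BatchRetrieve(index_ref, wmodel='BM25') % cutoff  # candidate pool of this size
  print(pt.Experiment(
    [bm25_k, bm25_k >> pt.text.get_text(dataset, 'text') >> reranker_pair],
    dataset.get_topics(),
    dataset.get_qrels(),
    # Judged@10 = fraction of each system's top-10 results that have relevance assessments
    [MRR(rel=3), P(rel=3)@1, Judged@10],
    names=[f'bm25_{cutoff}', f'reranker_pair_{cutoff}'],
  ))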

I'd be curious to hear what you find if you get to the bottom of this!

Also, I'd appreciate it if you could provide a code snippet showing how to do a forward pass with the loaded pretrained model, given a query and document text

Here ya go!

import pandas as pd
sample_df = pd.DataFrame([
  {'qid': '0', 'query': 'some query text', 'docno': '0', 'text': 'some document text'},
  {'qid': '1', 'query': 'some other query text', 'docno': '1', 'text': 'some other document text'},
])
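# the loaded reranker acts as a PyTerrier transformer: calling it on a frame with query/text columns adds a score column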
reranker_pair(sample_df)

should give:

  qid                  query docno                      text     score
0   0        some query text     0        some document text  8.423386
1   1  some other query text     1  some other document text  7.000756

seanmacavaney avatar Jan 12 '22 17:01 seanmacavaney

Thanks for the code snippet @seanmacavaney !! And as for the reasons why there is a drop in MRR, the first two you mentioned were the ones I had in mind; the third point seems like a hack for a given dataset. I've worked with MS MARCO, and there isn't a drop in MRR when reranking more documents. The SBERT results here rerank the entire corpus of 8.8M passages to get that MRR. Maybe it's just that some relevant documents don't have assessments, as you mentioned.

Thanks for the insights!!

nirmal2k avatar Jan 13 '22 03:01 nirmal2k

Hi -- can you clarify whether you're interested in using a different model initialisation (e.g., changing bert-base-uncased to something else), or in using a model that's already been fully tuned for ranking?

Yes, I want to change bert-base-uncased to my fine-tuned version of BERT, but I don't know how to achieve that.

clin366 avatar Apr 04 '24 04:04 clin366