
getting error with vbert

somnath-banerjee opened this issue 4 years ago · 4 comments

While using vbert, I am getting the error below. Please help.

```python
vbert = onir_pt.reranker('vanilla_transformer', 'bert', text_field='abstract', vocab_config={'train': True})
vbert_pipeline = (
    pt.BatchRetrieve(index, wmodel='BM25', metadata=["docno", "text"]) % 1000
    >> pt.text.get_text(index, "text")
    >> vbert
)
df_res = vbert_pipeline.search("can vitamin d cure covid 19")
```

```
[2021-09-02 01:10:08,346][onir_pt][DEBUG] using GPU (deterministic)
[2021-09-02 01:10:11,481][onir_pt][DEBUG] [starting] batches
[2021-09-02 01:10:11,485][onir][CRITICAL] Uncaught exception
Traceback (most recent call last):
  File "vbert_baseline.py", line 123, in
    df_res = vbert_pipeline.search("can vitamin d cure covid 19")
  File "/home/sbanerjee/miniconda3/envs/mytorch/lib/python3.8/site-packages/pyterrier/transformer.py", line 177, in search
    rtr = self.transform(queryDf)
  File "/home/sbanerjee/miniconda3/envs/mytorch/lib/python3.8/site-packages/pyterrier/transformer.py", line 807, in transform
    topics = m.transform(topics)
  File "/home/sbanerjee/miniconda3/envs/mytorch/lib/python3.8/site-packages/onir_pt/__init__.py", line 277, in transform
    for count, batch in _logger.pbar(batches, desc='batches', tqdm=pyterrier.tqdm, total=math.ceil(len(dataframe) / self.config['batch_size'])):
  File "/home/sbanerjee/miniconda3/envs/mytorch/lib/python3.8/site-packages/onir/log.py", line 110, in pbar
    yield from pbar
  File "/home/sbanerjee/miniconda3/envs/mytorch/lib/python3.8/site-packages/tqdm/std.py", line 1185, in __iter__
    for obj in iterable:
  File "/home/sbanerjee/miniconda3/envs/mytorch/lib/python3.8/site-packages/onir_pt/__init__.py", line 417, in _iter_batches
    batch[f].append(len(doc_tok))
TypeError: object of type 'NoneType' has no len()
```

somnath-banerjee avatar Sep 01 '21 23:09 somnath-banerjee

Hi @somnath-banerjee,

Sorry for the delay. It looks like the vbert model is trying to re-rank based on the "abstract" field (text_field='abstract'), whereas only a "text" field is available (metadata=["docno", "text"]). I think switching to text_field='text' should resolve your problem!
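For reference, a sketch of the corrected construction, based on your snippet (this assumes `index`, `pt`, and `onir_pt` are already set up as in your script, so it is environment-dependent rather than runnable on its own):

```python
# Use the "text" field, which the BM25 stage actually exposes via
# metadata=["docno", "text"]; text_field was previously 'abstract'.
vbert = onir_pt.reranker('vanilla_transformer', 'bert',
                         text_field='text',
                         vocab_config={'train': True})
vbert_pipeline = (
    pt.BatchRetrieve(index, wmodel='BM25', metadata=["docno", "text"]) % 1000
    >> pt.text.get_text(index, "text")
    >> vbert
)
```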

seanmacavaney avatar Sep 09 '21 14:09 seanmacavaney

Hi @seanmacavaney, Thanks. It worked after changing to text_field='text'. Some of the scores I am getting are negative. I am new to IR; could you kindly let me know how to interpret this from a theoretical point of view?
Thanks in advance.

somnath-banerjee avatar Sep 09 '21 20:09 somnath-banerjee

Yes, so the query-document relevance scores produced by the model are only valuable with respect to other query-document relevance scores. In other words, the only thing that matters is that document A's score is greater or less than document B's -- this determines the order of the two documents in the rankings.

Some other models make stronger claims about the meaning of the scores produced. For instance, probabilistic models frame the scores as a probability.
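To make the point concrete, here is a minimal illustration (the document names and score values are invented, not output from vbert): only the relative order of the scores matters, so any monotonic transform, such as adding a constant or applying a sigmoid to get values in (0, 1), leaves the ranking unchanged.

```python
import math

# Hypothetical query-document relevance scores; negative values are fine.
scores = {"doc_a": -0.7, "doc_b": -2.3, "doc_c": 1.1}

# Ranking is determined purely by relative order:
# doc_a still outranks doc_b even though both scores are negative.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['doc_c', 'doc_a', 'doc_b']

# Monotonic transforms do not change the ranking.
shifted = {d: s + 10 for d, s in scores.items()}          # shift into positives
squashed = {d: 1 / (1 + math.exp(-s)) for d, s in scores.items()}  # sigmoid to (0, 1)
print(sorted(shifted, key=shifted.get, reverse=True) == ranking)   # True
print(sorted(squashed, key=squashed.get, reverse=True) == ranking) # True
```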

seanmacavaney avatar Sep 09 '21 20:09 seanmacavaney

Thanks a lot for your answer. But if the vbert model produces a negative score for a query-document pair, what does that mean? How does it differ from a query-document pair for which it gives a positive score?

somnath-banerjee avatar Sep 09 '21 20:09 somnath-banerjee