
Setting fields in dataframe indexing

Open iknoorjobs opened this issue 2 years ago • 0 comments

Hi,

I am struggling to set fields (which most of the neural approaches in PyTerrier, such as monoT5, require) when indexing a Pandas DataFrame built from my own dataset. I have tried all the methods for setting fields shown at https://github.com/terrier-org/pyterrier/blob/master/examples/notebooks/indexing.ipynb. These include:

1. pd_indexer.index(df["text"], df["docno"])
2. pd_indexer.index(df["text"], df["docno"], df["additional_field"])
3. pd_indexer.index(df["text"], docno= df["docno"], additional_field=df["additional_field"])
4. pd_indexer.index(df["text"], df)

Only (1) works, but I guess it does not have "additional_field". I have also noticed that when I index with methods (2), (3) or (4), retrieval with simple TF-IDF or BM25 suddenly takes much longer (around 5 minutes), whereas when I index with (1) the results are instant.

Mainly, my aim is to pass the "text" column as a field while indexing so that I can use the "text" field further for neural re-ranking.
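
For reference, here is a minimal sketch of one approach that may meet this goal. It swaps in `pt.IterDictIndexer` for the DataFrame indexer and stores "text" as index metadata rather than as a field. The toy records, the helper names `max_text_length` and `build_index`, and the metadata sizes are assumptions for illustration, not something taken from the notebook above.

```python
# Toy corpus standing in for the real dataset (hypothetical values).
# With a DataFrame, df.to_dict(orient="records") yields the same shape.
records = [
    {"docno": "d1", "text": "terrier is a search engine", "additional_field": "a"},
    {"docno": "d2", "text": "neural re-rankers need the raw text", "additional_field": "b"},
]


def max_text_length(docs, key="text"):
    """Return the longest value of `key`, used to size the metadata slot."""
    return max(len(d[key]) for d in docs)


def build_index(docs, index_dir):
    """Index `docs`, keeping docno and text as metadata.

    Requires PyTerrier and Java; `index_dir` is assumed to be a writable,
    absolute path. This is a sketch, not a confirmed fix for this issue.
    """
    import pyterrier as pt  # lazy import so the helpers above run without it

    if not pt.started():
        pt.init()  # needed on older PyTerrier releases
    indexer = pt.IterDictIndexer(
        index_dir,
        # Maximum stored length per metadata key (assumed sizes).
        meta={"docno": 20, "text": max_text_length(docs)},
    )
    # By default the "text" key of each dict is what gets indexed.
    return indexer.index(docs)
```

If this works as assumed, the stored text could then be attached to the results, for example with `pt.BatchRetrieve(index_ref, wmodel="BM25", metadata=["docno", "text"])` or `pt.text.get_text(index_ref, "text")`, before handing them to a neural re-ranker.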

Please let me know if I'm missing something.

Thanks

iknoorjobs, Jul 07 '22 23:07