pyterrier
Setting fields when indexing a Pandas DataFrame
Hi,
I am struggling to set fields (which are required by most of the neural approaches in PyTerrier, such as monoT5) while indexing my own dataset from a Pandas DataFrame. I have tried all the methods for setting fields shown in https://github.com/terrier-org/pyterrier/blob/master/examples/notebooks/indexing.ipynb. These include:
1. pd_indexer.index(df["text"], df["docno"])
2. pd_indexer.index(df["text"], df["docno"], df["additional_field"])
3. pd_indexer.index(df["text"], docno= df["docno"], additional_field=df["additional_field"])
4. pd_indexer.index(df["text"], df)
Only (1) works, but I assume it does not store "additional_field". I have also noticed that when I use methods (2), (3) or (4), retrieval with plain TF-IDF or BM25 suddenly takes much longer (around 5 minutes), whereas with an index built using (1) the results are instant.
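A minimal sketch of one way to store the text at indexing time, assuming a PyTerrier release that provides `pt.IterDictIndexer` with a `meta` argument (a dict mapping each metadata key to its maximum length) and `pt.started()`/`pt.init()`, as in the 0.x releases. It swaps in `IterDictIndexer` for the `pd_indexer.index(...)` calls above and stores `docno` and `text` as metadata, so the text can later be fetched back from the index. Terrier needs a declared maximum length for each metadata key, so the hypothetical helper `meta_lengths` sizes each key from the longest value in the DataFrame; `index_with_text_meta` is likewise a hypothetical name.

```python
import pandas as pd


def meta_lengths(df: pd.DataFrame, columns=("docno", "text"), minimum: int = 20) -> dict:
    """Hypothetical helper: size each metadata key to its longest value in `df`."""
    lengths = {}
    for col in columns:
        # Character length of every value (missing values count as empty strings).
        # Counts characters; add headroom if your text contains multi-byte characters.
        char_lengths = df[col].fillna("").astype(str).str.len()
        longest = 0 if char_lengths.empty else int(char_lengths.max())
        # Never go below `minimum` (an arbitrary floor chosen for this sketch).
        lengths[col] = max(minimum, longest)
    return lengths


def index_with_text_meta(df: pd.DataFrame, index_path: str, columns=("docno", "text")):
    """Hypothetical helper: index `df`, storing `columns` as metadata; returns the index reference."""
    import pyterrier as pt  # lazy import: needs PyTerrier and a Java runtime

    # Older PyTerrier releases need explicit initialisation; newer ones start Java on demand.
    if not pt.started():
        pt.init()
    indexer = pt.IterDictIndexer(index_path, meta=meta_lengths(df, columns))
    # IterDictIndexer consumes an iterable of dicts, one per document;
    # "text" is the field it indexes by default.
    records = df.astype({"docno": str}).to_dict(orient="records")
    return indexer.index(records)
```

To keep "additional_field" as well, pass `columns=("docno", "text", "additional_field")`. With such an index, `pt.BatchRetrieve(index_ref, wmodel="BM25", metadata=["docno", "text"])` should return the text alongside each result.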
My main goal is to store the "text" column as a field while indexing, so that I can use it later for neural re-ranking.
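If the text is not stored in the index, a minimal alternative sketch is to join it back from the original DataFrame on `docno` just before re-ranking. This assumes the re-ranker reads document text from a `text` column of the results frame (as far as I know, the pyterrier_t5 monoT5 re-ranker takes a `text_field` parameter that defaults to `"text"`). `attach_text` is a hypothetical helper name.

```python
import pandas as pd


def attach_text(results: pd.DataFrame, docs: pd.DataFrame, text_col: str = "text") -> pd.DataFrame:
    """Hypothetical helper: add each result's document text by joining on docno."""
    # Drop any stale text column so the merge does not produce text_x / text_y.
    base = results.drop(columns=[text_col], errors="ignore")
    # One row per docno on the lookup side, so the join cannot duplicate results.
    lookup = docs[["docno", text_col]].drop_duplicates(subset="docno")
    # A left join keeps every result row, in its original order (rank is untouched).
    return base.merge(lookup, on="docno", how="left")
```

In a pipeline this could be wrapped as `bm25 >> pt.apply.generic(lambda res: attach_text(res, df)) >> mono_t5`, where `bm25` and `mono_t5` stand for your retrieval and re-ranking transformers. If the text is already stored as metadata, `pt.text.get_text(index, "text")` performs the equivalent lookup from the index instead.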
Please let me know if I'm missing something.
Thanks