
data

Open wangxinzhe123 opened this issue 3 years ago • 6 comments

Because I want to run this code with other data sets, how can I get .run and .pair files similar to those in /data?

wangxinzhe123 avatar Mar 28 '22 14:03 wangxinzhe123

Hi- the format description of these files is given here: https://github.com/Georgetown-IR-Lab/cedr#getting-started

In short, training pairs are sampled from lines like [query-id] [doc-id] and run files are the standard TREC run format: [query-id] 0 [doc-id] [rank] [score] [runtag]. The latter can be the output of various retrieval systems, and the former can just be sampled from run files (depending on what you want to train with).
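As a rough illustration of the two formats described above, here is a minimal Python sketch that parses run lines and samples training pairs from them. The function names (`read_run`, `sample_pairs`, `format_pairs`), the example data, the uniform sampling strategy, and the tab separator in the pair lines are all assumptions made for illustration; they are not part of the CEDR code base, and a real setup would choose its own sampling scheme.

```python
import random
from collections import defaultdict

def read_run(lines):
    """Parse TREC-style run lines into {query_id: [(doc_id, rank, score), ...]}."""
    run = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        qid, _unused, did, rank, score, _runtag = parts
        run[qid].append((did, int(rank), float(score)))
    return run

def sample_pairs(run, per_query, seed=0):
    """Uniformly sample up to per_query (query_id, doc_id) pairs for each query.

    Uniform sampling is an assumption; what to sample depends on the experiment.
    """
    rng = random.Random(seed)
    pairs = []
    for qid in sorted(run):
        docs = [did for did, _rank, _score in run[qid]]
        for did in rng.sample(docs, min(per_query, len(docs))):
            pairs.append((qid, did))
    return pairs

def format_pairs(pairs, sep="\t"):
    """Render pairs as '[query-id]<sep>[doc-id]' lines (separator is an assumption)."""
    return [f"{qid}{sep}{did}" for qid, did in pairs]

# Hypothetical run lines: [query-id] 0 [doc-id] [rank] [score] [runtag]
example_run = [
    "q1 0 d3 1 12.5 bm25",
    "q1 0 d7 2 11.0 bm25",
    "q2 0 d1 1 9.8 bm25",
]

run = read_run(example_run)
pairs = sample_pairs(run, per_query=1)
pair_lines = format_pairs(pairs)
```

Writing `pair_lines` to a file, one per line, would give a pairs file in the shape described above; the run lines themselves would normally come from a retrieval system rather than being written by hand.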

seanmacavaney avatar Mar 28 '22 18:03 seanmacavaney

Do the .run and .pair files need to be built manually, or are they generated automatically by running some program?

wangxinzhe123 avatar Mar 29 '22 03:03 wangxinzhe123

There is also an integration plugin for CEDR using PyTerrier - see https://github.com/terrierteam/pyterrier_bert#cedr-usage (though it's a little more dated compared to other PyTerrier plugins now)

cmacdonald avatar Mar 29 '22 10:03 cmacdonald

@wangxinzhe123 -- ultimately how you construct these files depends on your experimental setup. The main questions are:

  1. What results do you want CEDR to re-rank?
  2. What data do you want CEDR to sample as training data?

seanmacavaney avatar Mar 29 '22 16:03 seanmacavaney

Excuse me, can you provide the index parameter file used with IndriBuildIndex?

wangxinzhe123 avatar Mar 31 '22 05:03 wangxinzhe123

That again depends on what experiment you're running -- especially since you mention that you're running it with different datasets.

Since you brought up Indri, here's documentation on it: https://sourceforge.net/p/lemur/wiki/IndriBuildIndex%20Parameters/

I'm not very familiar with Indri, however. I'm happy to help out using PyTerrier though -- especially if you provide some details on what you're trying to do. Here's the documentation on indexing: https://pyterrier.readthedocs.io/en/latest/terrier-indexing.html

seanmacavaney avatar Mar 31 '22 07:03 seanmacavaney