Sean MacAvaney
Hi, the format of these files is described here: https://github.com/Georgetown-IR-Lab/cedr#getting-started In short, training pairs are sampled from lines like `[query-id] [doc-id]` and run files are the standard TREC run...
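For illustration, here is a minimal sketch of reading both kinds of file. It assumes the pairs file is whitespace-separated `[query-id] [doc-id]` and that the run file follows the usual six-column TREC layout (`qid Q0 docid rank score run_tag`). The function names and the sample IDs are hypothetical and are not taken from the CEDR code.

```python
from collections import defaultdict


def read_pairs(lines):
    """Parse `[query-id] [doc-id]` lines into {query_id: [doc_id, ...]}."""
    pairs = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        qid, docid = parts
        pairs[qid].append(docid)
    return dict(pairs)


def read_trec_run(lines):
    """Parse TREC run lines into {query_id: {doc_id: score}}."""
    run = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # expected columns: qid Q0 docid rank score run_tag
        qid, _q0, docid, _rank, score, _tag = parts
        run[qid][docid] = float(score)
    return dict(run)


# Made-up sample data, only to show the shape of each file.
sample_pairs = ["q1 docA", "q1 docB"]
sample_run = ["q1 Q0 docA 1 12.5 bm25", "q1 Q0 docB 2 11.0 bm25"]

pairs = read_pairs(sample_pairs)
run = read_trec_run(sample_run)
```

With real files you would pass an open file handle instead of the sample lists, since both functions only iterate over lines.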
@wangxinzhe123 -- ultimately how you construct these files depends on your experimental setup. The main questions are: 1) What results do you want CEDR to re-rank? 2) What data do...
That again depends on what experiment you're running -- especially since you mention that you're running it with different datasets. Since you brought up Indri, here's documentation on it: https://sourceforge.net/p/lemur/wiki/IndriBuildIndex%20Parameters/...
Thanks for the report. I'm not able to reproduce it when following the instructions provided by the software. Specifically, when requesting scoreddocs of `msmarco-passage/dev/small`, I get the following message as...
Thanks! I suspect it's this issue: https://github.com/allenai/ir_datasets/issues/151 There's a branch that fixes it, but for some reason, it hasn't been merged into the main branch: https://github.com/allenai/ir_datasets/tree/encoding-fixes I'll look into merging...
It also looks like the `FixEncoding` module was bypassed, which is why you're getting all the characters like `—`. (`FixEncoding` replaces them with their correct unicode versions.) As with #209,...
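As a rough illustration of the kind of repair described above, the sketch below reverses one common cause of such characters: UTF-8 bytes that were wrongly decoded as Windows-1252. This is an assumption about the failure mode, not the actual `FixEncoding` implementation, and `fix_mojibake` is a hypothetical helper name.

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mistakenly decoded as Windows-1252.

    If the text does not round-trip cleanly, return it unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# The three garbled characters from the comment above, written as escapes.
garbled = "\u00e2\u20ac\u201d"
fixed = fix_mojibake(garbled)  # the single character U+2014
```

Text that was never garbled either round-trips to itself (plain ASCII) or fails to decode and is returned as-is, so the helper is safe to apply to clean input in the common case.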
Hi @yuenherny -- it looks like this is a different issue. Do you have multiple processes open using ir_datasets (e.g., multiple notebook instances)? As files are downloading, only a single...
> and when one hits an error, the process isn't closed automatically

Gotcha -- thanks! This is a bug, as it should close the file in this case so others...
Starting on this. Here's a list of all `NamedTuple`s for queries and docs:
```
[x] ir_datasets/datasets/aol_ia.py: AolIaDoc
[x] ir_datasets/datasets/beir.py: BeirDoc
[x] ir_datasets/datasets/beir.py: BeirTitleDoc
[x] ir_datasets/datasets/beir.py: BeirTitleUrlDoc
[ ] ir_datasets/datasets/beir.py: BeirSciDoc
...
```
Thanks for bumping this PR @heinrichreimer, and thanks @grodino for the contribution!