IR-Reproducibility
Separate index and retrieval into scripts, with named "index types" and "runs"
Following on from the discussion about the separation of scripts, my proposal is to avoid make and use some trivial shell scripting. Each system must provide three files:
- index.sh
- retrieve.sh
- runs
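The last of these, runs, is a tab-separated file with two columns: the name of a supported index type, and the name of a retrieval run that uses that index type. An index type may appear on several lines, once for each run that uses it.
For example, terrier/runs might contain: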
classical DPH
classical-blocks DPH-proximity
classical DPH-QE
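As a rough illustration of how a harness might read this file, the sketch below lists each index type once and then the runs listed for one index type. The helper names index_types and runs_for_index are hypothetical, and the example assumes the columns are separated by a single tab as described above.

```shell
#!/bin/bash
# Hypothetical helpers for reading a tab-separated runs file.

# Print each distinct index type (column 1) once, in first-seen order.
index_types() {
    awk -F'\t' 'NF >= 2 && !seen[$1]++ {print $1}' "$1"
}

# Print the run names (column 2) listed for the given index type.
runs_for_index() {
    awk -F'\t' -v idx="$2" 'NF >= 2 && $1 == idx {print $2}' "$1"
}

# Example using the terrier/runs content from above, written to a temp file.
runs_file=$(mktemp)
printf 'classical\tDPH\nclassical-blocks\tDPH-proximity\nclassical\tDPH-QE\n' > "$runs_file"
index_types "$runs_file"              # prints: classical, classical-blocks
runs_for_index "$runs_file" classical # prints: DPH, DPH-QE
rm -f "$runs_file"
```

Listing each index type only once matters because "classical" appears on two lines; without de-duplication the same index would be built twice.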
The harness script would simply loop through each of the index types for the system, build that index, then execute each of the runs listed for it against each of the topic sets. A rough sketch would be as follows:
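#!/bin/bash
# Usage: <this script> SYSTEM
# Assumes $TREC_EVAL is set by the caller.
SYSTEM=$1
mkdir -p runs
# Build each distinct index type once (column 1 of the tab-separated runs file).
for indexForm in $(cut -f1 "$SYSTEM/runs" | sort -u);
do
    "$SYSTEM/index.sh" "$indexForm"
    # Select the runs (column 2) listed for this index type.
    for run in $(awk -F'\t' -v idx="$indexForm" '$1 == idx {print $2}' "$SYSTEM/runs");
    do
        for topics in "701-750" "751-800" "801-850"; do
            runfile=runs/$SYSTEM-$indexForm-$run.$topics.res
            # Passing the index type and run name is an assumption; retrieve.sh needs them to know what to execute.
            "$SYSTEM/retrieve.sh" "$indexForm" "$run" "$topics" "$runfile"
            # $qrels must point at the judgments for $topics; it is not defined in this sketch.
            "$TREC_EVAL" "$qrels" "$runfile"
        done
    done
done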
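On the system side, an index.sh might simply dispatch on the index type name it receives from the harness. The following is only a sketch of that idea: the function name build_index is hypothetical, and the echo lines are placeholders for the real, system-specific indexing commands.

```shell
#!/bin/bash
# Hypothetical skeleton for a system's index.sh.
# The echo lines stand in for the real indexing commands.

build_index() {
    local index_type=$1
    case "$index_type" in
        classical)
            echo "building classical index" ;;
        classical-blocks)
            echo "building classical-blocks index" ;;
        *)
            # Fail loudly so the harness notices a name missing from this script.
            echo "unknown index type: $index_type" >&2
            return 1 ;;
    esac
}

# Example invocation with one of the index types from terrier/runs.
build_index classical
```

Returning a non-zero status for an unknown name lets the harness detect a mismatch between the runs file and the index types the script actually supports.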