Separate index and retrieval into scripts, with named "index types" and "runs"

Open cmacdonald opened this issue 9 years ago • 0 comments

Following on from the discussion about the separation of scripts, my proposal is to avoid make and use some trivial shell scripting. Each system must provide three files:

  • index.sh
  • retrieve.sh
  • runs

The latter contains a column of supported indexing type "names", and a column of retrieval "run names" (tab separated).

For example terrier/runs might contain:

classical   DPH
classical-blocks    DPH-proximity
classical   DPH-QE
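
To make the file format concrete, here is a minimal sketch of how such a runs file could be read. The helper names index_types and runs_for are hypothetical (they are not part of the proposal), and the sketch assumes the two columns really are separated by a single tab.

```shell
#!/bin/bash
# Hypothetical helpers for reading a two-column, tab-separated runs file.
# The names index_types and runs_for are illustrative only.

# Print each distinct index type (column 1) once.
index_types() {
    cut -f1 "$1" | sort -u
}

# Print the run names (column 2) listed for one index type.
# An exact comparison is used, so "classical" does not also match "classical-blocks".
runs_for() {
    awk -F'\t' -v idx="$2" '$1 == idx { print $2 }' "$1"
}
```

With the terrier/runs example above, index_types terrier/runs would list classical and classical-blocks, and runs_for terrier/runs classical would list DPH and DPH-QE.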

The harness script would simply loop through each of the index types for the system, then run each of that index type's runs for each of the topic sets. A rough sketch would be as follows:

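#!/bin/bash
# $1 is the system directory, e.g. terrier.
# TREC_EVAL and qrels are assumed to be set in the environment.
SYSTEM=$1
mkdir -p runs

# Build each distinct index type once, even if it is listed for several runs.
for indexForm in $(cut -f1 "$SYSTEM/runs" | sort -u)
do
    "$SYSTEM/index.sh" "$indexForm"
    # Select the runs whose first (tab-separated) column is exactly this index type.
    for run in $(awk -F'\t' -v idx="$indexForm" '$1 == idx {print $2}' "$SYSTEM/runs")
    do
        for topics in "701-750" "751-800" "801-850"
        do
            runfile="runs/$SYSTEM-$indexForm-$run.$topics.res"
            "$SYSTEM/retrieve.sh" "$topics" "$runfile"
            "$TREC_EVAL" "$qrels" "$runfile"
        done
    done
done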
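
On the system side, index.sh receives the index type name as its first argument, as in the harness call above. A minimal sketch of how a system might validate and dispatch on that name follows. The functions build_index and index_main are hypothetical placeholders, and the echo stands in for whatever system-specific indexing command would actually be run.

```shell
#!/bin/bash
# Hypothetical skeleton for a system's index.sh.
# build_index is a placeholder; a real system would call its own indexer here.
build_index() {
    echo "building index: $1"
}

# Accept only the index type names that this system lists in its runs file.
index_main() {
    case "$1" in
        classical|classical-blocks)
            build_index "$1"
            ;;
        *)
            echo "unknown index type: $1" >&2
            return 1
            ;;
    esac
}
```

A real index.sh would end by calling index_main "$1", so that an index type missing from the case statement fails loudly instead of silently producing no index.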

cmacdonald avatar Jun 17 '15 09:06 cmacdonald