ranx
How do we compare different runs with multiple folds per run?
For instance, what if we have 10 folds for each of run_1, ..., run_5?
from ranx import compare
# Compare different runs and perform Two-sided Paired Student's t-Test
report = compare(
    qrels=qrels,
    runs=[run_1, run_2, run_3, run_4, run_5],
    metrics=["map@100", "mrr@100", "ndcg@10"],
    max_p=0.01,  # P-value threshold
)
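One possible way to get there, sketched under the assumption that the folds come from cross-validation and every query is scored in exactly one fold's test split: merge the per-fold results of each system into a single dictionary, so that each system ends up as one run covering all queries. The helper `merge_folds` and the toy data are hypothetical and not part of ranx. The sketch also assumes the per-fold results are available as plain `{query_id: {doc_id: score}}` dictionaries.

```python
def merge_folds(fold_results):
    """Merge per-fold results ({query_id: {doc_id: score}}) into one dict.

    Assumption: each query appears in exactly one fold. A query that
    shows up in two folds raises an error instead of being overwritten.
    """
    merged = {}
    for fold in fold_results:
        for query_id, doc_scores in fold.items():
            if query_id in merged:
                raise ValueError(f"query {query_id!r} appears in more than one fold")
            merged[query_id] = dict(doc_scores)
    return merged


# Hypothetical toy data: two folds of a single system
run_1_folds = [
    {"q_1": {"d_1": 0.9, "d_2": 0.4}},
    {"q_2": {"d_3": 0.7}},
]
run_1_dict = merge_folds(run_1_folds)

# The merged dict could then be wrapped in a ranx Run, for example
# Run(run_1_dict, name="run_1"), and passed to compare() as above.
```

If the folds instead re-score the same queries (for example, repeated runs with different seeds), this merge does not apply and the folds would need to be aggregated some other way.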