ranx
[Question] How to compute precision for a retriever operating at passage-level
If the retriever operates at the passage / chunk level, the retrieved results can contain duplicate document IDs. For example, the top-10 retrieval lists for two queries might be:
q_1: ['d_1', 'd_1', 'd_1', 'd_1', 'd_1', 'd_1', 'd_5', 'd_5', 'd_5', 'd_5']
q_2: ['d_4', 'd_4', 'd_4', 'd_4', 'd_2', 'd_2', 'd_6', 'd_6', 'd_6', 'd_6']
Encoding them into a run dictionary (keeping one score per unique document) results in:
run_dict = { "q_1": { "d_1": 0.9, "d_5": 0.8 },
"q_2": { "d_4": 0.9, "d_2": 0.8, "d_6": 0.7 } }
and the qrels could be:
qrels_dict = { "q_1": { "d_1": 5, "d_5": 3 },
"q_2": { "d_4": 6, "d_6": 1 } }
In this scenario, precision@10 yields a very low score even though retrieval quality is actually good: after deduplication each query has fewer than 10 unique documents, so dividing by the fixed cutoff of 10 unfairly penalises the run. How can we fix this issue?
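One possible workaround (a sketch, not an official ranx recipe) is to deduplicate the passage-level ranking down to unique document IDs first, and then compute precision over the length of the deduplicated list rather than at a fixed cutoff of 10. The helper names below (`dedupe_passages`, `precision`) are hypothetical; the resulting `run_dict`-style mapping could then be passed to ranx as usual:

```python
def dedupe_passages(ranked_ids):
    # Collapse a passage-level ranking to unique doc IDs, keeping the best rank.
    seen, unique = set(), []
    for doc_id in ranked_ids:
        if doc_id not in seen:
            seen.add(doc_id)
            unique.append(doc_id)
    return unique

def precision(retrieved, relevant):
    # Precision over the retrieved list itself, not a fixed cutoff k,
    # so short deduplicated lists are not penalised by a denominator of 10.
    return sum(d in relevant for d in retrieved) / len(retrieved)

# The passage-level top-10 lists from the question.
passage_runs = {
    "q_1": ["d_1"] * 6 + ["d_5"] * 4,
    "q_2": ["d_4"] * 4 + ["d_2"] * 2 + ["d_6"] * 4,
}
qrels_dict = {"q_1": {"d_1": 5, "d_5": 3},
              "q_2": {"d_4": 6, "d_6": 1}}

scores = {q: precision(dedupe_passages(ids), qrels_dict[q])
          for q, ids in passage_runs.items()}
print(scores)  # q_1 -> 1.0, q_2 -> 2/3
```

With this view, q_1 retrieves only relevant documents (precision 1.0) and q_2 retrieves two relevant out of three unique documents (precision 2/3), which matches the intuition that the retrieval is not actually poor. If you stay within ranx, evaluating the deduplicated run with the uncut "precision" metric (instead of "precision@10") should behave similarly, since no padding to 10 results is required.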