RankGPT

different NDCG@10 score

Open w-y-li opened this issue 1 year ago • 4 comments

Sorry to bother you; I have two small questions:

  1. I noticed that your NDCG@10 score for BM25 on the NFCorpus dataset differs from Pyserini's BM25 flat score, while the other datasets are consistent. Is there anything special about it?

  2. I have also tried to write an NDCG function myself instead of directly using trec_eval, but the two give different scores. Do you have any idea what might be wrong? Could you help me take a look? Really, thanks! My function:

import numpy as np

def ndcg(golden, current):
    # golden: ground-truth relevance grades of the retrieved docs (any order).
    # current: the same grades, in the order the model ranked the docs.
    log2_table = np.log2(np.arange(2, len(golden) + 2))

    def dcg_at_n(rel, n):
        # Exponential-gain DCG: sum_i (2^rel_i - 1) / log2(i + 1).
        rel = np.asarray(rel, dtype=float)[:n]
        return np.sum((np.power(2, rel) - 1) / log2_table[:rel.shape[0]])

    k = len(current)
    idcg = dcg_at_n(sorted(golden, reverse=True), n=k)  # ideal ordering
    dcg = dcg_at_n(current, n=k)                        # predicted ordering
    return 0 if idcg == 0 else dcg / idcg
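For reference, a hypothetical call (the grades below are invented), assuming both lists hold graded relevance judgments:

golden = [2, 0, 1, 3, 0]      # grades of the retrieved docs
current = [3, 2, 0, 1, 0]     # same grades, in the model's ranked order
print(ndcg(golden, current))  # ~0.993 for this near-ideal ranking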

w-y-li · Feb 22 '24

Hi,

  1. I did not perform any special processing on this data. I am currently unsure why the results are different.
  2. This code does not seem to use the golden relevance scores when calculating the DCG (dcg = dcg_at_n(current, n=k)), which could be problematic (ref. https://en.wikipedia.org/wiki/Discounted_cumulative_gain); see the sketch below.
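For illustration, a minimal sketch of the standard formulation (doc IDs and grades here are invented): the ground-truth grade of each doc is looked up in the model's predicted order before the discount is applied.

import numpy as np

# Hypothetical toy data: ground-truth grades keyed by doc ID, plus the
# model's predicted ranking of those doc IDs.
qrels = {"d1": 3, "d2": 1, "d3": 0}
predicted_order = ["d2", "d1", "d3"]

# Ground-truth grades taken in predicted order, then discounted.
rels = np.array([qrels[d] for d in predicted_order], dtype=float)
discounts = np.log2(np.arange(2, len(rels) + 2))
dcg = np.sum((2 ** rels - 1) / discounts)
idcg = np.sum((2 ** np.sort(rels)[::-1] - 1) / discounts)
print(dcg / idcg)  # NDCG for this single query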

sunnweiwei · Feb 24 '24

Thanks for your help. However, after double-checking the code, I still can't find any problem with it according to the DCG formula, so I am still confused. To emphasize: golden and current are both lists, the items of both lists are relevance scores (not doc IDs), and the order of each list is its ranking order. Really, thanks for your help!

w-y-li · Feb 25 '24

Excuse me, do you have any idea?

w-y-li · Mar 04 '24

Hi, I am uncertain about the problem. Regarding the function input, golden should be the ground truth doc relevance sorted by ground truth order, and current should be the ground truth (not model-predicted) doc relevance scores sorted in the order predicted by the model. If there's no issue with the input either, then the problem might be related to the truncation of the doc list. This code (https://github.com/cvangysel/pytrec_eval/blob/master/benchmarks/native_python_vs_pytrec_eval.py) includes a Python implementation of NDCG and compares it with the C++ implementation in pytrec_eval, which might be helpful.
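To make the truncation point concrete, here is a minimal sketch against pytrec_eval (doc IDs and grades are invented; install with pip install pytrec_eval). Two caveats worth checking in any hand-rolled comparison: trec_eval builds the ideal ranking from all judged docs in the qrels, not just the retrieved ones, and its ndcg_cut uses linear gain (rel / log2(rank + 1)) rather than the exponential gain (2^rel - 1) used in the function above, so the two formulations only coincide for binary judgments.

import pytrec_eval

# Invented toy data: "d4" is judged relevant but never retrieved.
qrel = {"q1": {"d1": 2, "d2": 0, "d3": 1, "d4": 2}}
run = {"q1": {"d1": 1.0, "d2": 0.9, "d3": 0.8}}

evaluator = pytrec_eval.RelevanceEvaluator(qrel, {"ndcg_cut.10"})
print(evaluator.evaluate(run)["q1"]["ndcg_cut_10"])
# ~0.66 here: the unretrieved d4 still counts toward the ideal ranking,
# so the score is lower than an ideal list built only from the retrieved
# docs would suggest.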

sunnweiwei · Mar 04 '24