IGNN
Unjustified evaluation on multi-label datasets
First of all, thank you for your work.
I found that the evaluation on multi-label datasets (e.g., IMDB) is unreasonable and unjustified, and it is what pushes your method to state-of-the-art performance.
To be specific, it is unreasonable to construct binary_pred using prior knowledge of how many labels each node has during evaluation. This is unfair to the other baselines and looks like a dirty trick:
for i in range(preds.shape[0]):
    k = labels[i].sum().astype('int')   # number of true labels, read from the ground truth
    topk_idx = preds[i].argsort()[-k:]  # keep exactly the k highest-scoring classes
    binary_pred[i][topk_idx] = 1
    for pos in list(labels[i].nonzero()[0]):
        if labels[i][pos] and labels[i][pos] == binary_pred[i][pos]:
            num_correct += 1            # only the true-positive positions are ever checked
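To make the leak concrete, here is a toy illustration (synthetic data, not the IGNN code): with the oracle count k, the predicted number of labels per node is exactly right for every node, no matter how uninformative the scores are. A thresholded baseline never gets that for free. (The `if k` guard is added here because `argsort()[-0:]` would otherwise select every class for nodes with zero labels.)

```python
# Synthetic demonstration of the oracle-k leak: even completely random
# scores produce a prediction whose per-node label COUNT matches the
# ground truth exactly, because k is taken from the labels at test time.
import numpy as np

rng = np.random.default_rng(1)
labels = (rng.random((50, 8)) < 0.3).astype(int)  # multi-hot ground truth
preds = rng.random((50, 8))                       # random, uninformative scores

binary_pred = np.zeros_like(labels)
for i in range(preds.shape[0]):
    k = int(labels[i].sum())          # ground truth consulted during evaluation
    if k:                             # guard: [-0:] would select all classes
        binary_pred[i][preds[i].argsort()[-k:]] = 1

# The number of predicted labels per node always equals the true count:
print(bool((binary_pred.sum(1) == labels.sum(1)).all()))  # True
```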
In fact, the standard practice is to use metrics.f1_score(labels, preds > 0) for evaluation.
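For concreteness, a minimal sketch of the threshold-based protocol on synthetic data (micro averaging is assumed here as one common choice for multi-label benchmarks; the hand-computed value matches sklearn's metrics.f1_score(labels, scores > 0, average='micro')):

```python
# Standard multi-label evaluation: binarize scores at a FIXED threshold
# and compute micro-averaged F1. The number of true labels per node is
# never consulted. All data below is synthetic.
import numpy as np

rng = np.random.default_rng(0)
labels = (rng.random((100, 5)) < 0.4).astype(int)  # multi-hot ground truth
scores = rng.normal(size=(100, 5))                 # model logits

pred = (scores > 0).astype(int)                    # fixed threshold, no oracle k
tp = np.sum((pred == 1) & (labels == 1))
fp = np.sum((pred == 1) & (labels == 0))
fn = np.sum((pred == 0) & (labels == 1))
micro_f1 = 2 * tp / (2 * tp + fp + fn)
print(round(float(micro_f1), 3))
```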
Don't you think this is unfair to the other published methods? Everyone is competing under the same protocol; you cannot inflate your scores by changing the evaluation setting only for your own method. This is harmful to future research in this field, and it obstructs the search for genuinely better state-of-the-art methods.