IGNN
Unjustified evaluation on multi-label datasets
First of all, thank you for your work.
I found that the evaluation on multi-label datasets (e.g., IMDB) is unreasonable and unjustified, and it is what pushes your method to state-of-the-art performance.
To be specific, it is unreasonable to construct binary_pred using prior knowledge of how many labels each node has during evaluation. This is unfair to the other baselines and looks like a dirty trick:
for i in range(preds.shape[0]):
    k = labels[i].sum().astype('int')   # number of true labels, read from the ground truth
    topk_idx = preds[i].argsort()[-k:]  # keep exactly the k highest-scoring classes
    binary_pred[i][topk_idx] = 1
    for pos in list(labels[i].nonzero()[0]):
        if labels[i][pos] and labels[i][pos] == binary_pred[i][pos]:
            num_correct += 1            # only the true-positive positions are ever checked
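To make the leak concrete, here is a toy illustration (synthetic data, not the IGNN code): with the oracle count k, the predicted number of labels per node is exactly right for every node, no matter how uninformative the scores are. A thresholded baseline never gets that for free. (The `if k` guard is added here because `argsort()[-0:]` would otherwise select every class for nodes with zero labels.)

```python
# Synthetic demonstration of the oracle-k leak: even completely random
# scores produce a prediction whose per-node label COUNT matches the
# ground truth exactly, because k is taken from the labels at test time.
import numpy as np

rng = np.random.default_rng(1)
labels = (rng.random((50, 8)) < 0.3).astype(int)  # multi-hot ground truth
preds = rng.random((50, 8))                       # random, uninformative scores

binary_pred = np.zeros_like(labels)
for i in range(preds.shape[0]):
    k = int(labels[i].sum())          # ground truth consulted during evaluation
    if k:                             # guard: [-0:] would select all classes
        binary_pred[i][preds[i].argsort()[-k:]] = 1

# The number of predicted labels per node always equals the true count:
print(bool((binary_pred.sum(1) == labels.sum(1)).all()))  # True
```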
In fact, the standard practice is to use metrics.f1_score(labels, preds > 0) for evaluation.
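For concreteness, a minimal sketch of the threshold-based protocol on synthetic data (micro averaging is assumed here as one common choice for multi-label benchmarks; the hand-computed value matches sklearn's metrics.f1_score(labels, scores > 0, average='micro')):

```python
# Standard multi-label evaluation: binarize scores at a FIXED threshold
# and compute micro-averaged F1. The number of true labels per node is
# never consulted. All data below is synthetic.
import numpy as np

rng = np.random.default_rng(0)
labels = (rng.random((100, 5)) < 0.4).astype(int)  # multi-hot ground truth
scores = rng.normal(size=(100, 5))                 # model logits

pred = (scores > 0).astype(int)                    # fixed threshold, no oracle k
tp = np.sum((pred == 1) & (labels == 1))
fp = np.sum((pred == 1) & (labels == 0))
fn = np.sum((pred == 0) & (labels == 1))
micro_f1 = 2 * tp / (2 * tp + fp + fn)
print(round(float(micro_f1), 3))
```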
Don't you think this is unfair to the other published methods? Everyone is competing under the same protocol; you cannot inflate your scores by changing the evaluation setting only for your own method. This is harmful to future research in this field, and it obstructs the search for genuinely better state-of-the-art methods.