TimeX
Unable to reproduce paper results
Thanks a lot for the effort to release the code base. I am trying to reproduce the results from the paper, but I am finding lower performance than what was reported on most of the datasets. I am wondering whether this is a variance problem to do with seed selection. Were the reported results run over a single seed?
In particular, I am having issues reproducing SeqCombMV, where the performance is significantly lower (even lower than the baselines IG and Dynamask). I get the following results when running the model on this dataset:
Results for ours explainer on seqcomb_mv with split=1
auprc = 0.2960 +- 0.0023
aup = 0.7468 +- 0.0020
aur = 0.3036 +- 0.0021
iou = 0.1143 +- 0.0013
Results for ours explainer on seqcomb_mv with split=2
auprc = 0.1231 +- 0.0039
aup = 0.0888 +- 0.0022
aur = 0.5560 +- 0.0042
iou = 0.0584 +- 0.0028
Results for ours explainer on seqcomb_mv with split=3
auprc = 0.7016 +- 0.0038
aup = 0.7407 +- 0.0015
aur = 0.4463 +- 0.0020
iou = 0.3340 +- 0.0028
Results for ours explainer on seqcomb_mv with split=4
auprc = 0.2680 +- 0.0031
aup = 0.7546 +- 0.0034
aur = 0.1154 +- 0.0023
iou = 0.1375 +- 0.0020
Results for ours explainer on seqcomb_mv with split=5
auprc = 0.0812 +- 0.0021
aup = 0.0551 +- 0.0015
aur = 0.4215 +- 0.0067
iou = 0.0384 +- 0.0022
Results for ours explainer on all splits
auprc = 0.2940 +- 0.0039
aup = 0.4772 +- 0.0055
aur = 0.3685 +- 0.0030
iou = 0.1365 +- 0.0020
And this is what was reported in the paper:
AUPRC AUP AUR
0.6878±0.0021 0.8326±0.0008 0.3872±0.0015
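To make the gap concrete, here is a small sketch that aggregates the per-split AUPRC values from my log and compares them with the reported AUPRC. The numbers are copied from above; the variable names are just for illustration, and I am assuming the ± in the log is computed within each split rather than across splits.

```python
from statistics import mean, stdev

# AUPRC per split, copied from the log above
auprc_by_split = {1: 0.2960, 2: 0.1231, 3: 0.7016, 4: 0.2680, 5: 0.0812}
# AUPRC reported in the paper for SeqCombMV
paper_auprc = 0.6878

values = list(auprc_by_split.values())
avg = mean(values)      # average over the five splits
spread = stdev(values)  # sample standard deviation across splits

# Splits whose AUPRC reaches the reported value
splits_at_or_above_paper = [s for s, v in auprc_by_split.items() if v >= paper_auprc]

print(f"mean AUPRC across splits: {avg:.4f}")
print(f"std across splits:        {spread:.4f}")
print(f"splits >= paper value:    {splits_at_or_above_paper}")
```

The mean matches the "all splits" AUPRC in my log (0.2940), but the spread across splits is far larger than the ± values printed for each split, and only split 3 reaches the reported number.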
I double-checked the hyperparameters as well. Is it possible that there is a problem with the generated data, or some error in the hyperparameters?
Thanks a lot for your help in advance!