
Unable to reproduce paper results

Open melfm opened this issue 5 months ago • 0 comments

Thanks a lot for releasing the code base. I am trying to reproduce the results from the paper, but on most of the datasets I get lower performance than what was reported. I am wondering whether this is a variance problem related to seed selection. Were the reported results run over a single seed?

In particular, I am having trouble reproducing SeqCombMV, where the performance is significantly lower than reported (lower even than the IG and Dynamask baselines). I get the following results when running the model on this dataset:

Results for ours explainer on seqcomb_mv with split=1
	auprc 	 = 0.2960 +- 0.0023
	aup 	 = 0.7468 +- 0.0020
	aur 	 = 0.3036 +- 0.0021
	iou 	 = 0.1143 +- 0.0013
Results for ours explainer on seqcomb_mv with split=2
	auprc 	 = 0.1231 +- 0.0039
	aup 	 = 0.0888 +- 0.0022
	aur 	 = 0.5560 +- 0.0042
	iou 	 = 0.0584 +- 0.0028
Results for ours explainer on seqcomb_mv with split=3
	auprc 	 = 0.7016 +- 0.0038
	aup 	 = 0.7407 +- 0.0015
	aur 	 = 0.4463 +- 0.0020
	iou 	 = 0.3340 +- 0.0028
Results for ours explainer on seqcomb_mv with split=4
	auprc 	 = 0.2680 +- 0.0031
	aup 	 = 0.7546 +- 0.0034
	aur 	 = 0.1154 +- 0.0023
	iou 	 = 0.1375 +- 0.0020
Results for ours explainer on seqcomb_mv with split=5
	auprc 	 = 0.0812 +- 0.0021
	aup 	 = 0.0551 +- 0.0015
	aur 	 = 0.4215 +- 0.0067
	iou 	 = 0.0384 +- 0.0022
Results for ours explainer on all splits
	auprc 	 = 0.2940 +- 0.0039
	aup 	 = 0.4772 +- 0.0055
	aur 	 = 0.3685 +- 0.0030
	iou 	 = 0.1365 +- 0.0020
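
As a quick sanity check on the numbers above, the per-split means can be averaged by hand. The sketch below is only an illustration: the `per_split` dictionary and the `summarize` helper are names made up here and are not part of the released code base, and the values are copied from the output above. It computes the mean over the five splits and the sample standard deviation across splits for each metric.

```python
from statistics import mean, stdev

# Per-split means copied from the output above (splits 1 to 5).
# This dictionary is a hypothetical container, not something the code base produces.
per_split = {
    "auprc": [0.2960, 0.1231, 0.7016, 0.2680, 0.0812],
    "aup":   [0.7468, 0.0888, 0.7407, 0.7546, 0.0551],
    "aur":   [0.3036, 0.5560, 0.4463, 0.1154, 0.4215],
    "iou":   [0.1143, 0.0584, 0.3340, 0.1375, 0.0384],
}


def summarize(values):
    """Return (mean, sample standard deviation) of a list of per-split scores."""
    return mean(values), stdev(values)


# Aggregate every metric and print one line per metric.
summary = {metric: summarize(values) for metric, values in per_split.items()}
for metric, (avg, spread) in summary.items():
    print(f"{metric:6s} mean={avg:.4f} std_across_splits={spread:.4f}")
```

The means agree with the "all splits" block to within rounding, so the aggregate appears to be a plain average over splits. The spread across splits is large, though: for auprc it is roughly 0.25, far more than the +- values reported within any single split, which is why I suspect the gap is tied to the split or seed rather than to noise within a run.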

And this is what was reported in the paper:

| AUPRC | AUP | AUR |
|---|---|---|
| 0.6878±0.0021 | 0.8326±0.0008 | 0.3872±0.0015 |

I double-checked the hyperparameters as well. Is it possible that there is a problem with the generated data, or an error in one of the hyperparameter settings?

Thanks a lot for your help in advance!

melfm, Sep 21 '24 00:09