DiffDock

Unable to reproduce the experimental results of the paper

Open Xu-kexin opened this issue 2 years ago • 3 comments

Hi, thanks for open-sourcing both the dataset and the code. I have a question about a problem I ran into while reproducing the DiffDock results. I used the same dataset and evaluated on the test set of the time split, with the model parameters provided in this GitHub project and the same criteria as in the paper. Whether I sample 10 or 40 poses, I get a top-1 accuracy of about 21% (RMSD below 2 Å). I repeated the test several times and the result always falls in the 20-22% range. This matches the results for PDBBind docking on unseen receptors in your paper, but I think I should be comparing against the results in the main text, which are around 38%. Which command should I change to get the roughly 38% reported in the paper? Or should I use the time-split test set, follow your instructions, and run the evaluate command to obtain the results in the paper? Again, thank you very much for your excellent work; I look forward to hearing from you.

Xu-kexin avatar Sep 12 '23 09:09 Xu-kexin

Hi @Xu-kexin, are you sure that you are looking at the filtered_rmsd_below_2 metric? The rmsd_below_2 metric only indicates the performance before the confidence model is applied; the confidence model is what selects among the 10 or 40 samples.

gcorso avatar Sep 12 '23 23:09 gcorso
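
To illustrate the distinction drawn in the reply above, here is a minimal sketch (not DiffDock's evaluation code) of how an unfiltered success rate can differ from a confidence-filtered top-1 rate. It assumes the unfiltered metric averages over all sampled poses, and that RMSDs and confidence scores are available as arrays of shape (complexes, samples). The function names and toy numbers are hypothetical.

```python
import numpy as np

def unfiltered_success(rmsds, threshold=2.0):
    """Percentage of ALL sampled poses with RMSD below the threshold.

    Assumption: no confidence model is involved, every sample counts equally.
    """
    rmsds = np.asarray(rmsds, dtype=float)
    return 100.0 * float(np.mean(rmsds < threshold))

def filtered_top1_success(rmsds, confidences, threshold=2.0):
    """Percentage of complexes whose highest-confidence pose is below the threshold."""
    rmsds = np.asarray(rmsds, dtype=float)
    confidences = np.asarray(confidences, dtype=float)
    best = np.argmax(confidences, axis=1)           # pose picked per complex
    top1 = rmsds[np.arange(len(rmsds)), best]       # RMSD of the picked pose
    return 100.0 * float(np.mean(top1 < threshold))

# Toy data: 3 complexes, 4 sampled poses each (values are made up).
toy_rmsds = np.array([[1.0, 5.0, 6.0, 7.0],
                      [8.0, 1.5, 9.0, 4.0],
                      [3.0, 6.0, 7.0, 8.0]])
toy_conf = np.array([[0.9, 0.1, 0.0, -1.0],
                     [-0.5, 0.8, 0.1, 0.2],
                     [0.3, 0.2, 0.1, 0.0]])

unfiltered = unfiltered_success(toy_rmsds)             # 2 of 12 poses are below 2 Å
filtered = filtered_top1_success(toy_rmsds, toy_conf)  # 2 of 3 selected poses are below 2 Å
```

In this toy case the filtered number is much higher than the unfiltered one because the confidence scores happen to pick the good pose whenever one exists; real results depend on how well the confidence model ranks.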

Thank you! So only the "filtered" metric is the result after the confidence model. Yes, it is around 30%, much higher than my previous result. I also see "top5_rmsds_below_2"; what is this? And what does "no overlap" mean? Thank you very much!

Xu-kexin avatar Sep 13 '23 10:09 Xu-kexin

top5_rmsds_below_2 corresponds to the proportion of complexes with RMSD below 2 Å when taking the pose with the lowest RMSD out of 5 poses (the 5 highest-ranked poses in the case of top5_rmsds_below_2). no_overlap refers to the portion of the dataset with proteins that were not seen in the training set (see the appendix of the paper).

gcorso avatar Oct 09 '23 00:10 gcorso
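
The top-5 and no-overlap definitions above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the repository's code: `topk_success`, the `mask` argument, and the toy arrays are hypothetical, and the no-overlap subset is assumed to be given as a boolean flag per complex.

```python
import numpy as np

def topk_success(rmsds, confidences, k=5, threshold=2.0, mask=None):
    """Percentage of complexes where the best of the k highest-ranked poses
    has RMSD below the threshold.

    mask (optional): boolean array selecting a subset of complexes, e.g. those
    whose protein was not seen in training (hypothetical "no overlap" flag).
    """
    rmsds = np.asarray(rmsds, dtype=float)
    confidences = np.asarray(confidences, dtype=float)
    if mask is not None:
        mask = np.asarray(mask, dtype=bool)
        rmsds, confidences = rmsds[mask], confidences[mask]
    # Indices of the k most confident poses for each complex.
    order = np.argsort(-confidences, axis=1)[:, :k]
    topk = np.take_along_axis(rmsds, order, axis=1)
    # A complex counts as a success if ANY of its k ranked poses is good.
    return 100.0 * float(np.mean(topk.min(axis=1) < threshold))

# Toy data: 4 complexes, 6 poses each; k=2 keeps the example small.
demo_rmsds = np.array([[5.0, 1.0, 6.0, 7.0, 8.0, 9.0],
                       [1.0, 5.0, 6.0, 7.0, 8.0, 9.0],
                       [3.0, 4.0, 1.9, 6.0, 7.0, 8.0],
                       [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]])
demo_conf = np.array([[0.1, 0.9, 0.0, -0.1, -0.2, -0.3],
                      [-1.0, 0.9, 0.8, 0.3, 0.2, 0.1],
                      [0.5, 0.1, 0.4, 0.0, -0.1, -0.2],
                      [0.6, 0.5, 0.4, 0.3, 0.2, 0.1]])
demo_no_overlap = np.array([False, True, True, True])  # made-up subset flags

top1_all = topk_success(demo_rmsds, demo_conf, k=1)
top2_all = topk_success(demo_rmsds, demo_conf, k=2)
top2_no_overlap = topk_success(demo_rmsds, demo_conf, k=2, mask=demo_no_overlap)
```

Note that the second toy complex has a good pose (1.0 Å) that is ranked last, so it fails for any small k; a top-k metric can only credit poses that the ranking actually surfaces.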