DiffSBDD Reproducing the paper's result

Open minju-hits opened this issue 1 year ago • 0 comments

Dear @arneschneuing,

I am currently working on structure-based drug design (SBDD) and have been deeply impressed by the performance of your molecule generation work. Thank you also for sharing your code.

I have been attempting to reproduce the results presented in Table 1 of your paper, specifically focusing on the "CrossDocked DiffSBDD-inpaint (C-alpha)" using the provided checkpoint.

Here are the steps I followed:

1. Create the conda environment.
2. Prepare the data.
   2.1. Download the CrossDocked data from the Pocket2Mol GitHub repository.
   2.2. python process_crossdock.py <crossdocked_dir> --no_H
3. Sample molecules for all pockets in the test set:
   python test.py checkpoints/ca_inpaint.ckpt --test_dir <crossdocked_dir>/processed_noH/test/ --outdir <output_dir> --fix_n_nodes
4. Calculate the metrics with reference to your provided code:

import glob

from rdkit import Chem
from analysis.metrics import MoleculeProperties

mol_metrics = MoleculeProperties()

# Each SDF file holds the molecules sampled for one pocket.
sdf_names = glob.glob("<output_dir>/test_set/processed/*.sdf")
pocket_mols_lst = []
for sdf_name in sdf_names:
    with Chem.SDMolSupplier(sdf_name) as suppl:
        # Skip entries that RDKit fails to parse.
        pocket_mols = [x for x in suppl if x is not None]
    pocket_mols_lst.append(pocket_mols)

all_qed, all_sa, all_logp, all_lipinski, per_pocket_diversity = mol_metrics.evaluate(pocket_mols_lst)
print(len(pocket_mols_lst)) # 55
print([len(x) for x in pocket_mols_lst]) 
# [100, 97, 97, 93, 97, 99, 94, 98, 97, 94, 98, 98, 100, 98, 96, 97, 99, 95, 98, 98, 96, 97, 96, 96, 95, 97, 97, 97, 98, 94, 97, 97, 99, 98, 97, 98, 98, 97, 99, 99, 97, 96, 98, 99, 97, 97, 97, 99, 97, 97, 98, 92, 95, 89, 98]
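
For reference, here is a minimal sketch of how per-pocket values like those returned by `evaluate` could be reduced to a single "mean \pm std" line. It assumes each metric comes back as one list of values per pocket and that the standard deviation is the population one; neither has been checked against the repository. The helper names `summarize` and `format_line` and the toy values are hypothetical.

```python
from statistics import fmean, pstdev


def summarize(per_pocket_values):
    """Flatten a list of per-pocket lists and return (mean, population std).

    Assumption: every molecule counts equally, regardless of its pocket.
    """
    flat = [v for pocket in per_pocket_values for v in pocket]
    return fmean(flat), pstdev(flat)


def format_line(name, mean, std):
    """Format one summary line in the 'name: mean \\pm std' style."""
    return f"{name}: {mean:.3f} \\pm {std:.2f}"


# Toy values for two pockets (hypothetical, not taken from any real run).
toy_qed = [[0.5, 0.7], [0.6]]
toy_mean, toy_std = summarize(toy_qed)
print(format_line("QED", toy_mean, toy_std))
```

Pooling all molecules weights pockets by how many valid molecules they produced. Averaging per pocket first would give a slightly different number, so it matters which convention the paper uses.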

My results are below (CrossDocked, DiffSBDD-cond (C-alpha)), and I have attached my output file, testset.tar.gz.

5331 molecules from 55 pockets evaluated.
QED: 0.510 \pm 0.14
SA: 0.349 \pm 0.09
LogP: -0.295 \pm 0.97
Lipinski: 4.875 \pm 0.37
Diversity: 0.774 \pm 0.07
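
As a quick consistency check, the per-pocket counts printed earlier can be summed and compared with the "5331 molecules from 55 pockets" line. The sketch below copies the list from the post; the variable names are illustrative only.

```python
# Per-pocket molecule counts, copied from the printed list in this post.
counts = [
    100, 97, 97, 93, 97, 99, 94, 98, 97, 94, 98, 98, 100, 98, 96, 97, 99, 95,
    98, 98, 96, 97, 96, 96, 95, 97, 97, 97, 98, 94, 97, 97, 99, 98, 97, 98,
    98, 97, 99, 99, 97, 96, 98, 99, 97, 97, 97, 99, 97, 97, 98, 92, 95, 89, 98,
]

n_pockets = len(counts)   # number of pockets with an output SDF file
n_molecules = sum(counts) # molecules that RDKit parsed successfully

print(f"{n_molecules} molecules from {n_pockets} pockets")
print(f"fewest molecules for a single pocket: {min(counts)}")
```

The totals agree with the summary line, so every parsed molecule was evaluated and the gap from the paper does not come from dropped files.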

However, I could not obtain the same results as those reported in the paper. I also looked into the related issue, but it did not provide a clear answer to my question.

I have a couple of questions that I hope you could assist me with:

1. Could you explain how to accurately reproduce the results from Table 1? The 'test.py' script offers various options, and I am uncertain which settings to use with the checkpoint to obtain the reported numbers.
2. The repository contains two checkpoints, yet Table 1 of the paper reports four model variants. Which variants do the two provided checkpoints correspond to? Could you also provide the other two checkpoints? Having access to them would help in replicating the findings.

If you require any additional information or have further questions, don't hesitate to reach out to me. Thank you for your time and consideration.

Best regards, MinJu.

Jul 25 '23 05:07 minju-hits
