Training data creation

Open • abhisekbakshi opened this issue 4 years ago • 1 comment

Dear Sir, I could not understand how the SMILES format is converted to ECFP to feed into the input layer of the model. Moreover, I could not understand how you calculated the known binding affinity scores for the training samples. Could you please suggest a way to understand these?

abhisekbakshi, Apr 04 '20 10:04

The SMILES string is first converted to an RDKit Mol object and then to an ECFP (in this case a Morgan fingerprint, which is nearly identical to ECFP) in this line: https://github.com/simonfqy/PADME/blob/e01c592cc06c4de04b3ed6db35da5af5ff7f863f/dcCustom/feat/fingerprints.py#L23. As for the binding affinity scores, I obtained them from publicly available datasets. They are then processed in the preprocess.py file in each dataset folder, for example here: https://github.com/simonfqy/PADME/blob/e01c592cc06c4de04b3ed6db35da5af5ff7f863f/davis_data/preprocess.py#L35. The log transformation is done in the same file.
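For readers who want to try the two steps in isolation, here is a minimal sketch. It is not copied from the PADME code. The function names `smiles_to_ecfp` and `kd_nm_to_pkd`, the radius of 2, the 1024-bit length, and the assumption that affinities are Kd values in nanomolar are all illustrative choices; check the linked files for the settings and transformation actually used.

```python
import math


def smiles_to_ecfp(smiles, radius=2, n_bits=1024):
    """Convert a SMILES string to a Morgan (ECFP-like) bit vector.

    radius and n_bits are illustrative assumptions, not PADME's settings.
    """
    # RDKit is imported here so that the affinity helper below can be
    # used on its own when RDKit is not installed.
    from rdkit import Chem
    from rdkit.Chem import rdMolDescriptors

    mol = Chem.MolFromSmiles(smiles)  # returns None for an invalid SMILES
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    fp = rdMolDescriptors.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return list(fp)  # list of 0/1 ints of length n_bits


def kd_nm_to_pkd(kd_nm):
    """Log-transform a Kd given in nanomolar: pKd = -log10(Kd in molar).

    Assumes nanomolar input; this is a common convention for kinase
    binding data, not necessarily the exact transform in preprocess.py.
    """
    if kd_nm <= 0:
        raise ValueError("Kd must be positive")
    # -log10(kd_nm * 1e-9) rewritten as 9 - log10(kd_nm)
    return 9.0 - math.log10(kd_nm)
```

With RDKit installed, `smiles_to_ecfp("CCO")` would return a 1024-element list of bits for ethanol that can be fed to the input layer, and `kd_nm_to_pkd(10000.0)` maps a weak 10 µM binder to a pKd of 5.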

simonfqy, Apr 10 '20 22:04