PADME
Training data creation
Dear Sir, I could not understand how the SMILES format is converted to ECFP to feed into the input layer of the model. I also could not understand how you calculated the known binding affinity scores for the training samples. Could you suggest a way to understand these?
The SMILES string is converted to an RDKit Mol object and then to ECFP (in this case, the Morgan fingerprint, which is nearly identical to ECFP) in this line: https://github.com/simonfqy/PADME/blob/e01c592cc06c4de04b3ed6db35da5af5ff7f863f/dcCustom/feat/fingerprints.py#L23.
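To make the conversion concrete, here is a minimal standalone sketch of the same idea using RDKit directly. It is not the code from the repository: the function name `smiles_to_ecfp` is hypothetical, and the radius of 2 and the 1024-bit length are assumptions chosen for illustration, so check the linked file for the values PADME actually uses.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem


def smiles_to_ecfp(smiles, radius=2, n_bits=1024):
    """Turn a SMILES string into a 0/1 Morgan fingerprint vector.

    radius and n_bits are illustrative defaults, not PADME's settings.
    """
    # Step 1: parse the SMILES string into an RDKit Mol object.
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("Could not parse SMILES: %r" % smiles)

    # Step 2: compute the Morgan fingerprint as a fixed-length bit vector.
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)

    # Step 3: copy the set bits into a NumPy array the model can consume.
    vec = np.zeros((n_bits,), dtype=np.uint8)
    vec[list(fp.GetOnBits())] = 1
    return vec


if __name__ == "__main__":
    features = smiles_to_ecfp("CCO")  # ethanol
    print(features.shape, int(features.sum()))
```

Each molecule ends up as a fixed-length binary vector, so a batch of molecules can be stacked into a 2D array and fed to the input layer.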
As for the binding affinity scores, I obtained the info from some publicly available datasets. They are then processed in the preprocess.py files in each dataset folder, like here: https://github.com/simonfqy/PADME/blob/e01c592cc06c4de04b3ed6db35da5af5ff7f863f/davis_data/preprocess.py#L35. The log transformation is done in the same file.
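For illustration, a log transformation of this kind can be sketched as below. This assumes the common convention for Kd values reported in nanomolar, pKd = -log10(Kd / 1e9); the function name `kd_nm_to_pkd` is hypothetical, and the exact formula used for each dataset should be checked in the corresponding preprocess.py.

```python
import math


def kd_nm_to_pkd(kd_nm):
    """Convert a dissociation constant in nM to pKd (assumed convention)."""
    if kd_nm <= 0:
        raise ValueError("Kd must be positive, got %r" % kd_nm)
    # Divide by 1e9 to go from nanomolar to molar, then take -log10.
    return -math.log10(kd_nm / 1e9)


if __name__ == "__main__":
    # Smaller Kd (tighter binding) gives a larger pKd.
    for kd in (1.0, 100.0, 10000.0):
        print(kd, round(kd_nm_to_pkd(kd), 3))
```

The transform compresses values that span several orders of magnitude into a narrow range, which is usually easier for a regression model to fit.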