PS-VAE
Train model on property other than LogP and QED
Hi, thanks so much for developing this amazing tool! I would really appreciate your help with an issue I ran into while applying it. In src/train.py, line 90, five choices are provided for property model training: QED, SA, logP, GSK3B and JNK3. However, an "Invalid Property" error was raised when I passed 'gsk3b' as the props parameter. How can I enable the model to train on this property, and is there a way to assign a custom property for the training process? Thanks so much! Attachment: error_props.txt
Hi, I think the following steps can help you train a model on a custom property. Suppose the property is GSK3B:
- Implement get_gsk3b(molecule) in src/evaluation/utils.py. The argument molecule is an RDKit Molecule object, and the function returns the GSK3B score of that molecule.
- Register the function in eval_funcs_dict(), e.g. add the entry 'gsk3b': get_gsk3b to the dict. This tells the framework to use get_gsk3b to calculate the score for the property "gsk3b".
- Append the normalized score in get_normalized_property_score() and the restored score in restore_property_score(). The first function normalizes the property scores to an approximately standard Gaussian (i.e. N(0,1)), which is better for regression. The second function restores the original scale from the predicted results. If the score is already in [0, 1], I think you can just implement both as the identity mapping (i.e. f(x) = x).
- Add a threshold for the property in PROP_TH. Generated molecules whose scores are above the threshold are regarded as successful optimizations. Also add the property name to PROPS.
- Add the property name to the choices in train.py.
- Now you can start training!
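To make the steps above concrete, here is a minimal, self-contained sketch of how the pieces could fit together. It is not the actual PS-VAE code: the real functions in src/evaluation/utils.py and src/train.py may have different signatures and internals. The scorer body, the NORM_STATS table, the 0.5 threshold, the is_success() helper and the --props argument layout are all assumptions made for illustration.

```python
import argparse
from typing import Callable, Dict, List, Tuple


def get_gsk3b(molecule) -> float:
    """Toy placeholder scorer (assumption, not a real GSK3B predictor).

    A real implementation would run a trained GSK3B activity model on the
    RDKit molecule. Here we only call GetNumAtoms() so the sketch runs.
    """
    return min(molecule.GetNumAtoms() / 50.0, 1.0)


def eval_funcs_dict() -> Dict[str, Callable]:
    # Step 2: map the property name to its scoring function.
    return {'gsk3b': get_gsk3b}


# Hypothetical (mean, std) per property. (0.0, 1.0) makes both functions
# below the identity mapping, as suggested for scores already in [0, 1].
NORM_STATS: Dict[str, Tuple[float, float]] = {'gsk3b': (0.0, 1.0)}


def get_normalized_property_score(name: str, score: float) -> float:
    # Step 3a: z-score normalization towards N(0, 1).
    mean, std = NORM_STATS[name]
    return (score - mean) / std


def restore_property_score(name: str, normalized: float) -> float:
    # Step 3b: inverse of the normalization above.
    mean, std = NORM_STATS[name]
    return normalized * std + mean


# Step 4: register the property name and its success threshold.
PROPS: List[str] = ['gsk3b']
PROP_TH: Dict[str, float] = {'gsk3b': 0.5}  # assumed value


def is_success(name: str, score: float) -> bool:
    # Hypothetical helper: a score above the threshold counts as success.
    return score > PROP_TH[name]


def build_parser() -> argparse.ArgumentParser:
    # Step 5: allow the new name on the command line (assumed layout).
    parser = argparse.ArgumentParser()
    parser.add_argument('--props', nargs='+',
                        choices=['qed', 'sa', 'logp', 'gsk3b', 'jnk3'])
    return parser
```

With this layout, adding another property means adding one scorer, one entry in each dict, and one name in PROPS and in the parser choices.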