PaddleHelix

Molecular descriptor/fingerprint?

Open miquelduranfrigola opened this issue 3 years ago • 1 comments

Hello, thanks for a wonderful repository. I wonder if it is possible to extract, given a pre-trained network, a "fingerprint" or "descriptor" for each input molecule. Thanks! M

miquelduranfrigola, Feb 17 '22 16:02

Hi, miquelduranfrigola. Thank you for using PaddleHelix. By 'fingerprint' or 'descriptor', do you mean the final molecule representation learned by the model? If so, you can get it by adding a few lines to the code. Let's use the GEM model as an example (suppose you are training with GeoPredModel and using the GeoGNN model as its compound encoder). At line 156, https://github.com/PaddlePaddle/PaddleHelix/blob/3368b93fc706dd3fea35887748673abcc668c145/pahelix/model_zoo/gem_model.py#L156, the GeoGNN model returns the graph representation, which is probably the 'fingerprint' or 'descriptor' you want. You can therefore replace the line https://github.com/PaddlePaddle/PaddleHelix/blob/3368b93fc706dd3fea35887748673abcc668c145/pahelix/model_zoo/gem_model.py#L283 with `return loss, graph_repr` to get the representations of the molecules in the current batch. Then add some code that collects the representations from all batches, and you will have the representation of every molecule in the dataset you are using. Hope this helps.
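The "collect the representations from all batches" step could look roughly like the sketch below. It is only an illustration under stated assumptions: `collect_graph_reprs` and `fake_forward` are hypothetical names, not part of PaddleHelix, and `fake_forward` is a stand-in for the modified forward pass that returns `(loss, graph_repr)`. The only thing assumed about a real model output is that a Paddle tensor exposes `.numpy()`.

```python
from typing import Any, Callable, Iterable, List, Tuple


def collect_graph_reprs(
    batches: Iterable[Any],
    forward: Callable[[Any], Tuple[Any, Any]],
) -> List[List[float]]:
    """Run `forward` on every batch and gather one vector per molecule.

    `forward` is assumed to return (loss, graph_repr), where graph_repr
    holds one row per molecule in the batch.
    """
    all_reprs: List[List[float]] = []
    for batch in batches:
        _loss, graph_repr = forward(batch)
        # Paddle tensors can be converted with .numpy(); plain nested
        # lists (as in the stand-in below) are used as they are.
        if hasattr(graph_repr, "numpy"):
            graph_repr = graph_repr.numpy().tolist()
        all_reprs.extend(list(row) for row in graph_repr)
    return all_reprs


def fake_forward(batch: List[str]) -> Tuple[float, List[List[float]]]:
    """Hypothetical stand-in for the model: a 2-dimensional toy vector per SMILES."""
    reprs = [[float(len(s)), float(s.count("C"))] for s in batch]
    return 0.0, reprs


# Two toy batches of SMILES strings, iterated in dataset order.
toy_batches = [["CCO", "C"], ["CCN"]]
fingerprints = collect_graph_reprs(toy_batches, fake_forward)
print(len(fingerprints), "molecules,", len(fingerprints[0]), "dimensions each")
```

With a real model you would probably also call `model.eval()`, run the loop under `paddle.no_grad()`, and iterate the data without shuffling so that row i of the result corresponds to molecule i of the dataset.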

Noisyntrain, Mar 08 '22 12:03

Hello @Noisyntrain - apologies for my late reply. We are working on this at the moment, thanks for your help! I will let you know if we encounter any problems, but your solution looks neat!

miquelduranfrigola, Nov 21 '22 19:11