PaddleHelix
Molecular descriptor/fingerprint?
Hello, thanks for a wonderful repository. I wonder if it is possible to extract, given a pre-trained network, a "fingerprint" or "descriptor" for each input molecule. Thanks! M
Hi, miquelduranfrigola. Thank you for using PaddleHelix. By 'fingerprint' or 'descriptor', do you mean the final molecule representation learned by the model? If so, you can get it by adding a few lines of code.
Let's use the GEM model as an example (suppose you are using GeoPredModel for training, with the GeoGNN model as its compound encoder). At line 156 of gem_model.py (https://github.com/PaddlePaddle/PaddleHelix/blob/3368b93fc706dd3fea35887748673abcc668c145/pahelix/model_zoo/gem_model.py#L156),
the GeoGNN model returns the graph representation (graph_repr), which might be the 'fingerprint' or 'descriptor' you want. You can therefore replace line 283 (https://github.com/PaddlePaddle/PaddleHelix/blob/3368b93fc706dd3fea35887748673abcc668c145/pahelix/model_zoo/gem_model.py#L283) with return loss, graph_repr
so that the model also returns the representations of the molecules in the current batch. Then collect the representations from all batches, and you will have a representation for every molecule in the dataset you are using.
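A minimal sketch of the "collect all batches" step, assuming the forward pass has been modified as described so that it returns (loss, graph_repr) with graph_repr shaped [batch_size, embed_dim]. The names collect_representations and model_fn are hypothetical: model_fn stands for whatever wrapper you write to feed one batch to the modified GeoPredModel. In practice you would also call model.eval(), wrap the loop in paddle.no_grad(), and iterate without shuffling so that row i of the result lines up with molecule i of your dataset.

```python
import numpy as np


def collect_representations(model_fn, batches):
    """Run a model over every batch and stack the per-molecule representations.

    Hypothetical helper: `model_fn(batch)` is assumed to return
    (loss, graph_repr), where graph_repr has shape [batch_size, embed_dim].
    """
    all_reprs = []
    for batch in batches:
        _, graph_repr = model_fn(batch)
        # Paddle tensors expose .numpy(); plain arrays/lists go through np.asarray.
        if hasattr(graph_repr, "numpy"):
            graph_repr = graph_repr.numpy()
        all_reprs.append(np.asarray(graph_repr))
    # Concatenate along the batch axis -> [num_molecules, embed_dim].
    return np.concatenate(all_reprs, axis=0)


# Usage with a stand-in model (not GEM) that emits 4-dim vectors per molecule.
def fake_model(batch):
    return 0.0, np.ones((len(batch), 4)) * len(batch)


fps = collect_representations(fake_model, [[1, 2], [3, 4, 5]])
print(fps.shape)  # (5, 4)
```

Converting each batch to NumPy as you go keeps the collected fingerprints independent of the framework, so they can be saved with np.save or passed directly to scikit-learn style models.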
Hope this can be helpful to you.
Hello @Noisyntrain, apologies for my late reply. We are working on this at the moment; thanks for your help! I will let you know if we run into any problems, but your solution looks neat!