
Pretrained Molecular Representations - Training GIN prior to passing it to InfoGraph

Open vladimirkovacevic opened this issue 3 years ago • 6 comments

In the Pretrained Molecular Representations tutorial, the GIN model is passed to InfoGraph: model = models.InfoGraph(gin_model, separate_model=False)

Should GIN be trained first, then passed to InfoGraph?
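
For context, the tutorial sets things up roughly like this (paraphrased from the tutorial; exact hyperparameter values may differ):

```python
from torchdrug import datasets, models

# Load ClinTox with the feature specifiers suggested for pretraining
dataset = datasets.ClinTox("~/molecule-datasets/",
                           node_feature="pretrain", edge_feature="pretrain")

# GIN encoder -- its weights are randomly initialized at this point
gin_model = models.GIN(input_dim=dataset.node_feature_dim,
                       hidden_dims=[300, 300, 300, 300, 300],
                       edge_input_dim=dataset.edge_feature_dim,
                       batch_norm=True, readout="mean")

# Wrap the encoder with InfoGraph, which defines the pretraining objective
model = models.InfoGraph(gin_model, separate_model=False)
```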

vladimirkovacevic avatar Feb 10 '22 10:02 vladimirkovacevic

Hi! You don't need to train the GIN first, since InfoGraph itself defines a pretraining task. We wrap it as a 'model' instead of a 'task' in TorchDrug to facilitate the interaction with other layers.
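
For example, to run the pretraining you wrap the model in tasks.Unsupervised and hand it to the solver (a minimal sketch along the lines of the tutorial; the hyperparameters here are illustrative):

```python
import torch
from torchdrug import core, tasks

# The task drives optimization; InfoGraph's mutual-information objective is
# what actually trains the (initially random) GIN weights
task = tasks.Unsupervised(model)

optimizer = torch.optim.Adam(task.parameters(), lr=1e-3)
solver = core.Engine(task, dataset, None, None, optimizer, batch_size=256)
solver.train(num_epoch=100)
```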

Oxer11 avatar Feb 14 '22 03:02 Oxer11

Thank you for the answer. I assumed that, but how exactly are "pretrained" weights obtained, since the "pretrain" parameter is passed only when loading the dataset and not to the model? dataset = datasets.ClinTox("~/molecule-datasets/", node_feature="pretrain", edge_feature="pretrain")

"pretrain" argument results in invoking features.atom.pretrain R function for calculating molecular node features in molecule.py.

vladimirkovacevic avatar Feb 14 '22 07:02 vladimirkovacevic

Hi! The arguments in the dataset refer to chemical features (e.g. atomic number, formal charge), rather than anything computed by a neural network. pretrain means a specific combination of chemical features that is suggested for pretraining graph neural networks.

You may use other chemical feature specifiers, such as default, for pretraining. Note that you need to keep the same feature specifier for training and test; otherwise the model can't recognize the input correctly.
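
For instance, you can see how the specifier changes the input dimensionality (a quick sketch; the exact numbers depend on your TorchDrug version):

```python
from torchdrug import datasets

for spec in ["default", "pretrain"]:
    dataset = datasets.ClinTox("~/molecule-datasets/",
                               node_feature=spec, edge_feature=spec)
    print(spec, dataset.node_feature_dim, dataset.edge_feature_dim)
```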

KiddoZhu avatar Feb 14 '22 16:02 KiddoZhu

Hi! I am still confused about this pretrain argument. The atom representation is fixed if I use the default chemical feature specifier, so what's the meaning of pretrain?

tinymd avatar Apr 01 '22 17:04 tinymd

@KiddoZhu, sorry, your last response does not address my question. So, in the Pretrained Molecular Representations example, when GIN is instantiated it has random weights, right? As such, it is passed to InfoGraph. Setting node_feature="pretrain" on the dataset object does not set weights for GIN. This does not seem to me like the desired behavior. Can you please confirm, or correct me if I'm wrong? Thanks!

vladimirkovacevic avatar Apr 20 '22 15:04 vladimirkovacevic

node_feature has nothing to do with the weights of the network. It only defines the attribute graph.node_feature for every graph in that dataset, which will be used as the input to the network.

For example, the default node feature is a concatenation of several chemical properties, like the one-hot encoding of the atom type, the mass of the atom, the formal charge of the atom, etc. For pretraining, the pretrain node feature exactly follows the original paper, but you may also try other features. Whichever node feature you use, you need to stick to the same feature during finetuning; otherwise the shape of the input won't match the network.
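
You can check this on a single molecule (a minimal sketch; the printed dimensions depend on your TorchDrug version):

```python
from torchdrug import data

# node_feature is computed from chemistry alone -- no network is involved
mol = data.Molecule.from_smiles("C1=CC=CC=C1O", node_feature="default")
print(mol.node_feature.shape)   # (num_atoms, default feature dim)

mol = data.Molecule.from_smiles("C1=CC=CC=C1O", node_feature="pretrain")
print(mol.node_feature.shape)   # same atoms, different feature dim
```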

KiddoZhu avatar Apr 26 '22 02:04 KiddoZhu