
Pretrained Molecular Representations - Training GIN prior to passing it to InfoGraph

Open vladimirkovacevic opened this issue 3 years ago • 6 comments

In the Pretrained Molecular Representations tutorial, the GIN model is passed to InfoGraph: model = models.InfoGraph(gin_model, separate_model=False)

Should GIN be trained first, then passed to InfoGraph?
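
For context, the tutorial sets things up roughly like this (paraphrased from the tutorial; exact hyperparameter values may differ):

```python
from torchdrug import datasets, models

# Load ClinTox with the feature specifiers suggested for pretraining
dataset = datasets.ClinTox("~/molecule-datasets/",
                           node_feature="pretrain", edge_feature="pretrain")

# GIN encoder -- its weights are randomly initialized at this point
gin_model = models.GIN(input_dim=dataset.node_feature_dim,
                       hidden_dims=[300, 300, 300, 300, 300],
                       edge_input_dim=dataset.edge_feature_dim,
                       batch_norm=True, readout="mean")

# Wrap the encoder with InfoGraph, which defines the pretraining objective
model = models.InfoGraph(gin_model, separate_model=False)
```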

vladimirkovacevic avatar Feb 10 '22 10:02 vladimirkovacevic

Hi! You don't need to train the GIN first, since InfoGraph itself defines a pretraining task. We wrap it as a 'model' instead of a 'task' in TorchDrug to facilitate the interaction with other layers.
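
For example, to run the pretraining you wrap the model in tasks.Unsupervised and hand it to the solver (a minimal sketch along the lines of the tutorial; the hyperparameters here are illustrative):

```python
import torch
from torchdrug import core, tasks

# The task drives optimization; InfoGraph's mutual-information objective is
# what actually trains the (initially random) GIN weights
task = tasks.Unsupervised(model)

optimizer = torch.optim.Adam(task.parameters(), lr=1e-3)
solver = core.Engine(task, dataset, None, None, optimizer, batch_size=256)
solver.train(num_epoch=100)
```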

Oxer11 avatar Feb 14 '22 03:02 Oxer11

Thank you for the answer. I assumed that, but how exactly are "pretrained" weights obtained, since the "pretrain" parameter is passed only when loading the dataset and not to the model? dataset = datasets.ClinTox("~/molecule-datasets/", node_feature="pretrain", edge_feature="pretrain")

"pretrain" argument results in invoking features.atom.pretrain R function for calculating molecular node features in molecule.py.

vladimirkovacevic avatar Feb 14 '22 07:02 vladimirkovacevic

Hi! The arguments in the dataset refer to chemical features (e.g. atomic number, formal charge), rather than anything computed by a neural network. pretrain means a specific combination of chemical features that is suggested for pretraining graph neural networks.

You may use other chemical feature specifiers, such as default, for pretraining. Note that you need to keep the same feature specifier for training and test; otherwise the model can't recognize the input correctly.
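
For instance, you can see how the specifier changes the input dimensionality (a quick sketch; the exact numbers depend on your TorchDrug version):

```python
from torchdrug import datasets

for spec in ["default", "pretrain"]:
    dataset = datasets.ClinTox("~/molecule-datasets/",
                               node_feature=spec, edge_feature=spec)
    print(spec, dataset.node_feature_dim, dataset.edge_feature_dim)
```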

KiddoZhu avatar Feb 14 '22 16:02 KiddoZhu

Hi! I am still confused about this pretrain argument. The atom representation is fixed if I use the default chemical feature specifier, so what's the meaning of pretrain?

tinymd avatar Apr 01 '22 17:04 tinymd

@KiddoZhu, sorry, your last response does not address my question. So, in the Pretrained Molecular Representations example, when GIN is instantiated it has random weights, right? As such, it is passed to InfoGraph. Setting node_feature="pretrain" on the dataset object does not set weights for GIN. This does not seem to me like the desired behavior. Can you please confirm, or correct me if I'm wrong? Thanks!

vladimirkovacevic avatar Apr 20 '22 15:04 vladimirkovacevic

node_feature has nothing to do with the weights of the network. It only defines the attribute graph.node_feature for every graph in that dataset, which will be used as the input to the network.

For example, the default node feature is a concatenation of several chemical properties, like the one-hot encoding of the atom type, the mass of the atom, the formal charge of the atom, etc. For pretraining, the pretrain node feature exactly follows the original paper, but you may also try other features. Whichever node feature you use, you need to stick to the same feature during finetuning; otherwise the shape of the input won't match the network.
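
You can check this on a single molecule (a minimal sketch; the printed dimensions depend on your TorchDrug version):

```python
from torchdrug import data

# node_feature is computed from chemistry alone -- no network is involved
mol = data.Molecule.from_smiles("C1=CC=CC=C1O", node_feature="default")
print(mol.node_feature.shape)   # (num_atoms, default feature dim)

mol = data.Molecule.from_smiles("C1=CC=CC=C1O", node_feature="pretrain")
print(mol.node_feature.shape)   # same atoms, different feature dim
```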

KiddoZhu avatar Apr 26 '22 02:04 KiddoZhu