dgl-lifesci

Question: what datasets were pre-trained models pre-trained on?

Open • rhjohnstone opened this issue 2 years ago • 1 comment

Some of the pre-trained models are described only as "pre-trained", while others are described as "pre-trained then fine-tuned on x". What data was the original pre-training performed on, and for how long?

e.g. from the docs:

'gin_supervised_contextpred': A GIN model pre-trained with supervised learning and context prediction
'gin_supervised_masking_BACE': A GIN model pre-trained with supervised learning and masking, and fine-tuned on BACE

rhjohnstone, Oct 20 '22 07:10

You can find the details of pre-training in https://arxiv.org/abs/1905.12265. In the model names, supervised means that supervised pre-training was performed on a ChEMBL dataset, and contextpred means that self-supervised pre-training with context prediction was performed on a ZINC15 dataset.

mufeili, Oct 22 '22 08:10
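
As a rough illustration of the naming convention discussed in this thread, the sketch below splits a model name into its architecture, pre-training stages, and optional fine-tuning dataset. The helper describe_pretrained_name and the STAGES table are hypothetical and not part of dgl-lifesci. The descriptions for supervised and contextpred come from the reply above; masking is only named in the docs excerpt, so its dataset is left unstated. Treating a final token that is not all lowercase (such as BACE) as the fine-tuning dataset is an assumption based on the two example names, not a documented rule.

```python
# Hypothetical helper for illustration only; not part of dgl-lifesci.
# Stage descriptions for "supervised" and "contextpred" follow the reply in
# this thread. "masking" is only named in the docs excerpt, so no dataset
# is claimed for it here.
STAGES = {
    "supervised": "supervised pre-training on a ChEMBL dataset",
    "contextpred": "self-supervised context prediction on a ZINC15 dataset",
    "masking": "masking (dataset not stated in this thread)",
}


def describe_pretrained_name(name):
    """Split a pre-trained model name into architecture, stages, and dataset."""
    architecture, *tokens = name.split("_")

    # Assumption: a trailing token that is not all lowercase (e.g. "BACE")
    # names the dataset the model was fine-tuned on.
    finetuned_on = None
    if tokens and not tokens[-1].islower():
        finetuned_on = tokens.pop()

    # Any stage keyword not covered in this thread is flagged, not guessed.
    pretraining = [
        STAGES.get(token, f"{token} (not explained in this thread)")
        for token in tokens
    ]
    return {
        "architecture": architecture,
        "pretraining": pretraining,
        "finetuned_on": finetuned_on,
    }


for model_name in ("gin_supervised_contextpred", "gin_supervised_masking_BACE"):
    print(model_name, describe_pretrained_name(model_name))

# With dgl-lifesci installed, the weights themselves are loaded with:
#   from dgllife.model import load_pretrained
#   model = load_pretrained("gin_supervised_contextpred")
```

The helper only interprets the name. It does not answer how long pre-training ran, which is not stated in this thread and would have to be checked in the linked paper.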