DeepPurpose

Training Configuration of pre-trained MPNN_CNN

Open • pykao opened this issue 4 years ago • 12 comments

Hi Kexin Huang,

I am using the provided pre-trained MPNN_CNN model. When I looked into its model configuration file, it looked weird to me.

{'input_dim_drug': 1024, 'input_dim_protein': 8420, 'hidden_dim_drug': 128, 'hidden_dim_protein': 256, 'cls_hidden_dims': [1024, 1024, 512], 'batch_size': 16, 'train_epoch': 1, 'LR': 0.001, 'drug_encoding': 'MPNN', 'target_encoding': 'CNN', 'result_folder': './result/', 'binary': False, 'mpnn_hidden_size': 128, 'mpnn_depth': 3, 'cnn_target_filters': [32, 64, 96], 'cnn_target_kernels': [4, 8, 12], 'num_workers': 0, 'decay': 0}

Did you train this model for only 1 epoch with a batch size of 16?
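This kind of mismatch can be spotted by reading the printed configuration, and the check is easy to automate. A minimal sketch in plain Python, assuming the config is available as the printed dict text; `check_training_budget` and its `min_epochs` threshold are hypothetical and not part of DeepPurpose:

```python
import ast

# A subset of the configuration quoted above, pasted as the printed dict text.
CONFIG_TEXT = ("{'batch_size': 16, 'train_epoch': 1, 'LR': 0.001, "
               "'drug_encoding': 'MPNN', 'target_encoding': 'CNN'}")


def check_training_budget(config_text, min_epochs=10):
    """Parse a printed config dict and warn if it looks under-trained.

    min_epochs is a hypothetical threshold chosen for illustration only.
    """
    config = ast.literal_eval(config_text)  # parses Python literals without eval()
    warnings = []
    epochs = config.get("train_epoch")
    if epochs is not None and epochs < min_epochs:
        warnings.append(f"train_epoch={epochs} is below {min_epochs}")
    return warnings


for warning in check_training_budget(CONFIG_TEXT):
    print("warning:", warning)  # prints: warning: train_epoch=1 is below 10
```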

Best regards, Po-Yu Kao

pykao, Dec 02 '20 08:12

That's weird; I must have stored the wrong model. Let me double-check, and I will upload the correct one.

kexinhuang12345, Dec 02 '20 21:12

Hey, it seems this model is wrong. You can use "MPNN_CNN_BindingDB_IC50" instead. It is trained on a much larger training set (~10^5 -> 10^6) and should be of higher quality. Note that the units now switch from Kd to IC50.
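Because the replacement model is trained on IC50 rather than Kd labels, its predictions are not directly comparable with the old model's. Affinity values in such datasets are often expressed on a p-scale, -log10 of the molar concentration. The sketch below shows only that scale conversion, assuming inputs in nM. It does not turn a Kd into an IC50, and whether a given DeepPurpose model outputs nM or p-scale values should be checked in its own data-processing code. `nm_to_p` and `p_to_nm` are hypothetical helper names:

```python
import math


def nm_to_p(value_nm):
    """Convert a concentration in nM to the p-scale, -log10(molar concentration).

    Since 1 nM = 1e-9 M, p = 9 - log10(value_nm).
    """
    if value_nm <= 0:
        raise ValueError("concentration must be positive")
    return 9.0 - math.log10(value_nm)


def p_to_nm(value_p):
    """Inverse of nm_to_p: convert a p-scale value back to nM."""
    return 10 ** (9.0 - value_p)


# This changes the scale only; a Kd of 1 nM and an IC50 of 1 nM remain different measurements.
print(nm_to_p(1.0))     # 1 nM -> 9.0
print(nm_to_p(1000.0))  # 1 uM -> 6.0
```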

kexinhuang12345, Dec 03 '20 21:12

Did you use the latest BindingDB release to train this model?

pykao, Dec 04 '20 01:12

Hey, it uses an earlier release, 2020m2. There should be only minor differences from the current, most up-to-date version in the number of training points.

kexinhuang12345, Dec 04 '20 03:12

Thank you for your reply 👍🏽 Please let me know if you want an MPNN_CNN model trained on BindingDB using Kd.

pykao, Dec 04 '20 06:12

No problem! Do you mean you managed to train the model? If so, it would be great if you could share it with me ([email protected]), thanks!

kexinhuang12345, Dec 04 '20 07:12

You can simply call model.save('XXX') and send me the model file; I will upload it to the server and update the link. Thanks again!
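Before sending a saved model, re-reading the configuration stored with it can catch the kind of mix-up reported in this issue. A minimal standard-library sketch, assuming the saved directory holds the config as a pickled dict named `config.pkl` (an assumption about DeepPurpose's on-disk layout, not confirmed in this thread). `write_config` only simulates the save step, and both helpers and the directory name are hypothetical:

```python
import os
import pickle
import tempfile


def write_config(model_dir, config):
    """Hypothetical stand-in for the save step: pickle a config dict as config.pkl."""
    os.makedirs(model_dir, exist_ok=True)
    with open(os.path.join(model_dir, "config.pkl"), "wb") as f:
        pickle.dump(config, f)


def read_config(model_dir):
    """Load config.pkl from a saved model directory for inspection.

    pickle can execute code while loading, so only read files from trusted sources.
    """
    with open(os.path.join(model_dir, "config.pkl"), "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    model_dir = os.path.join(tmp, "MPNN_CNN_BindingDB_Kd")  # hypothetical name
    # Settings taken from the configuration quoted at the top of this issue.
    write_config(model_dir, {"drug_encoding": "MPNN", "target_encoding": "CNN",
                             "train_epoch": 1, "batch_size": 16})
    print(read_config(model_dir)["train_epoch"])  # prints: 1
```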

kexinhuang12345, Dec 04 '20 07:12

Hi Kexin, it seems that the pre-trained MPNN_CNN model downloaded via pretrained_dir = download_pretrained_model('pretrained_models') in oneliner.py still shows the old configuration:

{'input_dim_drug': 1024, 'input_dim_protein': 8420, 'hidden_dim_drug': 128, 'hidden_dim_protein': 256, 'cls_hidden_dims': [1024, 1024, 512], 'batch_size': 16, 'train_epoch': 1, 'LR': 0.001, 'drug_encoding': 'MPNN', 'target_encoding': 'CNN', 'result_folder': './result/', 'binary': False, 'mpnn_hidden_size': 128, 'mpnn_depth': 3, 'cnn_target_filters': [32, 64, 96], 'cnn_target_kernels': [4, 8, 12]}

Maybe the model file at https://dataverse.harvard.edu/api/access/datafile/ needs to be updated.

The configuration files downloaded via pretrained_dir = download_pretrained_model('models_configs') may also need an update.
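Comparing the two configurations quoted in this thread shows whether the hosted file actually changed. A small sketch with a hypothetical `diff_configs` helper, using the two dicts exactly as posted:

```python
# Configuration posted on Dec 02 '20 (first comment of this issue).
CONFIG_DEC_2020 = {
    'input_dim_drug': 1024, 'input_dim_protein': 8420, 'hidden_dim_drug': 128,
    'hidden_dim_protein': 256, 'cls_hidden_dims': [1024, 1024, 512], 'batch_size': 16,
    'train_epoch': 1, 'LR': 0.001, 'drug_encoding': 'MPNN', 'target_encoding': 'CNN',
    'result_folder': './result/', 'binary': False, 'mpnn_hidden_size': 128,
    'mpnn_depth': 3, 'cnn_target_filters': [32, 64, 96], 'cnn_target_kernels': [4, 8, 12],
    'num_workers': 0, 'decay': 0}

# Configuration posted on Jul 10 '21 (downloaded via download_pretrained_model).
CONFIG_JUL_2021 = {
    'input_dim_drug': 1024, 'input_dim_protein': 8420, 'hidden_dim_drug': 128,
    'hidden_dim_protein': 256, 'cls_hidden_dims': [1024, 1024, 512], 'batch_size': 16,
    'train_epoch': 1, 'LR': 0.001, 'drug_encoding': 'MPNN', 'target_encoding': 'CNN',
    'result_folder': './result/', 'binary': False, 'mpnn_hidden_size': 128,
    'mpnn_depth': 3, 'cnn_target_filters': [32, 64, 96], 'cnn_target_kernels': [4, 8, 12]}


def diff_configs(a, b):
    """Return (keys only in a, keys only in b, shared keys whose values differ)."""
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    changed = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
    return only_a, only_b, changed


print(diff_configs(CONFIG_DEC_2020, CONFIG_JUL_2021))
# prints: (['decay', 'num_workers'], [], [])
```

Only `decay` and `num_workers` differ, and they are missing from the July 2021 copy rather than changed; the training settings (`train_epoch` 1, `batch_size` 16) are identical.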

chemlove, Jul 10 '21 06:07

Sounds good! Do you want to contribute by training a new model for it?

kexinhuang12345, Jul 13 '21 14:07

I'd like to give it a try. Could you please point me to the BindingDB Kd dataset? And what preprocessing or data cleaning is needed before I start training? I may need further help, since I am a complete newbie to ML :)

chemlove, Jul 14 '21 03:07

Sounds good; it should be the dataset used in https://github.com/kexinhuang12345/DeepPurpose/blob/master/DEMO/Transformer%2BCNN_BindingDB.ipynb

Simply replacing the model and its parameters should be enough.
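One way to read "replacing the model and parameters": keep the demo's data and training settings, and swap in the MPNN_CNN encoder settings from the configuration quoted in this thread. A plain-Python sketch under stated assumptions: `swap_to_mpnn_cnn` is hypothetical, the `transformer_` key prefix is an assumed naming convention, and the `demo` values are placeholders rather than the notebook's actual settings:

```python
import copy

# Encoder/architecture settings copied from the MPNN_CNN configuration quoted in this thread.
MPNN_CNN_ARCH = {
    "drug_encoding": "MPNN",
    "target_encoding": "CNN",
    "cls_hidden_dims": [1024, 1024, 512],
    "hidden_dim_drug": 128,
    "hidden_dim_protein": 256,
    "mpnn_hidden_size": 128,
    "mpnn_depth": 3,
    "cnn_target_filters": [32, 64, 96],
    "cnn_target_kernels": [4, 8, 12],
}


def swap_to_mpnn_cnn(demo_kwargs):
    """Return a copy of a Transformer+CNN config with the MPNN_CNN settings swapped in.

    Keys starting with "transformer_" are dropped; that prefix is an assumption
    about the demo's parameter names.
    """
    kept = {k: v for k, v in demo_kwargs.items() if not k.startswith("transformer_")}
    kept.update(copy.deepcopy(MPNN_CNN_ARCH))  # deep copy so the lists are not shared
    return kept


# Placeholder demo settings (hypothetical, not taken from the notebook).
demo = {"drug_encoding": "Transformer", "target_encoding": "CNN",
        "transformer_n_layer_drug": 8, "train_epoch": 100, "LR": 0.001, "batch_size": 128}
kwargs = swap_to_mpnn_cnn(demo)
print(kwargs["drug_encoding"], kwargs["train_epoch"])  # prints: MPNN 100
```

If the key names match the installed DeepPurpose version, the resulting dict could then be passed as `utils.generate_config(**kwargs)`; verify the parameter names against the library source first, and set `train_epoch` deliberately, since the problem in this issue was a model trained for a single epoch.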

kexinhuang12345, Jul 27 '21 02:07

Thank you for the fruitful discussion, and a big thank-you to the developers of this library. My question is: in the latest release of DeepPurpose, has the MPNN_CNN model been corrected, and does it work fine now?

Jameel9, Sep 09 '22 04:09