PaddleHelix
Can I find code for LIT-PCBA dataset's 3D coordinates generation?
Hi, did you test the GEM-2 model on LIT-PCBA by generating 3D coordinates from SMILES strings?
If so, where can I find the code for it?
Thank you.
Hi Sangyeup, we are organizing the training code for LIT-PCBA and will update it later. For now, you can
- Implement the LitPCBADataset class with reference to https://github.com/PaddlePaddle/PaddleHelix/blob/02cbefee527acfc979913be178d083518590da90/apps/pretrained_compound/ChemRL/GEM-2/src/dataset.py#L33
- Replace the PCQM4Mv2 dataset with the newly implemented LitPCBADataset in function load_data: https://github.com/PaddlePaddle/PaddleHelix/blob/02cbefee527acfc979913be178d083518590da90/apps/pretrained_compound/ChemRL/GEM-2/train_gem2.py#L112
- Add a litpcba dataset config to the folder configs/dataset_configs (you need to specify where the raw litpcba dataset is, as pcqmv2.json does)
- Now you can run train_gem2.py to generate the 3D data and train GEM-2 with LIT-PCBA. Note that the processed data is stored in the data_cache_dir that you pass to the script.

Hope this helps.
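Until the official code is released, the first step above could be sketched roughly as follows. This is not the PaddleHelix implementation: the class layout, the file names actives.smi and inactives.smi, the cache file name litpcba_3d.pkl, and the helper names read_smi and smiles_to_3d are all assumptions for illustration, and RDKit embedding is just one common way to get 3D coordinates from SMILES. The real class should follow the interface of the dataset in src/dataset.py.

```python
import os
import pickle


def read_smi(path, label):
    """Parse a .smi file with one 'SMILES [ID]' record per line (assumed layout)."""
    records = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            records.append({"smiles": parts[0], "label": label})
    return records


def smiles_to_3d(smiles, seed=0):
    """Embed one conformer with RDKit; return (atomic_numbers, coords) or None."""
    from rdkit import Chem  # imported lazily so the rest runs without RDKit
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None  # unparsable SMILES
    mol = Chem.AddHs(mol)
    if AllChem.EmbedMolecule(mol, randomSeed=seed) != 0:
        return None  # embedding failed
    AllChem.MMFFOptimizeMolecule(mol)  # force-field relaxation, best effort
    coords = mol.GetConformer().GetPositions().tolist()
    return [atom.GetAtomicNum() for atom in mol.GetAtoms()], coords


class LitPCBADataset:
    """Hypothetical sketch: load one LIT-PCBA target and cache its 3D data."""

    def __init__(self, raw_dir, data_cache_dir, embed_fn=smiles_to_3d):
        self.raw_dir = raw_dir
        self.data_cache_dir = data_cache_dir
        self.embed_fn = embed_fn

    def load(self):
        cache_path = os.path.join(self.data_cache_dir, "litpcba_3d.pkl")
        if os.path.exists(cache_path):
            # Reuse previously processed data instead of re-embedding.
            with open(cache_path, "rb") as f:
                return pickle.load(f)

        records = read_smi(os.path.join(self.raw_dir, "actives.smi"), 1)
        records += read_smi(os.path.join(self.raw_dir, "inactives.smi"), 0)

        data = []
        for rec in records:
            result = self.embed_fn(rec["smiles"])
            if result is None:
                continue  # drop molecules without a usable conformer
            atoms, coords = result
            data.append({**rec, "atoms": atoms, "coords": coords})

        os.makedirs(self.data_cache_dir, exist_ok=True)
        with open(cache_path, "wb") as f:
            pickle.dump(data, f)
        return data
```

Calling LitPCBADataset(raw_dir, data_cache_dir).load() embeds every molecule once and reads from the cache on later runs, which mirrors the note above about processed data living in data_cache_dir. LIT-PCBA is large and heavily imbalanced toward inactives, so in practice the embedding loop would probably need to be parallelized.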