# MolVAE

Molecule Generation and Translation Framework: a joint PyTorch implementation of three papers on VAE-based molecule generation and translation, covering the JTVAE, VJTNN-GAN, HierVAE and HierVGNN models.
## VAE-based molecule generation and translation
This is a joint PyTorch implementation of three papers on VAE-based molecule generation and translation. The papers and their official repos are as follows:
- [Junction Tree Variational Autoencoder for Molecular Graph Generation](https://github.com/wengong-jin/icml18-jtnn) (ICML 2018)
- [Learning Multimodal Graph-to-Graph Translation for Molecular Optimization](https://github.com/wengong-jin/iclr19-graph2graph) (ICLR 2019)
- [Hierarchical Generation of Molecular Graphs using Structural Motifs](https://github.com/wengong-jin/hgraph2graph) (ICML 2020)
The master branch works with PyTorch 1.8+. MolVAE has been tested under Python 3.7 with PyTorch 1.11 on CUDA 11.4.
## Installation
- Create an Anaconda environment

  ```shell
  conda create --name vae_py37 python=3.7
  conda activate vae_py37
  ```
- Install RDKit

  ```shell
  conda install rdkit -c rdkit
  ```
- Install PyTorch following the official instructions, e.g. PyTorch on GPU platforms:

  ```shell
  conda install pytorch torchvision -c pytorch
  ```
- Install other requirements:

  ```shell
  pip install -r requirements.txt
  ```
- Install Chemprop from source (an additional dependency for property-guided finetuning)

  ```shell
  git clone https://github.com/chemprop/chemprop.git
  cd chemprop
  pip install -e .
  ```
## Data Format
- For molecule generation, each line of a training file is a molecule in SMILES representation. `benchmark/moses` and `benchmark/polymers` are used for generation.
- For molecule translation, each line of a training file is a pair of molecules (molA, molB). The target is to translate molA towards molB, as molB has better chemical properties. `benchmark/drd2`, `benchmark/logp04`, `benchmark/logp06` and `benchmark/qed` are used for translation.
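For illustration, the two file layouts can be read with a minimal sketch like the one below. This is pure Python; the whitespace pair separator and the helper names are assumptions for the example, not taken from the repo's code.

```python
# Illustrative readers for the two training-file formats.
# Assumptions: pairs are whitespace-separated; function names are made up.

def read_generation_file(lines):
    """Generation format: one SMILES string per line."""
    return [line.strip() for line in lines if line.strip()]

def read_translation_file(lines):
    """Translation format: one (molA, molB) pair per line,
    where molB has the better chemical properties."""
    pairs = []
    for line in lines:
        if line.strip():
            mol_a, mol_b = line.split()
            pairs.append((mol_a, mol_b))
    return pairs

# Toy SMILES for demonstration
gen = read_generation_file(["CCO\n", "c1ccccc1\n"])
trans = read_translation_file(["CCO CCN\n"])
```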
## Training

- Select the config file and raw data according to your task and approach.
  - For molecule generation, go to `configs/moses` or `configs/polymers`.
    - For the junction tree approach, use `configs/*/jtvae.json`.
    - For the hierarchical substructure approach, use `configs/*/hiervae.json`.
  - For molecule translation, go to `configs/drd2`, `configs/logp04`, `configs/logp06` or `configs/qed`.
    - For the junction tree approach, use `configs/*/vjtnn_gan.json` (with GAN loss) or `configs/*/vjtnn.json` (without GAN loss).
    - For the hierarchical substructure approach, use `configs/*/hiervgnn.json`.
- Extract vocabularies from a given set of molecules and preprocess the training data. Add the `--get_vocab` argument if you have not extracted the vocabulary before. Replace `xxx` with your selected json file.

  ```shell
  python tools/preprocess.py --config configs/xxx
  ```
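To give a feel for what the vocabulary step produces, here is a deliberately simplified sketch. The real preprocessing extracts chemically meaningful substructure motifs (junction-tree cliques or structural motifs); this toy version only collects unique characters from SMILES strings, so the tokenization and the function name are illustrative, not the repo's algorithm.

```python
from collections import Counter

def toy_vocab(smiles_list):
    """Collect unique 'tokens' from SMILES strings.
    Illustrative only: per-character tokens, whereas the real
    vocabulary contains substructure motifs, not characters."""
    counts = Counter()
    for smi in smiles_list:
        counts.update(smi.strip())
    # a vocabulary file would then store one entry per line
    return sorted(counts)

vocab = toy_vocab(["CCO", "c1ccccc1"])
```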
- Train the model
  - Without GAN loss

    ```shell
    python tools/train.py --config configs/xxx
    ```
  - With GAN loss (only for the junction tree approach to molecule translation)

    ```shell
    python tools/train_gan.py --config configs/xxx
    ```
## Testing

- For molecule generation, replace `yyy` with your selected model in `ckpt/moses` or `ckpt/polymers`.

  ```shell
  python tools/generate.py --config configs/xxx --model ckpt/yyy
  ```
- For molecule translation, replace `yyy` with your selected model in `ckpt/drd2`, `ckpt/logp04`, `ckpt/logp06` or `ckpt/qed`.

  ```shell
  python tools/translate.py --config configs/xxx --model ckpt/yyy
  ```
## Evaluation

Calculate metrics on a testing result file. Replace `zzz` with your result file in `results/*`.

```shell
python tools/eval.py --config configs/xxx --result results/zzz
```
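As background, two metrics commonly reported for molecule generation, uniqueness and novelty, can be computed from the sampled SMILES alone. The sketch below uses made-up function names; the actual metric set computed by `tools/eval.py` is defined by the repo's code.

```python
def uniqueness(samples):
    """Fraction of generated molecules that are distinct."""
    return len(set(samples)) / len(samples)

def novelty(samples, training_set):
    """Fraction of distinct generated molecules not seen in training."""
    unique = set(samples)
    return len(unique - set(training_set)) / len(unique)

u = uniqueness(["CCO", "CCO", "CCN", "CCC"])  # 3 distinct out of 4 -> 0.75
n = novelty(["CCO", "CCN", "CCC"], {"CCO"})   # 2 novel out of 3 distinct
```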