Conditional Diffusion Based on Discrete Graph Structures for Molecular Graph Generation

CDGS

Conditional Diffusion Based on Discrete Graph Structures for Molecular Graph Generation - AAAI 2023

An extended version of this work: Learning Joint 2D & 3D Diffusion Models for Complete Molecule Generation [Paper] [Code].

Dependencies

  • pytorch 1.11
  • PyG 2.1

For NSPDK evaluation:

pip install git+https://github.com/fabriziocosta/EDeN.git --user

For other dependencies, see requirements.txt.

Training

QM9

CUDA_VISIBLE_DEVICES=0 python main.py --config configs/vp_qm9_cdgs.py --mode train --workdir exp/vpsde_qm9_cdgs
  • Set GPU id via CUDA_VISIBLE_DEVICES.
  • workdir is the directory where checkpoints are saved; change it to YOUR_PATH as needed. A pretrained checkpoint is provided in exp/vpsde_qm9_cdgs.
  • More hyperparameters are defined in the config file configs/vp_qm9_cdgs.py.
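
To reuse the training command with a different GPU or output directory, a small wrapper can help. The following is a minimal sketch: train_cmd is a hypothetical helper (not part of this repository), and exp/my_qm9_run is a placeholder path. It only prints the command so it can be checked before launching.

```shell
# Hypothetical helper (not part of the repository): print the QM9 training
# command for a given GPU id and working directory without running it.
train_cmd() {
  gpu_id="$1"    # value for CUDA_VISIBLE_DEVICES
  workdir="$2"   # directory where checkpoints will be saved
  printf 'CUDA_VISIBLE_DEVICES=%s python main.py --config configs/vp_qm9_cdgs.py --mode train --workdir %s\n' \
    "$gpu_id" "$workdir"
}

# Dry run: show the command for GPU 1 and a placeholder output path.
train_cmd 1 exp/my_qm9_run
```

Once the printed command looks right, it can be launched with eval "$(train_cmd 1 exp/my_qm9_run)".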

ZINC250k

# 256 hidden dimension
CUDA_VISIBLE_DEVICES=0 python main.py --config configs/vp_zinc_cdgs.py --mode train --workdir exp/vpsde_zinc_cdgs_256 --config.training.n_iters 2500000

# 128 hidden dimension
CUDA_VISIBLE_DEVICES=0 python main.py --config configs/vp_zinc_cdgs.py --mode train --workdir exp/vpsde_zinc_cdgs_128 --config.training.batch_size 128 --config.training.eval_batch_size 128 --config.training.n_iters 2500000

The pretrained checkpoints are provided in Google Drive 256ch and Google Drive 128ch.

Sampling

QM9

  1. EM sampling with 1000 steps
CUDA_VISIBLE_DEVICES=0 python main.py --config configs/vp_qm9_cdgs.py --mode eval --workdir exp/vpsde_qm9_cdgs --config.eval.begin_ckpt 200 --config.eval.end_ckpt 200
  • Add --config.eval.nspdk to enable NSPDK evaluation.
  • Change the number of sampling steps through --config.model.num_scales YOUR_STEPS.
  • Change the sampling batch size through --config.eval.batch_size to control GPU memory usage.
  2. DPM-Solver examples
# Order 3; 50 steps
CUDA_VISIBLE_DEVICES=0 python main.py --config configs/vp_qm9_cdgs.py --mode eval --workdir exp/vpsde_qm9_cdgs --config.eval.begin_ckpt 200 --config.eval.end_ckpt 200 --config.sampling.method dpm3 --config.sampling.ode_step 50

# Order 2; 20 steps
CUDA_VISIBLE_DEVICES=0 python main.py --config configs/vp_qm9_cdgs.py --mode eval --workdir exp/vpsde_qm9_cdgs --config.eval.begin_ckpt 200 --config.eval.end_ckpt 200 --config.sampling.method dpm2 --config.sampling.ode_step 20

# Order 1; 10 steps
CUDA_VISIBLE_DEVICES=0 python main.py --config configs/vp_qm9_cdgs.py --mode eval --workdir exp/vpsde_qm9_cdgs --config.eval.begin_ckpt 200 --config.eval.end_ckpt 200 --config.sampling.method dpm1 --config.sampling.ode_step 10
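
The three DPM-Solver commands above differ only in solver order and step count, so they can be generated from a short table. This is a sketch under that assumption: dpm_eval_cmd is a hypothetical helper (not part of the repository) that prints each command rather than running it.

```shell
# Hypothetical helper: print the QM9 evaluation command for one DPM-Solver
# setting. $1 is the solver order (1, 2, or 3), $2 the number of ODE steps.
dpm_eval_cmd() {
  order="$1"
  steps="$2"
  printf 'CUDA_VISIBLE_DEVICES=0 python main.py --config configs/vp_qm9_cdgs.py --mode eval --workdir exp/vpsde_qm9_cdgs --config.eval.begin_ckpt 200 --config.eval.end_ckpt 200 --config.sampling.method dpm%s --config.sampling.ode_step %s\n' \
    "$order" "$steps"
}

# Print one command per (order, steps) pair taken from the examples above.
while read -r o s; do
  dpm_eval_cmd "$o" "$s"
done <<'EOF'
3 50
2 20
1 10
EOF
```

Printing first makes it easy to review the sweep; piping the output to sh would run the evaluations one after another.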

ZINC250k

  1. EM sampling examples
# 1000 steps
CUDA_VISIBLE_DEVICES=0 python main.py --config configs/vp_zinc_cdgs.py --mode eval --workdir exp/vpsde_zinc_cdgs_256 --config.eval.begin_ckpt 250 --config.eval.end_ckpt 250 --config.eval.batch_size 800

# 200 steps
CUDA_VISIBLE_DEVICES=0 python main.py --config configs/vp_zinc_cdgs.py --mode eval --workdir exp/vpsde_zinc_cdgs_256 --config.eval.begin_ckpt 250 --config.eval.end_ckpt 250 --config.eval.batch_size 800 --config.model.num_scales 200
  2. DPM-Solver examples
# Order 3; 50 steps
CUDA_VISIBLE_DEVICES=0 python main.py --config configs/vp_zinc_cdgs.py --mode eval --workdir exp/vpsde_zinc_cdgs_256 --config.eval.begin_ckpt 250 --config.eval.end_ckpt 250 --config.eval.batch_size 800 --config.sampling.method dpm3 --config.sampling.ode_step 50
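
To compare EM sampling at several step counts on ZINC250k, the commands above can be parameterized. The sketch below uses a hypothetical helper, zinc_em_cmd (not part of the repository), and assumes that passing --config.model.num_scales 1000 explicitly matches the default 1000-step run; it prints the commands without executing them.

```shell
# Hypothetical helper: print the ZINC250k EM-sampling command for a given
# number of sampling steps and sampling batch size.
zinc_em_cmd() {
  num_steps="$1"     # passed to --config.model.num_scales
  batch_size="$2"    # passed to --config.eval.batch_size
  printf 'CUDA_VISIBLE_DEVICES=0 python main.py --config configs/vp_zinc_cdgs.py --mode eval --workdir exp/vpsde_zinc_cdgs_256 --config.eval.begin_ckpt 250 --config.eval.end_ckpt 250 --config.eval.batch_size %s --config.model.num_scales %s\n' \
    "$batch_size" "$num_steps"
}

# Print commands for the two step counts used in the examples above.
for n in 1000 200; do
  zinc_em_cmd "$n" 800
done
```

Lowering the second argument (the batch size) is the simplest way to fit sampling into less GPU memory.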

Results

We provide molecules generated by CDGS: Google Drive.

Citation

@article{huang2023conditional,
  title={Conditional Diffusion Based on Discrete Graph Structures for Molecular Graph Generation},
  author={Huang, Han and Sun, Leilei and Du, Bowen and Lv, Weifeng},
  journal={arXiv preprint arXiv:2301.00427},
  year={2023}
}