cdvae
cdvae copied to clipboard
Instructions for use of the benchmark datasets and metrics on custom generative models
Hi @txie-93, I'm enjoying digging into the manuscript, and congratulations on its acceptance to ICLR! It is really nice to see the comparison with FTCP and other methods, and CDVAE certainly has some impressive results.
Would you mind providing some instructions in the repository for using the benchmark datasets and the metrics on a custom generative model? For example, how would this look for FTCP or the slew of other generative models in this space (i.e. the general inverse design ones)?
Hi Sterling, thank you for your interest.
Use our datasets on other models
Our datasets are csv
files where each row contains a crystal with its cif
string. Both FTCP and Cond-DFC-VAE have utilities and guideline to configure the model for cif data source. Maybe that is also the case for other crystal generative models, given the common use of cif for crystals?
To adopt G-SchNet, we processed the crystals to ASE atoms. I am enclosing my code here:
import numpy as np
import pandas as pd
from pathlib import Path
from tqdm import tqdm
from ase import Atoms
from pymatgen.core.structure import Structure
from pymatgen.core.lattice import Lattice
def abs_cap(val, max_abs_val=1):
return max(min(val, max_abs_val), -max_abs_val)
def lattice_params_to_matrix(a, b, c, alpha, beta, gamma):
angles_r = np.radians([alpha, beta, gamma])
cos_alpha, cos_beta, cos_gamma = np.cos(angles_r)
sin_alpha, sin_beta, sin_gamma = np.sin(angles_r)
val = (cos_alpha * cos_beta - cos_gamma) / (sin_alpha * sin_beta)
# Sometimes rounding errors result in values slightly > 1.
val = abs_cap(val)
gamma_star = np.arccos(val)
vector_a = [a * sin_beta, 0.0, a * cos_beta]
vector_b = [
-b * sin_alpha * np.cos(gamma_star),
b * sin_alpha * np.sin(gamma_star),
b * cos_alpha,
]
vector_c = [0.0, 0.0, float(c)]
return np.array([vector_a, vector_b, vector_c])
def build_crystal(crystal_str, niggli=True, primitive=False, supercell=False):
"""Build crystal from cif string."""
crystal = Structure.from_str(crystal_str, fmt='cif')
if primitive:
crystal = crystal.get_primitive_structure()
if niggli:
crystal = crystal.get_reduced_structure()
canonical_crystal = Structure(
lattice=Lattice.from_parameters(*crystal.lattice.parameters),
species=crystal.species,
coords=crystal.frac_coords,
coords_are_cartesian=False,
)
return canonical_crystal
def get_ase_atoms(cif):
crystal = build_crystal(cif)
lattice = lattice_params_to_matrix(*crystal.lattice.abc, *crystal.lattice.angles)
at = Atoms(scaled_positions=crystal.frac_coords,
numbers=np.array(crystal.atomic_numbers),
cell=lattice, pbc=True)
return at
and then one can follow the instruction to build dataset objects for G-SchNet.
Use our benchmark metrics
A dictionary containing the following is all you need for evaluation:
frac_coords
: fractional coordinates of each atom, shape (num_evals, N, 3)
atom_types
: atomic number of each atom, shape (num_evals, N)
lengths
: the lengths of the lattice, shape (num_evals, M, 3)
angles
: the angles of the lattice, shape (num_evals, M, 3)
num_atoms
: the number of atoms in each material, shape (num_evals, M)
Any crystal generative models would generate these quantities to be complete.
Our evaluation scripts for computing metrics are independent of CDVAE. One just need to save these quantities as a torch pickle file, and then run compute_metrics.py with that file as input. See https://github.com/txie-93/cdvae/blob/main/scripts/compute_metrics.py#L267 on how the saved crystals are loaded.
Hope this helps.
@kyonofx thank you! As I was browsing further, also noticed the README in the data directory. I appreciate the extra clarification here.
Our evaluation scripts for computing metrics are independent of CDVAE.
@kyonofx while the scripts are in separate files/folders from cdvae, there are import dependencies that trace back to CDVAE: https://github.com/txie-93/cdvae/blob/f857f598d6f6cca5dc1ea0582d228f12dcc2c2ea/scripts/compute_metrics.py#L19-L21
https://github.com/txie-93/cdvae/blob/f857f598d6f6cca5dc1ea0582d228f12dcc2c2ea/scripts/eval_utils.py#L15-L18
Hi,
Yes, you still need to install the cdvae package, and evaluation can be run without training a cdvae model.
@kyonofx I'm planning to expose these metrics in their own package plus additional metric(s). Would you recommend that I try to splice out the functionality or package CDVAE as a whole onto PyPI and Anaconda? #14
Hi,
I think it might be the easiest to splice out the evaluation code as they only compose a small fraction of the cdvae codebase.
@kyonofx separating it out is turning out to be ☠️ I'm reconstructing most of the repository piece by piece. Not very straightforward, as it accesses many files in the repository.
Note that this is mainly due to some metric(s) requiring predictions from a CDVAE submodel (i.e. property regressor).