Problem running otf_train.yaml (error message sparse_gp.py)
Describe the bug
I was running the flare-otf otf_train.yaml command with a POSCAR file as the input structure and VASP as the DFT calculator when I got this error message:
File "/home/USER/.local/lib/python3.10/site-packages/flare/bffs/sgp/sparse_gp.py", line 335, in update_db
    coded_species.append(self.species_map[spec])
KeyError: 14
To Reproduce
Steps to reproduce the behavior:
- otf_train.yaml
# Super cell is read from a file such as POSCAR, xyz, lammps-data
# or any format that ASE supports
supercell:
    file: POSCAR
    format: vasp
    replicate: [1, 1, 1] # supercell creation. Be mindful of DFT limitations and periodicity of your cell.
    jitter: 0.1 # perturb the initial atomic positions by 0.1 A, so initial atomic environments added to the sparse set are not the same

# Set up FLARE calculator with (sparse) Gaussian process
flare_calc:
    gp: SGP_Wrapper
    kernels:
        - name: NormalizedDotProduct # select kernel for comparison of atomic environments
          sigma: 2.0 # signal variance, this hyperparameter will be trained, and is typically between 1 and 10.
          power: 2 # power of the kernel, influences body-order
    descriptors:
        - name: B2 # Atomic Cluster Expansion (ACE) descriptor from R. Drautz (2019). FLARE can only go from B1 up to B3 currently.
          nmax: 8 # Radial fidelity of the descriptor (higher value = higher cost)
          lmax: 3 # Angular fidelity of the descriptor (higher value = higher cost)
          cutoff_function: quadratic # Cutoff behavior
          radial_basis: chebyshev # Formalism for the radial basis functions
          cutoff_matrix: [[5.0]] # In angstroms. NxN array for N_species in a system.
    energy_noise: 0.096 # Energy noise hyperparameter, will be trained later. Typically set to 1 meV * N_atoms.
    forces_noise: 0.05 # Force noise hyperparameter, will be trained later. System dependent, typically between 0.05 meV/A and 0.2 meV/A.
    stress_noise: 0.001 # Stress noise hyperparameter, will be trained later. Typically set to 0.001 meV/A^3.
    energy_training: True
    force_training: True
    stress_training: True
    species:
        - 13 # Atomic number of your species (here, 13 = Al).
    single_atom_energies:
        - 0 # Single atom energies to bias the energy prediction of the model. Can help in systems with poor initial energy estimations. Length must equal the number of species.
    cutoff: 5.0 # Cutoff for the (ACE) descriptor. Typically informed by the radial distribution function of the system. Should equal the maximum value in the cutoff_matrix.
    variance_type: local # Calculate atomic uncertainties.
    max_iterations: 20 # Maximum steps taken during each hyperparameter optimization call.
    use_mapping: True # Print mapped model (ready for use in LAMMPS) during trajectory. Model is re-mapped and replaced if new DFT calls are made throughout the trajectory.

# In the tutorial, we use ASE Lennard-Jones potential as ground truth
# instead of DFT to save time
dft_calc:
    name: Vasp
    kwargs:
        command: "mpirun vasp_std"
        # pseudo-potential
        xc: pbe
        # k points
        kpts: [4, 4, 4]
        # INCAR
        istart: 0
        npar: 8
        ediff: 1.0e-6
        encut: 500
        ismear: -5
        sigma: 0.2
        lreal: Auto
        prec: Accurate
        algo: Fast
        lscalapack: False
    params: {}

# Set up On-the-fly training and MD
otf: # On-the-fly training and MD
    mode: fresh # Start from an empty SGP
    md_engine: VelocityVerlet # Define MD engine, here we use the Velocity Verlet engine from ASE. LAMMPS examples can be found in the flare/examples directory in the repo
    md_kwargs: {} # Define MD kwargs
    initial_velocity: 1000 # Initialize the velocities
    dt: 0.001 # Set the time step in picoseconds (1 fs here)
    number_of_steps: 10 # Total number of MD steps to be taken
    output_name: Si_otf # Name of output
    init_atoms: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] # Initial atoms to be added to the sparse set
    std_tolerance_factor: -0.01 # The uncertainty threshold above which the DFT will be called
    max_atoms_added: -1 # Allow for all atoms in a given frame to be added to the sparse set if uncertainties permit
    train_hyps: [5,inf] # Define range in which hyperparameters will be optimized. Here, hyps are optimized at every DFT call after the 5th call.
    write_model: 4 # Verbosity of model output.
    update_style: threshold # Sparse set update style. Atoms above a defined "threshold" will be added using this method
    update_threshold: 0.001 # Threshold for adding atoms if "update_style = threshold". Threshold represents relative uncertainty to mean atomic uncertainty, where atoms above are added to sparse set
    force_only: False # Train on forces, stresses, and energies.
- POSCAR
Si
1.0000000000000000
5.4437023729394527 0.0000000000000000 0.0000000000000003
0.0000000000000009 5.4437023729394527 0.0000000000000003
0.0000000000000000 0.0000000000000000 5.4437023729394527
Si
8
Cartesian
4.1147590257602102 4.2452943034050890 1.3468254251531135
-0.0802625722806385 2.7244722198520734 2.8298858824873951
4.0422699347878837 1.3440468765613307 4.0661853445649534
0.0494231634513068 -0.0609311205725576 0.1084518318735196
1.3544414172112398 4.0951079607678640 4.0422043460588535
2.6399530632153270 2.7130357906305003 0.0432045106348295
1.4549697967372583 1.5227723112914378 1.3071579071337061
2.8919919707496526 -0.0405562349093180 2.8027838013455884
- version flare
git clone https://github.com/mir-group/flare.git (latest release 1.3.3)
Hello,
The error you are seeing is due to a mismatch between the species listed in the flare_calc section of your yaml and the species in the structure you are reading. You need to modify the following (assuming your input file only contains Si):
# old
species:
    - 13 # Atomic number of your species (here, 13 = Al).
# new
species:
    - 14 # Atomic number of your species (here, 14 = Si).

- Cameron
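
The mismatch described above can be caught before a run is launched. The following is only a minimal sketch and not FLARE's own code: `missing_species` is a hypothetical helper, and the atomic numbers are written out by hand for the 8-atom Si cell in the POSCAR above.

```python
def missing_species(config_species, structure_numbers):
    """Return atomic numbers found in the structure but absent from the config."""
    known = set(config_species)
    return sorted(set(structure_numbers) - known)


# Atomic numbers of the 8-atom Si cell from the POSCAR above (hand-written here).
# With ASE they could be read via
# ase.io.read("POSCAR", format="vasp").get_atomic_numbers().
structure_numbers = [14] * 8

print(missing_species([13], structure_numbers))  # species list from the original yaml -> [14]
print(missing_species([14], structure_numbers))  # species list after the fix -> []
```

A non-empty result lists exactly the atomic numbers that would be missing when the structure's species are looked up, which in this thread is the `14` reported by the `KeyError`.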
Thank you, it worked! If I have a system with several species (for example Si=14 and O=8), what should my otf_train.yaml look like? I tried different ways, but always got an error message.
Thank you!
To Reproduce
- otf_train.yaml
# Super cell is read from a file such as POSCAR, xyz, lammps-data
# or any format that ASE supports
supercell:
    file: POSCAR
    format: vasp
    replicate: [1, 1, 1] # supercell creation. Be mindful of DFT limitations and periodicity of your cell.
    jitter: 0.1 # perturb the initial atomic positions by 0.1 A, so initial atomic environments added to the sparse set are not the same

# Set up FLARE calculator with (sparse) Gaussian process
flare_calc:
    gp: SGP_Wrapper
    kernels:
        - name: NormalizedDotProduct # select kernel for comparison of atomic environments
          sigma: 2.0 # signal variance, this hyperparameter will be trained, and is typically between 1 and 10.
          power: 2 # power of the kernel, influences body-order
    descriptors:
        - name: B2 # Atomic Cluster Expansion (ACE) descriptor from R. Drautz (2019). FLARE can only go from B1 up to B3 currently.
          nmax: 8 # Radial fidelity of the descriptor (higher value = higher cost)
          lmax: 3 # Angular fidelity of the descriptor (higher value = higher cost)
          cutoff_function: quadratic # Cutoff behavior
          radial_basis: chebyshev # Formalism for the radial basis functions
          cutoff_matrix: [[5.0]] # In angstroms. NxN array for N_species in a system.
    energy_noise: 0.096 # Energy noise hyperparameter, will be trained later. Typically set to 1 meV * N_atoms.
    forces_noise: 0.05 # Force noise hyperparameter, will be trained later. System dependent, typically between 0.05 meV/A and 0.2 meV/A.
    stress_noise: 0.001 # Stress noise hyperparameter, will be trained later. Typically set to 0.001 meV/A^3.
    energy_training: True
    force_training: True
    stress_training: True
    species:
        - [14, 8] # Atomic number of your species (here, 13 = Al).
    single_atom_energies:
        - 0 # Single atom energies to bias the energy prediction of the model. Can help in systems with poor initial energy estimations. Length must equal the number of species.
    cutoff: 5.0 # Cutoff for the (ACE) descriptor. Typically informed by the radial distribution function of the system. Should equal the maximum value in the cutoff_matrix.
    variance_type: local # Calculate atomic uncertainties.
    max_iterations: 20 # Maximum steps taken during each hyperparameter optimization call.
    use_mapping: True # Print mapped model (ready for use in LAMMPS) during trajectory. Model is re-mapped and replaced if new DFT calls are made throughout the trajectory.

# In the tutorial, we use ASE Lennard-Jones potential as ground truth
# instead of DFT to save time
dft_calc:
    name: Vasp
    kwargs:
        command: "mpirun vasp_std"
        # pseudo-potential
        xc: pbe
        # k points
        kpts: [5, 5, 4]
        # INCAR
        istart: 0
        npar: 8
        ediff: 1.0e-6
        encut: 800
        ismear: -5
        sigma: 0.2
        lreal: Auto
        prec: Accurate
        algo: Fast
        lscalapack: False
    params: {}

# Set up On-the-fly training and MD
otf: # On-the-fly training and MD
    mode: fresh # Start from an empty SGP
    md_engine: VelocityVerlet # Define MD engine, here we use the Velocity Verlet engine from ASE. LAMMPS examples can be found in the flare/examples directory in the repo
    md_kwargs: {} # Define MD kwargs
    initial_velocity: 1000 # Initialize the velocities
    dt: 0.001 # Set the time step in picoseconds (1 fs here)
    number_of_steps: 10 # Total number of MD steps to be taken
    output_name: Al_otf # Name of output
    init_atoms: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] # Initial atoms to be added to the sparse set
    std_tolerance_factor: -0.01 # The uncertainty threshold above which the DFT will be called
    max_atoms_added: -1 # Allow for all atoms in a given frame to be added to the sparse set if uncertainties permit
    train_hyps: [5,inf] # Define range in which hyperparameters will be optimized. Here, hyps are optimized at every DFT call after the 5th call.
    write_model: 4 # Verbosity of model output.
    update_style: threshold # Sparse set update style. Atoms above a defined "threshold" will be added using this method
    update_threshold: 0.001 # Threshold for adding atoms if "update_style = threshold". Threshold represents relative uncertainty to mean atomic uncertainty, where atoms above are added to sparse set
    force_only: False # Train on forces, stresses, and energies.
- Error message
File "/home/USER/.local/lib/python3.10/site-packages/flare/scripts/otf_train.py", line 285, in
- otf_train.yaml (2): same as above, except for the species entry:
species:
    - 14
    - 8
- Error message
File "/home/USER/.local/lib/python3.10/site-packages/flare/scripts/otf_train.py", line 233, in get_sgp_calc
    assert np.allclose(np.array(d["cutoff_matrix"]).shape, (n_species, n_species)),
AssertionError: cutoff_matrix needs to be of shape (n_species, n_species)
- otf_train.yaml (3): same as above, except for the species entry:
species:
    - (14, 8)
- Error message
File "/home/USER/.local/lib/python3.10/site-packages/flare/bffs/sgp/sparse_gp.py", line 335, in update_db
    coded_species.append(self.species_map[spec])
- otf_train.yaml (4): same as above, except for the species entry:
species:
    - 14, 8
- Error message
File "/home/USER/.local/lib/python3.10/site-packages/flare/bffs/sgp/sparse_gp.py", line 335, in update_db
    coded_species.append(self.species_map[spec])
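
The four attempts differ in what a standard YAML loader hands back for `species`. The sketch below writes those parsed values out by hand so it needs no third-party package; `is_flat_int_list` is a hypothetical helper, not part of FLARE. Only the two-item form yields one integer per element, which is consistent with that attempt being the one that got as far as the cutoff_matrix shape check.

```python
# Parsed value of `species` for each attempt, as a standard YAML loader
# would return it (hand-written here; this is an assumption of the sketch).
parsed_variants = {
    "- [14, 8]": [[14, 8]],    # one item that is itself a list
    "- 14 / - 8": [14, 8],     # two integer items
    "- (14, 8)": ["(14, 8)"],  # one item, a plain string
    "- 14, 8": ["14, 8"],      # one item, a plain string
}


def is_flat_int_list(species):
    """True if every entry is a plain integer (one atomic number per entry)."""
    return all(isinstance(s, int) and not isinstance(s, bool) for s in species)


for text, value in parsed_variants.items():
    print(f"{text!r}: n_items={len(value)}, flat_ints={is_flat_int_list(value)}")
```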
@johnemec If you have two species, then use
species:
    - 14
    - 8
In that case, cutoff_matrix: [[5.0]] is wrong. Instead, cutoff_matrix should be a 2x2 matrix specifying the cutoffs for Si-Si, Si-O, O-Si, and O-O. If you want to use the same cutoff for every pair, you can also just remove the cutoff_matrix argument.
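
As an illustration of the shape requirement, here is a minimal sketch that builds a uniform N x N cutoff matrix and mirrors the shape check from the assertion quoted earlier in the thread. Both function names are hypothetical, and the sketch assumes that rows and columns follow the order of the `species` list (Si first, then O).

```python
def uniform_cutoff_matrix(species, cutoff):
    """Build an N x N matrix with the same cutoff for every species pair."""
    n = len(species)
    return [[float(cutoff)] * n for _ in range(n)]


def has_square_shape(matrix, n_species):
    """Mirror of the (n_species, n_species) shape requirement."""
    return len(matrix) == n_species and all(len(row) == n_species for row in matrix)


species = [14, 8]  # Si, O
matrix = uniform_cutoff_matrix(species, 5.0)

print(matrix)                                   # [[5.0, 5.0], [5.0, 5.0]]
print(has_square_shape([[5.0]], len(species)))  # False: the 1x1 matrix from the yaml
print(has_square_shape(matrix, len(species)))   # True
```

In the yaml this would correspond to `cutoff_matrix: [[5.0, 5.0], [5.0, 5.0]]`.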
@johnemec, please also include the following, in addition to Yu's suggestions:
single_atom_energies: # total number of entries should match the number of elements considered
    - 0
    - 0
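
The per-species requirements raised in this thread can be gathered into one pre-flight check. This is a sketch under stated assumptions, not FLARE's validation code: `check_flare_calc` is a hypothetical helper, and the dictionary stands in for the already-parsed `flare_calc` section of the yaml.

```python
def check_flare_calc(cfg):
    """Return a list of problems; an empty list means the entries are consistent."""
    problems = []
    n_species = len(cfg["species"])

    n_energies = len(cfg["single_atom_energies"])
    if n_energies != n_species:
        problems.append(
            f"single_atom_energies has {n_energies} entries, expected {n_species}"
        )

    for desc in cfg.get("descriptors", []):
        matrix = desc.get("cutoff_matrix")
        if matrix is None:
            continue  # omitted, as suggested in the thread for a uniform cutoff
        square = len(matrix) == n_species and all(
            len(row) == n_species for row in matrix
        )
        if not square:
            problems.append(f"cutoff_matrix must be {n_species}x{n_species}")
        elif max(max(row) for row in matrix) != cfg["cutoff"]:
            problems.append("cutoff should equal the largest cutoff_matrix entry")
    return problems


# Stand-in for the parsed flare_calc section of a Si/O setup.
two_species = {
    "species": [14, 8],
    "single_atom_energies": [0, 0],
    "cutoff": 5.0,
    "descriptors": [{"name": "B2", "cutoff_matrix": [[5.0, 5.0], [5.0, 5.0]]}],
}
print(check_flare_calc(two_species))  # []
```

Running the same check on a configuration that keeps `cutoff_matrix: [[5.0]]` and a single `single_atom_energies` entry alongside two species would report both problems at once.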