flare MGP from AIMD_GP freeze after a fixed number of steps in LAMMPS

Describe the bug Greetings,

I have used the gp_from_aimd to obtain a "gp_from_aimd_model.pickle" which I then used the MappedGaussianProcess to generate the mgp file for LAMMPS.

I used the 31 August 2021 version of LAMMPS which I compiled three different systems (Ryzen 3750X Ubuntu 18.04 with virtual box, intel Xeon Gold 6148 on RHEL and intel KNL PHI 7210 on RHEL) All three system experienced the same issue: LAMMPS MD run will freeze after a fixed number of steps. From "top", LAMMPS is still running with 100% CPU usage.

screenshot.pdf

LAMMPS was compiled with intelcc 2020.2 on RHEL with intelMPI and gcc 7 with openmpi 4.1.1 on Ubuntu 18.04.

The MGP file is generated with the following PYTHON script:

from flare.mgp import MappedGaussianProcess
from flare.gp import GaussianProcess

gp_path = r"/home/wontleave/calc/flare_training/tol_h2o_RP_D4_373K_10ps/gp_from_aimd_model.pickle"

gp_model = GaussianProcess.from_file(gp_path)

grid_params = {'twobody':   {'grid_num': [80], 'lower_bound': [0.7], "upper_bound": [7.0] },
               'threebody': {'grid_num': [20, 20, 20], 'lower_bound': [0.7, 0.7, 0.7], "upper_bound": [4.0, 4.0, 4.0]}
               }

data = gp_model.training_statistics
lammps_location = 'TolWater_RP'

mgp_model = MappedGaussianProcess(grid_params, data['species'],
    var_map="pca", lmp_file_name='/home/wontleave/calc/flare_training/tol_h2o_RP_D4_373K_10ps/TolWater_RPD4otf', n_cpus=2)

mgp_model.build_map(gp_model)

I have attached the files used to run LAMMP toluenes_mgp.zip

Please kindly advise. Thank you.

Regards, Choon Wee

Sep 17 '21 12:09 wontleave

Hi @wontleave ,

I checked your files and tried running it with lammps, if you check your std out/err file, you will find the error message from lammps:

      71   -5749.0502    -6210.207    461.15686    335.96993 
      72   -5749.0514   -6209.5928    460.54135    335.52151 
      73   -5749.0529   -6209.0476    459.99475    335.12329 
ERROR: MGP lower bound: The interatomic distance is smaller than the 3b lower bound. (../pair_mgp.cpp:382)
Last command: run 8000

This means that at step 73, there are atoms getting closer than 0.7A (which is the "lower_bound" set for MGP). For the solution, please check the discussions here: #284

Another thing I notice is that you use 80 grid points for 2-body, and 20 grid-points for 3-body, which seems to be a bit coarse, and might affect the accuracy of MGP interpolation. So it's also likely that your atoms get too close to each other because the potential is not well interpolated by MGP. You can try increasing the number of grids, for example, 2-body up to 128, 3-body 32 or 64, and see if you still get the issue.

Sep 17 '21 14:09 YuuuXie

Greetings @YuuuXie ,

Thank you for your insights.

I have tried 128 for 2-body and 64 for 3-body. It appears that it does not make a difference if the lower bound is 0.5 or 0.7. The MD is only able to proceed when the lower bound is set at 0.0,

I noticed that there is no total energy conservation for NVT when the tdamp for the thermostat is, say, 50fs (Nose or Berdensen). A larger tdamp reduces the severity of lack of energy conservation. Is this a sign that the GP wasn't adequately trained?

NVE MD conserves the total energy well when the time step is 0.1fs.

Regards, Choon Wee

Sep 20 '21 05:09 wontleave

Hi @wontleave

I notice that your .mgp potential file has the coefficients for C, H, O, while your lammps simulation is just for small molecules with C and H. So quick question, does it mean that you are using a different molecule for simulation from what you used for training?

Sep 20 '21 16:09 YuuuXie

Hi @YuuuXie

It is the same molecule. The training AIMD consists of water and toluene while the simulation has only toluenes.

Sep 21 '21 05:09 wontleave

Hi @wontleave

It will make more sense to have the same system for training and testing data. So I would suggest that you either simulate with water+toluene to see if the energy conservation is good, or you add the system with only toluenes to the training data.

Sep 21 '21 14:09 YuuuXie

Hi @YuuuXie

Thank you for your suggestion. I was hoping to do MD on a large biphasic system that will have an interface where toluenes and water meet. There will also be regions where toluenes and waters are in their bulk environment, therefore the ML potential should be able to model toluene and water individually and together. The current GP is based on a 10-angstrom cube, it might not be representative. FYI: MD with water+toluene has the same problem.

I wonder what is the correct procedure to add training data to an existing GP. Is it possible to add training data with fewer atom types than the original GP? Please kindly point me to the documentation if it already exists.

Sep 21 '21 15:09 wontleave

Hi @wontleave

I wonder what is the correct procedure to add training data to an existing GP.

You can either run an on-the-fly training, for which you can refer to Theory and OTF module tutorial (support NVE) or ASE_OTF module tutorial (support NVE, NVT, NPT and ASE library). Or if you already have a lot of data from AIMD, you can check GP from AIMD tutorial

Is it possible to add training data with fewer atom types than the original GP?

Yes, if you are training the model with our on-the-fly module, you can resume the training with a non-empty initial GP that has already included some training data, and start the training on a different system. If you use module OTF or ASE_OTF or TrajectoryTrainer for training, just change the input arg gp_model to the non-empty GP

# For OTF
otf_model = otf.OTF(..., gp=gp_model, ...)

# For ASE_OTF
flare_calculator = FLARE_Calculator(gp_model, ...)

# For TrajectoryTrainer
TT = TrajectoryTrainer(gp = gp_model,...)

Finally, if you want to validate if the GP is trained adequately, you can generate some validation data and compute them with DFT to get the true energy and forces. Then you can compare with the GP prediction to get the error of GP, and see if the error is too large

Sep 23 '21 18:09 YuuuXie

Close stale issue

Dec 04 '22 16:12 YuuuXie