Inconsistent runtime error when using a nequip model to perform Langevin dynamics in ASE 🐛 [BUG]

Open NicholasHattrup opened this issue 1 year ago • 5 comments

Describe the bug When I run multiple dynamics runs with the nequip calculator for ASE, some trajectories crash with the error below:

Traceback (most recent call last):
  File "/home/nhattrup/Fluxional_MD/scripts/md.py", line 71, in <module>
    nvt_dyn.run(steps=args.num_steps)
  File "/home/nhattrup/.conda/envs/nequip/lib/python3.9/site-packages/ase/md/md.py", line 137, in run
    return Dynamics.run(self)
  File "/home/nhattrup/.conda/envs/nequip/lib/python3.9/site-packages/ase/optimize/optimize.py", line 156, in run
    for converged in Dynamics.irun(self):
  File "/home/nhattrup/.conda/envs/nequip/lib/python3.9/site-packages/ase/optimize/optimize.py", line 135, in irun
    self.step()
  File "/home/nhattrup/.conda/envs/nequip/lib/python3.9/site-packages/ase/md/langevin.py", line 171, in step
    forces = atoms.get_forces(md=True)
  File "/home/nhattrup/.conda/envs/nequip/lib/python3.9/site-packages/ase/atoms.py", line 788, in get_forces
    forces = self._calc.get_forces(self)
  File "/home/nhattrup/.conda/envs/nequip/lib/python3.9/site-packages/ase/calculators/abc.py", line 23, in get_forces
    return self.get_property('forces', atoms)
  File "/home/nhattrup/.conda/envs/nequip/lib/python3.9/site-packages/ase/calculators/calculator.py", line 737, in get_property
    self.calculate(atoms, [name], system_changes)
  File "/home/nhattrup/.conda/envs/nequip/lib/python3.9/site-packages/nequip/ase/nequip_calculator.py", line 108, in calculate
    data = AtomicData.from_ase(atoms=atoms, r_max=self.r_max)
  File "/home/nhattrup/.conda/envs/nequip/lib/python3.9/site-packages/nequip/data/AtomicData.py", line 427, in from_ase
    return cls.from_points(
  File "/home/nhattrup/.conda/envs/nequip/lib/python3.9/site-packages/nequip/data/AtomicData.py", line 308, in from_points
    edge_index, edge_cell_shift, cell = neighbor_list_and_relative_vec(
  File "/home/nhattrup/.conda/envs/nequip/lib/python3.9/site-packages/nequip/data/AtomicData.py", line 744, in neighbor_list_and_relative_vec
    raise ValueError(
ValueError: After eliminating self edges, no edges remain in this system.

To Reproduce Use the most recent nequip version with ASE; if needed, I am happy to supply the deployed nequip model I am using. Below is the code I use to set up the Langevin dynamics and run them:

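# Imports are assumed; the original post did not show them.
from ase import units
from ase.io.trajectory import Trajectory as ASETrajectory
from ase.md.langevin import Langevin
from ase.md.velocitydistribution import (
    MaxwellBoltzmannDistribution, Stationary, ZeroRotation)

# args, atoms and init_xyz are defined earlier in md.py (not shown)
for i in range(args.samples):
    nvt_dyn = Langevin(
        atoms=atoms,
        temperature_K=args.temperature,
        timestep=args.dt * units.fs,
        friction=0.02)
    traj_file = args.dir + '/' + 'Trajectory_' + str(i) + '.traj'
    print(i, traj_file)
    MaxwellBoltzmannDistribution(atoms=atoms, temp=args.temperature * units.kB)
    ZeroRotation(atoms)  # Set rotation about center of mass to zero
    Stationary(atoms)  # Set center of mass momentum to zero
    traj = ASETrajectory(traj_file, 'w', atoms)
    traj.write(atoms)
    nvt_dyn.attach(traj.write, interval=args.interval)
    nvt_dyn.run(steps=args.num_steps)
    traj.close()
    # reset atom positions to initial sampling geometry
    atoms.set_positions(init_xyz.copy())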

Expected behavior The dynamics should run with no issues and print the trajectory number and the path where data is being written, i.e.:

1 ../nequip/.../ASE/Trajectory_1.traj
2 ../nequip/.../ASE/Trajectory_2.traj

Environment (please complete the following information):

  • OS: Ubuntu
  • python version 3.9.12
  • python environment (commands are given for python interpreter):
    • nequip version 0.5.4
    • e3nn version 0.4.4
    • pytorch version 1.12.0+cu116
  • (if relevant) GPU support with CUDA
    • cuda Version according to nvcc Build cuda_11.6.r11.6/compiler.30978841_0
    • cuda version according to PyTorch 11.6

Additional Context The trajectories that do not fail look perfectly reasonable.

NicholasHattrup, Jul 28 '22 03:07

Hi @NicholasHattrup ,

ValueError: After eliminating self edges, no edges remain in this system.

This error means that for the given system state and cutoff, every single atom has no neighbors. The vast majority of the time this is a sign that something is wrong (such as trying to set the cutoff in the wrong distance units), and so we made it an error. In MD, it could also be a sign that your simulation is exploding.

What kind of system are you simulating, and have you checked the trajectories that throw this error? Do they look physically plausible / is having a system state with such low density reasonable for your application? If it is, I can make this a configurable option to suppress the error. Note that in the case where all atoms have no neighbors, the predicted forces on all atoms will be zero, since the total energy will consist only of constant per-atom and global shifts (depending on your configuration).
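One way to check a suspicious trajectory for this condition is to look at the smallest interatomic distance in each frame and compare it with the cutoff. The following is only a minimal sketch under stated assumptions: the system is non-periodic (no periodic images are considered), it has at least two atoms, and an edge means a pair strictly closer than r_max. The helper names `min_pair_distance` and `has_any_edge` are hypothetical and are not part of nequip or ASE.

```python
import itertools
import math


def min_pair_distance(positions):
    """Smallest distance between any two distinct atoms.

    Assumes a non-periodic system with at least two atoms; positions is a
    list of (x, y, z) tuples in the same length unit as the cutoff.
    """
    return min(math.dist(a, b) for a, b in itertools.combinations(positions, 2))


def has_any_edge(positions, r_max):
    """True if at least one pair of atoms lies within the cutoff r_max."""
    return min_pair_distance(positions) < r_max


# Hypothetical diatomic: bonded, then pulled apart beyond a 4.0 Angstrom cutoff.
bonded = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.74)]
separated = [(0.0, 0.0, 0.0), (0.0, 0.0, 6.0)]

print(has_any_edge(bonded, r_max=4.0))     # True
print(has_any_edge(separated, r_max=4.0))  # False
```

With ASE, the positions could be taken from a frame as `[tuple(p) for p in atoms.get_positions()]`. If the minimum distance grows steadily before the crash, the run is probably exploding; if it is already larger than the cutoff in the first frame, a unit mismatch in the cutoff is the more likely cause.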

Linux-cpp-lisp, Jul 28 '22 17:07

I ran into the same ValueError and noticed that it disappeared when I increased r_max. It would be great if the message were more helpful, e.g. f"Every single atom has no neighbors within the cutoff r_max: {r_max}".

mhellstr, Nov 04 '22 08:11

Hi @mhellstr ,

Good point, changed on develop: https://github.com/mir-group/nequip/commit/da3e4bdf7ce9a25a1427f0efd15504dee71964df

Linux-cpp-lisp, Nov 06 '22 16:11

Thanks, although now that I think more about it I don't think this should really be an error. It would be great to use nequip to train/predict the H2 or O2 energy as a function of distance, for example (dissociation curve). Then it would be awkward to only be able to predict points inside r_max.
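To illustrate what a prediction beyond the cutoff would look like if the error were suppressed, here is a toy sketch. It is not the nequip model: the pair term, the cosine envelope, the 4.0 cutoff, and the per-atom shift of -1.0 are all made-up values chosen for illustration. It only shows the behavior described earlier in the thread: once the two atoms are farther apart than r_max, the energy is just the sum of the constant per-atom shifts, so the curve is flat and the force is zero.

```python
import math


def toy_energy(distance, r_max=4.0, per_atom_shift=-1.0):
    """Toy diatomic energy with a finite cutoff (NOT the nequip model).

    Energy = two constant per-atom shifts + a pair term that only
    contributes when the atoms are within r_max of each other.
    """
    energy = 2 * per_atom_shift
    if distance < r_max:
        # Hypothetical pair term: Morse-like well scaled by a cosine
        # envelope that goes smoothly to zero at r_max.
        well = (1.0 - math.exp(-(distance - 0.74))) ** 2 - 1.0
        envelope = 0.5 * (math.cos(math.pi * distance / r_max) + 1.0)
        energy += well * envelope
    return energy


# Scan the bond length through and beyond the cutoff.
for d in (0.74, 2.0, 3.9, 4.0, 6.0, 10.0):
    print(f"{d:5.2f}  {toy_energy(d):.4f}")
```

In this toy model every point at or beyond 4.0 gives the same energy of -2.0, which is the kind of flat dissociation limit one would expect from any model with a finite cutoff and constant per-atom shifts.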

mhellstr, Nov 09 '22 08:11

Quoting the above,

The vast majority of the time this is a sign that something is wrong (such as trying to set the cutoff in the wrong distance units), and so we made it an error. In MD, it could also be a sign that your simulation is exploding.

But like I said, if people need it, I'm happy to make an option to suppress this error.

Linux-cpp-lisp, Nov 09 '22 14:11