openmmtools
openmmtools copied to clipboard
LangevinSplittingDynamicsMove sometimes nans
@dominicrufa and I were trying to run repex on the endstates of a capped amino acid transformation (ALA->THR) (using the new RepartitionedHybridTopologyFactory in perses) with softened electrostatics, sterics, and torsion. We observed that when we use the LangevinSplittingDynamicsMove
, the simulation runs fine at the ALA endstate, but nans immediately for the THR endstate. When we use GHMCMove
instead, the simulation runs slower, but doesn't nan.
In the example code below, I show that the nans occur when I apply the LangevinSplittingDynamicsMove
(i.e. without even intializing the repex sampler). Note that when I ran 1000 steps of MD at each of the 11 thermodynamic states using the LangevinIntegrator
, no nans occurred.
Here is the code:
from perses.tests.test_topology_proposal import generate_atp, generate_dipeptide_top_pos_sys
from openmmtools.alchemy import AbsoluteAlchemicalFactory, AlchemicalRegion, AlchemicalState
from openmmtools import states, cache
from simtk import openmm, unit
import copy
from perses.dispersed import feptasks
from openmmtools.mcmc import LangevinSplittingDynamicsMove, GHMCMove
from openmmtools.multistate import ReplicaExchangeSampler, MultiStateReporter
# Generate htf at lambda = 1
atp, sys_gen = generate_atp()
htf_1 = generate_dipeptide_top_pos_sys(atp.topology,
new_res = 'THR',
system = atp.system,
positions = atp.positions,
system_generator = sys_gen,
conduct_htf_prop=True,
repartitioned=True, endstate=1, validate_endstate_energy=False)
# Alchemify the system
atoms_to_alchemify = list(htf_1._atom_classes['unique_new_atoms']) + list(htf_1._atom_classes['unique_old_atoms'])
alch_factory = AbsoluteAlchemicalFactory(consistent_exceptions=False)
alchemical_region = AlchemicalRegion(alchemical_atoms=list(atoms_to_alchemify), alchemical_torsions=True)
alchemical_system = alch_factory.create_alchemical_system(htf_1.hybrid_system, alchemical_region)
# Initialize compound thermodynamic states at different temperatures and alchemical states.
protocol = {'temperature': [300]*unit.kelvin*11,
'lambda_electrostatics': [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0],
'lambda_sterics': [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0],
'lambda_torsions': [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]}
alchemical_state = AlchemicalState.from_system(alchemical_system)
compound_states = states.create_thermodynamic_state_protocol(alchemical_system,
protocol=protocol,
composable_states=[alchemical_state])
# Minimize w.r.t each thermodynamic state
context_cache = cache.ContextCache()
sampler_state_list = []
sampler_state = states.SamplerState(htf_1.hybrid_positions, box_vectors=htf_1.hybrid_system.getDefaultPeriodicBoxVectors())
for i, compound_thermodynamic_state in enumerate(compound_states):
print(i)
# now generating a sampler_state for each thermodyanmic state, with relaxed positions
context, context_integrator = context_cache.get_context(compound_thermodynamic_state)
feptasks.minimize(compound_thermodynamic_state, sampler_state)
sampler_state_list.append(copy.deepcopy(sampler_state))
# Set up the LangevinnSplittingDynamicsMove
n_steps = 500 # 1 ps
n_iterations = 1000 # 1 ns
langevin_move = LangevinSplittingDynamicsMove(timestep=2.0*unit.femtosecond, n_steps=n_steps)
# Apply move
for i, (sampler_state, thermostate) in enumerate(zip(sampler_state_list, compound_states)):
print(i)
langevin_move.apply(thermostate, sampler_state)
Here is the output from the for loop in the last code snippet:
1
2
3
4
5
WARNING:openmmtools.mcmc:Potential energy is NaN after 0 attempts of integration with move LangevinSplittingDynamicsMove Attempting a restart...
WARNING:openmmtools.mcmc:Potential energy is NaN after 1 attempts of integration with move LangevinSplittingDynamicsMove Attempting a restart...
WARNING:openmmtools.mcmc:Potential energy is NaN after 2 attempts of integration with move LangevinSplittingDynamicsMove Attempting a restart...
ERROR:openmmtools.mcmc:Potential energy is NaN after 3 attempts of integration with move LangevinSplittingDynamicsMove Trying to reinitialize Context as a last-resort restart attempt...
ERROR:openmmtools.mcmc:Potential energy is NaN after 4 attempts of integration with move LangevinSplittingDynamicsMove
---------------------------------------------------------------------------
IntegratorMoveError Traceback (most recent call last)
<ipython-input-11-4b2538f4dd1b> in <module>
2 for i, (sampler_state, thermostate) in enumerate(zip(sampler_state_list, compound_states)):
3 print(i)
----> 4 langevin_move.apply(thermostate, sampler_state)
5
~/miniconda3/envs/perses-sims/lib/python3.7/site-packages/openmmtools/mcmc.py in apply(self, thermodynamic_state, sampler_state)
1112 """
1113 # Explicitly implemented just to have more specific docstring.
-> 1114 super(LangevinDynamicsMove, self).apply(thermodynamic_state, sampler_state)
1115
1116 def __getstate__(self):
~/miniconda3/envs/perses-sims/lib/python3.7/site-packages/openmmtools/mcmc.py in apply(self, thermodynamic_state, sampler_state)
705 sampler_state.apply_to_context(context)
706 logger.error(err_msg)
--> 707 raise IntegratorMoveError(err_msg, self, context)
708 else:
709 logger.warning(err_msg + ' Attempting a restart...')
IntegratorMoveError: Potential energy is NaN after 4 attempts of integration with move LangevinSplittingDynamicsMove
I've been seeing the exact same error, using a modified version of the ReplicaExchange
class from this tutorial. I tried increasing n_restart_attempts
to 100 but it didn't help at all!
Here's my (pretty bad) code, almost entirely copied from the tutorial:
import math, yaml, sys, cctk, os
from datetime import datetime
import numpy as np
from random import random, randint
import mdtraj
from simtk import openmm, unit
from simtk.openmm.app import AmberPrmtopFile, AmberInpcrdFile, PME
from openmmtools import testsystems, integrators, forces, alchemy, states, cache, mcmc
import cctk.helper_functions as helper
# usage: python run_rest2.py config.yaml
class ReplicaExchange:
def __init__(self, thermodynamic_states, sampler_states, mcmc_move, save_interval=1000, swap_interval=1000, total_steps=10000, logfile="rest2.log", traj_prefix="rest2"):
self._thermodynamic_states = thermodynamic_states
self._replicas_sampler_states = sampler_states
self._mcmc_move = mcmc_move
self.save_interval = save_interval
self.swap_interval = swap_interval
self.total_steps = total_steps
self.current_step = 0
self.logfile = logfile
self.traj_prefix = traj_prefix
# initial minimization
self.log("beginning energy minimization")
for thermo_state, sampler_state in zip(self._thermodynamic_states, self._replicas_sampler_states):
context, integrator = cache.global_context_cache.get_context(thermo_state)
sampler_state.apply_to_context(context)
openmm.LocalEnergyMinimizer.minimize(context)
sampler_state.update_from_context(context)
def log(self, text):
with open(self.logfile, "a") as f:
time = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
f.write(f"{time}\t{text}\n")
def run(self):
self.log("beginning replica exchange simulation")
for iteration in range(int(self.total_steps/self.save_interval)):
self._propagate_replicas()
self.current_step += self.save_interval
with open(self.logfile, "a") as f:
time = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
f.write(f"{time}\t{self.current_step} steps completed\n")
self.save_positions()
if self.current_step % self.swap_interval == 0:
self._mix_replicas()
def _propagate_replicas(self):
for thermo_state, sampler_state in zip(self._thermodynamic_states, self._replicas_sampler_states):
self._mcmc_move.apply(thermo_state, sampler_state)
def _mix_replicas(self, n_attempts=5):
for i in range(len(self._thermodynamic_states) - 1):
j = i+1
sampler_state_i, sampler_state_j = (self._replicas_sampler_states[k] for k in [i, j])
thermo_state_i, thermo_state_j = (self._thermodynamic_states[k] for k in [i, j])
energy_ii = self._compute_reduced_potential(sampler_state_i, thermo_state_i)
energy_jj = self._compute_reduced_potential(sampler_state_j, thermo_state_j)
energy_ij = self._compute_reduced_potential(sampler_state_i, thermo_state_j)
energy_ji = self._compute_reduced_potential(sampler_state_j, thermo_state_i)
log_p_accept = - (energy_ij + energy_ji) + energy_ii + energy_jj
if log_p_accept >= 0.0 or random() < math.exp(log_p_accept):
self._thermodynamic_states[i] = thermo_state_j
self._thermodynamic_states[j] = thermo_state_i
with open(self.logfile, "a") as f:
f.write(f"\tswapping {i} and {j} ({log_p_accept:.3f})\n")
def _compute_reduced_potential(self, sampler_state, thermo_state):
context, integrator = cache.global_context_cache.get_context(thermo_state)
sampler_state.apply_to_context(context)
return thermo_state.reduced_potential(context)
def save_positions(self):
... # omitted for space
config_file = sys.argv[1]
with open(config_file, "r") as f:
config = yaml.safe_load(f)
assert config["swap_interval"] % config["save_interval"] == 0, "swap_interval must be a multiple of save_interval"
num_replicas = config["num_replicas"]
prmtop = AmberPrmtopFile(config["prmtop"])
inpcrd = AmberInpcrdFile(config["inpcrd"])
system = prmtop.createSystem(nonbondedMethod=PME, nonbondedCutoff=config["nonbonded_cutoff"] * unit.angstrom, constraints=None)
mdtop = mdtraj.Topology.from_openmm(prmtop.topology)
high_atoms = mdtop.select(f"resname {config['high_resname']}")
min_L = config["min_lambda"]
protocol = {
'temperature' : [config["temperature"]] * num_replicas * unit.kelvin,
'lambda_sterics' : list(np.geomspace(1.0, min_L, num_replicas)),
'lambda_electrostatics' : list(np.geomspace(1.0, min_L, num_replicas)),
'lambda_angles' : list(np.geomspace(1.0, min_L, num_replicas)),
'lambda_torsions' : list(np.geomspace(1.0, min_L, num_replicas)),
}
alchemical_region = alchemy.AlchemicalRegion(alchemical_atoms=high_atoms, alchemical_angles=True, alchemical_torsions=True)
factory = alchemy.AbsoluteAlchemicalFactory()
alchemical_system = factory.create_alchemical_system(system, alchemical_region)
alchemical_state = alchemy.AlchemicalState.from_system(alchemical_system)
compound_states = states.create_thermodynamic_state_protocol(alchemical_system, protocol=protocol, composable_states=[alchemical_state])
sampler_states = [states.SamplerState(positions=inpcrd.getPositions(), box_vectors=inpcrd.getBoxVectors()) for _ in compound_states]
langevin_move = mcmc.LangevinSplittingDynamicsMove(timestep=0.5*unit.femtosecond, n_steps=config["swap_interval"])
langevin_move.n_restart_attempts = 100
parallel_tempering = ReplicaExchange(
compound_states,
sampler_states,
langevin_move,
total_steps=config["total_steps"],
swap_interval=config["swap_interval"],
save_interval=config["save_interval"],
logfile=config["logfile"],
traj_prefix=config["traj_prefix"],
)
parallel_tempering.run()
Using the following input file, I only make it through one LangevinSplittingDynamicsMove
per replica before dying. This is surprising because the LocalEnergyMinimizer
runs without dying, so it doesn't seem like the system is just bogus/contains massive conflicts. Visually, everything looks fine (inputs generated using antechamber
/parmchk
/PACKMOL
/tleap
).
config.yaml:
prmtop: solvated_13precomplex.prmtop
inpcrd: solvated_13precomplex.inpcrd
logfile: 13precomplex.log
traj_prefix: replica
nonbonded_cutoff: 8 # Å
temperature: 298
high_resname: MOL
total_steps: 5000000
swap_interval: 2000
save_interval: 500
num_replicas: 24
min_lambda: 0.66
Output:
Potential energy is NaN after 0 attempts of integration with move LangevinSplittingDynamicsMove Attempting a restart...
Potential energy is NaN after 1 attempts of integration with move LangevinSplittingDynamicsMove Attempting a restart...
Potential energy is NaN after 2 attempts of integration with move LangevinSplittingDynamicsMove Attempting a restart...
Potential energy is NaN after 3 attempts of integration with move LangevinSplittingDynamicsMove Attempting a restart...
...
Potential energy is NaN after 98 attempts of integration with move LangevinSplittingDynamicsMove Attempting a restart...
Potential energy is NaN after 99 attempts of integration with move LangevinSplittingDynamicsMove Trying to reinitialize Context as a last-resort restart attempt...
Potential energy is NaN after 100 attempts of integration with move LangevinSplittingDynamicsMove
Traceback (most recent call last):
File "run_rest2.py", line 142, in <module>
parallel_tempering.run()
File "run_rest2.py", line 44, in run
self._propagate_replicas()
File "run_rest2.py", line 56, in _propagate_replicas
self._mcmc_move.apply(thermo_state, sampler_state)
File "/n/home03/cwagen/.conda/envs/openmm/lib/python3.7/site-packages/openmmtools/mcmc.py", line 1114, in apply
super(LangevinDynamicsMove, self).apply(thermodynamic_state, sampler_state)
File "/n/home03/cwagen/.conda/envs/openmm/lib/python3.7/site-packages/openmmtools/mcmc.py", line 707, in apply
raise IntegratorMoveError(err_msg, self, context)
openmmtools.mcmc.IntegratorMoveError: Potential energy is NaN after 100 attempts of integration with move LangevinSplittingDynamicsMove
Any advice would be greatly appreciated!
Are you both still seeing this with OpenMM 7.7.0 and OpenMMtools 0.21.0? I am seeing something similar with an internal test system and want to debug this more.
Just so that we are all on the same page, would it be possible to generate a minimal, self-contained example (perhaps using openmmtools.testsystems) that fails, that we can all run (and agree on)? I cannot run zhang-ivy's example because I do not have a OpenEye license and I cannot run the second example since I do not have solvated_13precomplex.{prmtop, inpcrd}.
I posted an issue (#642) similar to this.
I am using openmm 7.7.0 and openmmtools 0.21.5. I also get the same issue with openmm 7.6.0.
I have tried to use ReplicaExchangeSampler with 3 different systems, thinking it might be stemming from a poorly equilibrated system. I'm fairly certain now that the issue is not with the input structure's state.
my lambdas:
[1.0, 0.9839948356327152, 0.9682458365518544, 0.9527489027899027, 0.9375]
If I ReplicaExchangeSamper.minimize() and then use a 2 femtosecond timestep with the above minor spread of lambdas, I get the NaN error apparently on the attempt to propagate the 5th (final) system
SimulationNaNError: Propagating replica 4 at state 4 resulted in a NaN! The state of the system and integrator before the error were saved in replex_output/nan-error-logs
I believe this has occurred on the second replica too.
If I reduce the timestep to 0.01 femtoseconds, ReplicaExchangeSamper.equilibrate(300) will complete but one or more of the system(s) are in the process of blowing up. Results seem to be different depending on LangevinDynamicsMove or LangevinSplittingDynamicsMove with only the final system looking exploded for the former and all of them for the latter.
positions, topology = PDBFile(eq_structure).positions, PDBFile(eq_structure).topology
with open(f'output/1zip_system.xml') as infile:
system = XmlSerializer.deserialize(infile.read())
# if applying lambda to torsions then AlchemicalRegion(alchemical_torsions=True)
alchemical_region = alchemy.AlchemicalRegion(alchemical_atoms=guest_atoms,alchemical_torsions=True)
factory = alchemy.AbsoluteAlchemicalFactory()
alchemical_system = factory.create_alchemical_system(system, alchemical_region)
alchemical_state = alchemy.AlchemicalState.from_system(alchemical_system)
# Initialize compound thermodynamic states at different temperatures and alchemical states.
# for the Hamiltonian approach - same temperature but different lambdas.
protocol = {'temperature': [300 for i in range(n_replicas)] * kelvin,
'lambda_electrostatics': lambdas,
'lambda_sterics': lambdas,
'lambda_torsions': lambdas}
thermodynamic_states = states.create_thermodynamic_state_protocol(alchemical_system,
protocol=protocol,
composable_states=[alchemical_state])
box_vectors = system.getDefaultPeriodicBoxVectors()
sampler_states = [states.SamplerState(positions=positions, box_vectors=box_vectors)
for i,_ in enumerate(thermodynamic_states)]
friction = 1.0 / openmm.unit.picosecond
move = mcmc.LangevinSplittingDynamicsMove(timestep=0.01*femtosecond,
n_steps=1000,
collision_rate=friction
)
# Different results with different move
'''
move = mcmc.LangevinDynamicsMove(timestep=.01*femtosecond,
n_steps=1000,
collision_rate=friction
)
'''
exchange_attempts = 5
simulation = ReplicaExchangeSampler(
mcmc_moves=move,
number_of_iterations=exchange_attempts,
replica_mixing_scheme='swap-neighbors',
)
reporter = MultiStateReporter('replex_output/1zip_hremd_test.nc', checkpoint_interval=1)
simulation.create(thermodynamic_states, sampler_states, reporter)
simulation.minimize()
simulation.equilibrate(300)
I've attached a notebook with input files.
Thank you for any help.
Well, it appears that it was an equilibration issue. I reduced the ReplicaExchangeSampler timestep to 0.01 femtoseconds and also increased the collision rate to 100/ps for a short equilibration, then used those sampler states to start a new HREMD simulation with a 2 femtosecond timestep and 20/ps collision rate and things seem to be working.
The 1/ps collision rate in the above was too low, but even when I increased it, the 2 femtosecond timestep was too long even after the additional minimization. Doing 100 iterations of equilibration at the high collision rate and short timestep apparently solved my problem.
@mjw99, maybe the system will still be useful.
Thank you.
I have to backtrack on my previous comment. I still get the nan error.
I am applying lambdas to electrostatics, torsions, and sterics.
If I run a LangevinSplittingDynamicsMove where each replica has lambda = 1, the simulation doesn't crash.
If I run it with lambdas below ~0.9, it crashes with nan potential energy.
Importantly, if I do not apply the lambdas to sterics, the simulation does not crash.
Like @zhang-ivy, my simulation will run with lambda applied to the three terms if I use GHMCMove.
My goal is to run Rest2 with all protein atoms enhanced with the same scheme used by the Plumed implementation where electrostatics have sqrt(lambda) and sterics and torsions have the full lambda value applied.