yank FIRE integrator does not fully minimize this system; ASE does

FIRE integrator does not fully minimize this system; ASE does

Open jaimergp opened this issue 5 years ago • 5 comments

Hi,

I am helping a labmate to minimize a rather tricky dimer system that has been built out of the separate monomers by homology modelling (against PDB 3C7N),and some manual refinement. The result is a structure with a handful of clashes that the builtin OpenMM minimizer fails to optimize: after three days, some regions of structure begins to collide and, obviously, the MD does not run.

So I was given the starting PRMTOP/INPCRD files and asked to try with another algorithm. (I cannot upload the files publicly, but I can email them to you if needed.)

I first tried to use the minimizers included in the ase library by coupling it with a custom Calculator designed to obtain the energy and forces with OpenMM (code below):

from simtk import openmm

class OpenMMCalculator(object):

    def __init__(self, system, topology, integrator=None):
        self.system = system
        self.topology = topology
        self._state = None
        self._simulation = None
        self.integrator = integrator or openmm.VerletIntegrator(0.001)

    @property
    def simulation(self):
        if self._simulation is None:
            simulation = openmm.app.Simulation(self.topology, self.system, self.integrator)
            self._simulation = simulation
        return self._simulation

    def state(self, positions):
        self.simulation.context.setPositions(positions * pmd.unit.angstrom)
        self._state = self.simulation.context.getState(getEnergy=True, getForces=True)
        return self._state

    def get_potential_energy(self, atoms=None, force_consistent=False):
        """Return total energy.

        Both the energy extrapolated to zero Kelvin and the energy
        consistent with the forces (the free energy) can be
        returned."""
        return np.array(self.state(atoms.positions).getPotentialEnergy()._value)

    def get_forces(self, atoms):
        """Return the forces."""
        return np.array((self.state(atoms.positions).getForces() / 10)._value)  # per angstrom!

    def get_stress(self, atoms):
        """Return the stress."""
        return 0.0

    def calculation_required(self, atoms, quantities):
        return True

With this calculator, I then called ase.optimize.FIRE:

import os
import ase
import ase.optimize
import parmed as pmd

def minimize(prmtop, inpcrd, optimizer=ase.optimize.FIRE, fmax=0.05, steps=100000000, 
             ase_options=None, openmm_options=None):
    """
    Minimize an biological structure built with AmberTools. Topology and parameters must be provided in
    PRMTOP, coordinates preferrably in INPCRD, but others might work as well.
    
    The optimization trajectory will be stored in a ``*.traj`` file. A restart file with the hessian
    and more stuff will be saved to ``*.pckl``. If PCKL file already exists, it will be used as the
    starting point of the hessian, so you might want to delete it across different attempts if the
    filename is the same.
    
    Also, a ``*.start.pdb`` and ``*.min.pdb`` will be generated with the original and optimized
    states, respectively, so you can compare them. An INPCRD will be generated as well.
    
    Parameters
    ----------
    prmtop : str
        Path to PRMTOP file (Amber topology + parameters)
    inpcrd : str
        Path to coordinates file (normally, INPCRD), in Angstrom.
    optimizer : callable
        Which ASE optimizer to use. Defaults to FIRE.
    fmax : float
        Max average force for convergence
    steps : int
        Max number of minimization steps before stopping
    ase_options : dict
        Keyword arguments that will be passed onto the optimizer initialization.
    openmm_options : dict
        Keyword arguments that will be passed on OpenMM's system creation (``prmtop.createSystem``).
        See https://parmed.github.io/ParmEd/html/structobj/parmed.structure.Structure.html#parmed.structure.Structure.createSystem
        
    Returns
    -------
    opt
        The optimizer instance (so you can call opt.run() again, if needed)
    ase_atoms
        The ASE structure, with updated positions, for interactive inspection
    """
    if ase_options is None:
        ase_options = {}
    if openmm_options is None:
        openmm_options = {}
    basename, ext = os.path.splitext(prmtop)
    struct = pmd.load_file(prmtop, xyz=inpcrd)
    struct.save(basename + '.start.pdb', overwrite=True)
    system = struct.createSystem(**openmm_options)
    ase_atoms = ase.Atoms(numbers=[a.element for a in struct.atoms], 
                          positions=struct.coordinates,
                          calculator=OpenMMCalculator(system, struct.topology))
    opt = optimizer(atoms=ase_atoms,  trajectory=basename + '.traj',restart=basename + '.pckl', **ase_options)
    opt.run(fmax=fmax, steps=steps)
    struct.coordinates = ase_atoms.positions * pmd.unit.angstrom
    struct.save(basename + '.min.pdb', overwrite=True)
    struct.save(basename + '.min.inpcrd', overwrite=True)
    return opt, ase_atoms

And called it with default arguments:

opt, atoms = minimize("my.prmtop", "my.inpcrd")

The code works, slowly but surely, and in a matter of hours found an acceptable minimum with almost no clashes (original one had 420+ bad contacts):

     Step     Time          Energy         fmax
*Force-consistent energies used in optimization.
FIRE:    0 16:54:59 1365569479182351.500000* 106934872482181840.0000
FIRE:    1 16:55:10 590213847001.369751* 17296490224396.2578
FIRE:    2 16:55:19 267267645210.039886* 5091715449409.0625
FIRE:    3 16:55:28 111928043775.791519* 1178028006299.3647
FIRE:    4 16:55:38 56106389020.855469* 316187642544.1710
FIRE:    5 16:55:47 30444412153.592461* 144719585812.3805
FIRE:    6 16:55:55 17118991257.313477* 60182437230.0057
FIRE:    7 16:56:04 10077047616.547144* 26912213062.6401
FIRE:    8 16:56:13 6223857399.995024* 12717554493.1037
FIRE:    9 16:56:22 4066335433.148547* 5194503618.2014
FIRE:   10 16:56:31 2798536494.639191* 3695926848.3093
FIRE:   11 16:56:40 2001396047.852563* 2604284094.3127
FIRE:   12 16:56:49 1480023127.285435* 1572184698.6121
FIRE:   13 16:56:59 1134604545.295926* 1273793133.0214
FIRE:   14 16:57:09 916346526.171279* 1393729622.3804
FIRE:   15 16:57:18 866011451.974753* 4000822992.0231
FIRE:   16 16:57:27 525944578.346811* 359840483.0135
FIRE:   17 16:57:36 384395422.179283* 182973989.3622
FIRE:   18 16:57:44 280328366.963216* 128135244.0095
FIRE:   19 16:57:53 207690106.871335* 84754781.1432
FIRE:   20 16:58:02 157205478.730950* 97735933.7759
FIRE:   21 16:58:12 120696159.140697* 88362144.9791
FIRE:   22 16:58:26 93282713.623994* 61992599.7514
FIRE:   23 16:58:36 72651462.505971* 36500942.0208
FIRE:   24 16:58:45 57246023.437164* 20437869.7095
FIRE:   25 16:58:54 45662582.876512* 12099065.5792
FIRE:   26 16:59:05 36822586.265060* 7870923.0160
FIRE:   27 16:59:16 29980243.758955* 5978081.3436
FIRE:   28 16:59:26 24620308.002318* 4625243.5480
FIRE:   29 16:59:36 20374927.156667* 3760195.3303
FIRE:   30 16:59:47 16976843.260064* 3383586.2006
FIRE:   31 16:59:57 14232645.791040* 2844068.3251
FIRE:   32 17:00:08 12002135.594942* 2246132.9324
FIRE:   33 17:00:16 10181591.290781* 1686825.0257
FIRE:   34 17:00:27  8691181.801153* 1223212.5462
FIRE:   35 17:00:37  7467313.904709*  869799.0612
FIRE:   36 17:00:47  6458267.971686*  687065.9220
FIRE:   37 17:00:56  5621706.450192*  646957.7375
FIRE:   38 17:01:07  4923195.983598*  699277.9362
FIRE:   39 17:01:18  4334927.815135*  710919.8298
FIRE:   40 17:01:28  3834969.030372*  677466.7610
[...]
FIRE:  10659 08:28:29  -161944.308658*      86.5190
FIRE:  10660 08:28:34  -161944.698209*      23.9075
FIRE:  10661 08:28:38  -161944.723300*      16.2688
FIRE:  10662 08:28:43  -161944.747467*       3.4779
FIRE:  10663 08:28:47  -161944.738042*       9.6879
FIRE:  10664 08:28:51  -161944.738963*       8.9168
FIRE:  10665 08:28:56  -161944.741948*       7.4401
FIRE:  10666 08:29:01  -161944.750354*       5.3877
FIRE:  10667 08:29:05  -161944.756265*       2.9594
FIRE:  10668 08:29:10  -161944.761592*       2.1633
FIRE:  10669 08:29:14  -161944.769636*       2.1633
FIRE:  10670 08:29:19  -161944.769154*       3.7667

Seeing that this implementation is written in pure Python/numpy I wondered if somebody had already implemented that for GPU (I remembered reading an OpenMM issue where @jchodera talked about it), and found the FIREMinimizationIntegrator you have developed.

I grabbed the file, pasted the class in a Notebook and ran it like this:

from simtk.openmm import app
from simtk import openmm as mm
from simtk import unit
from sys import stdout

prmtop = app.AmberPrmtopFile("my.prmtop")
inpcrd = app.AmberInpcrdFile("my.inpcrd")
system = prmtop.createSystem()
integrator = FIREMinimizationIntegrator(tolerance=0.0001 * unit.kilojoules_per_mole / unit.nanometers)
sim = app.Simulation(prmtop.topology, system, integrator)
sim.context.setPositions(inpcrd.positions)
print('Starting energy:', sim.context.getState(getEnergy=True).getPotentialEnergy())
sim.reporters.append(app.StateDataReporter(stdout, 100, step=True, 
    potentialEnergy=True, speed=True, separator='\t'))
while integrator.getGlobalVariableByName('converged') < 1:
    integrator.step(50)

The performance is impressive but stops minimizing very soon:

Starting energy: 251733888466944.0 kJ/mol
#"Step"	"Potential Energy (kJ/mole)"	"Speed (ns/day)"
100	2694168969216.0	0
200	2694168969216.0	8
300	2694168969216.0	7.9
400	2694168969216.0	7.94
500	2694168969216.0	7.94
600	2694168969216.0	7.95
700	2694168969216.0	7.94
800	2694168969216.0	7.94
900	2694168969216.0	7.96
1000	2694168969216.0	7.96
1100	2694168969216.0	7.95

If the system is created with nonbondedMethod=app.CutoffNonPeriodic, nonbondedCutoff=1.0*unit.nanometers options, then the energies decrease better, but far from ideal:

Starting energy: 251734224011264.0 kJ/mol
#"Step"	"Potential Energy (kJ/mole)"	"Speed (ns/day)"
100	2424489902080.0	0
200	1742063271936.0	48.9
300	1424046292992.0	49.5
400	849651040256.0	50.2
500	631096934400.0	47.1
600	447762726912.0	46.2
700	361159458816.0	46.7
800	251577466880.0	46.6
900	177901469696.0	46.8
1000	131495706624.0	47.1
1100	131495706624.0	47
1200	131495706624.0	46.8
1300	131495706624.0	45.9
1400	131495706624.0	45.7
1500	131495706624.0	45.8
1600	131495706624.0	45.7
1700	131495706624.0	45.7
1800	131495706624.0	45.6
1900	131495706624.0	45.6
2000	131495706624.0	45.7
2100	131495706624.0	46.7

I first thought that maybe the units are not comparable, so I checked the structure for bad contacts and, while it had decreased from 400 to 200, it still was not comparable to ase's result. I tried to couple it with OpenMM minimizer afterwards to see if the L-BFGS was able to get the energy down, and it got it to 54491.28125 kJ/mol during the night, but then FIRE was unable to improve it further. I also tried with different platforms (CUDA, OpenCL, CPU), but the result is the same.

I am not very familiar with the FIRE algorithm so I then thought it had to do with the chosen parameters, but apparently they are the same in both implementations (besides units):

# ASE
class FIRE(Optimizer):
    def __init__(self, atoms, restart=None, logfile='-', trajectory=None,
                 dt=0.1, maxmove=0.2, dtmax=1.0, Nmin=5, finc=1.1, fdec=0.5,
                 astart=0.1, fa=0.99, a=0.1, master=None, downhill_check=False,
                 position_reset_callback=None, force_consistent=None):
[...]

# YANK
class FIREMinimizationIntegrator(openmm.CustomIntegrator):
    def __init__(self, timestep=1.0 * unit.femtoseconds, tolerance=None, alpha=0.1, 
            dt_max=10.0 * unit.femtoseconds, f_inc=1.1, f_dec=0.5, f_alpha=0.99, N_min=5):
[...]

So that's it, I guess I am doing something wrong implementation-wise, but I really want to use the FIRE minimizer because it looks impressive! Thanks for your help, Jaime.

Nov 06 '18 10:11 jaimergp

I'm not sure, but I think that Yank is converged when the average force of the system is lower than a tolerance, while ASE checks the force for all atoms individually. I think yank also converges when the timestep becomes sufficiently small, but I can't spot the same check in the ASE code. If you're happy to email me your input I can check what the issue is!

Nov 06 '18 17:11 hannahbrucemacdonald

Thanks for your answer! Yes, I can send the files via email. Can you send me your address?

In the meantime, I will check the average forces, but I'd say that 0.0001 * unit.kilojoules_per_mole / unit.nanometer is low enough to avoid that stopping criterion. I think I have also tried with default tolerance (0, basically), and the results were similar (convergence with high energy).

Thanks again, Jaime.

Nov 06 '18 22:11 jaimergp

[email protected]

Hmm ok maybe it's not the forces then, it's just the only difference I spotted at first look. If you send me over the files I will have more of a hunt!

Thanks!

Nov 06 '18 22:11 hannahbrucemacdonald

Thanks for the reports! The FIRE minimizer implementation we use in YANK is still very crude and not at all fully validated yet, so please use it with caution. We should be able to look into this sometime soon! For now, I'd suggest using the OpenMM LocalEnergyMinimizer even if it's slow.

Nov 06 '18 22:11 jchodera

@hannahbrucemacdonald: [email protected]

Hmm ok maybe it's not the forces then, it's just the only difference I spotted at first look. If you send me over the files I will have more of a hunt!

Thanks!

Got it! Mail sent! Thanks!

@jchodera: Thanks for the reports! The FIRE minimizer implementation we use in YANK is still very crude and not at all fully validated yet, so please use it with caution. We should be able to look into this sometime soon! For now, I'd suggest using the OpenMM LocalEnergyMinimizer even if it's slow.

Thanks for the suggestion! We have been using LocalEnergyMinimizer successfully, but it is not working properly in this system (too bad clashes, probably). I was crossing my fingers and trying to see if maybe another minimizer was able to do at least something with the most severe clashes, and then go back to LocalEnergyMinimizer, but to my surprise ase's FIRE did a great work by itself. Having a minimizer implemented as an Integrator is also very useful for energy reporting, something people keep requesting for some reason (at least here in our lab). 😅

Nov 06 '18 23:11 jaimergp

yank yank copied to clipboard

FIRE integrator does not fully minimize this system; ASE does

yank
yank copied to clipboard