openmmtools icon indicating copy to clipboard operation
openmmtools copied to clipboard

Issue with Hydration Free Energies with OpenMM & OpenMMTools

Open nickjbrowning opened this issue 4 years ago • 9 comments

Hey guys,

I was recommended to post this issue here by @jchodera.

I've been trying to use openmmtools' alchemy.py code to compute solvation free energies of an ethanol molecule in water. The main issue is that the dV/dl curves that alchemy.py is producing are significantly different from those in the literature, and the resulting free energies are incorrect. They're also not particularly smooth, either (see figure 1 on the github page).

I've created a repository with all of my code + a workflow of the issues I've encountered here:

https://github.com/Exarctus/OpenMMSolvation/tree/master/OpenMMSolvation

Some additional points/notes:

  1. I think that adding lambda-dependent energy derivatives should be automatically included in alchemy.py when addGlobalParameter is called in the force creation function, makes codes a little cleaner since you'll typically be interested in dU/dl values.

  2. Is there any cleaner way to create and use non-standard residues than having to load the input structure twice (once to parameterize the charges, second to create the solvated topology)?

  3. Since solvation free energies are one of the most common benchmarks with classical MD, would it be worth adding a simple tutorial to the openmm website? If I can get this code working I'd be happy to help.

nickjbrowning avatar May 30 '20 09:05 nickjbrowning

@jchodera

The issue posted by @askusay here: https://github.com/choderalab/openmmtools/issues/459 might also be related.

nickjbrowning avatar May 31 '20 09:05 nickjbrowning

Hey @Exarctus I'm working on a similar project. Can't run your script at the moment for not having all the packages, but just thought I would chime in with a debugging idea.

There's a fast and simple script that doesn't use openmmtools at http://openmm.org/tutorials/alchemical-free-energy/ - how about changing your script to use the same system and annihilate a single particle? One reason to do this is that the script on that page uses pymbar.

Better yet, swap that script to use your TI approach and see if you get the same results.

If you can't get good agreement on that simple system, then it might help to determine whether the discrepency is somewhere in the TI code, somewhere in units conversion, somewhere in openmmtools electrostatics, or somewhere in openmmtools LJ interactions. If it all proceeds well with the lennard jones fluid, than maybe level it up to a waterbox and if there's then a discrepancy, that would indicate the issue is with electrostatics, for example.

If everything goes perfectly up to that stage (i.e. all methods agree), then I think the only explanation left is the parameterization of the ethanol.

ljmartin avatar Jun 03 '20 01:06 ljmartin

Hey @ljmartin,

I'm not sure if it's simple as a single issue. Both the electrostatics and vdW components are incorrect - you can see this by comparing the figure in the repository vs figure 6 in the paper I linked (https://onlinelibrary.wiley.com/doi/full/10.1002/jcc.21909), and also compare the dG print-outs in the repository vs. the electrostatic vs. steric decomposition the same authors list in table 2. Note that in the paper results are reported in kcal/mol, but the results I've reported here are in kJ/mol (for reference 1 kcal/mol = 4.2 kJ/mol).

I've additionally tried switching the code onto using a flexible water box from openmmtools.testsystems, and the results are equally crazy:

'''
---SYSTEM PREPARATION---
'''

waterbox = FlexiblePMEWaterBox()
[system, topology, positions] = [waterbox.system, waterbox.topology , waterbox.positions]

'''
---FINISHED SYSTEM PREPARATION---
'''

'''
---ALCHEMICAL CONFIGURATION---
    define solute indexes, set up the alchemical region + factory, and specify steric/electrostatic lambda coupling for solvation
'''
factory = alchemy.AbsoluteAlchemicalFactory(alchemical_pme_treatment='direct-space')

# Want to retain alchemical-alchemical nonbonded interactions as these are included in the decoupled endpoint solute + solvent system
alchemical_region = alchemy.AlchemicalRegion(alchemical_atoms=[0, 1, 2], annihilate_electrostatics=False, annihilate_sterics=False)
alchemical_system = factory.create_alchemical_system(system, alchemical_region)

# specify that we want to take energy derivatives on-the-fly with respect to both electrostatic and steric lambdas
for force in alchemical_system.getForces():
    if (force.__class__ == openmm.CustomNonbondedForce or force.__class__ == openmm.CustomBondForce):
        for i in range(0, force.getNumGlobalParameters()):
            if (force.getGlobalParameterName(i) == "lambda_electrostatics"):
                force.addEnergyParameterDerivative('lambda_electrostatics')
            elif (force.getGlobalParameterName(i) == "lambda_sterics"):
                force.addEnergyParameterDerivative('lambda_sterics')

alchemical_state = alchemy.AlchemicalState.from_system(alchemical_system)

'''
---FINISHED ALCHEMICAL CONFIGURATION---
'''

Here's a printout of the dG values:

ELECTROSTATICS
lambda <dV/dl>
 1.00 -40.01
 0.89 -31.90
 0.78 -23.20
 0.67 -17.31
 0.56 -15.64
 0.44 -8.53
 0.33 -6.75
 0.22 -2.85
 0.11 -2.22
 0.00  0.73
dG electrostatics, -14.227440054766408

STERICS
lambda <dV/dl>
 1.00  1.13
 0.95  1.40
 0.89  1.96
 0.84  1.44
 0.79  3.16
 0.74  4.57
 0.68  4.28
 0.63  9.01
 0.58 10.95
 0.53  7.77
 0.47 17.50
 0.42 42.33
 0.37 24.65
 0.32 47.01
 0.26 26.94
 0.21 19.94
 0.16 30.40
 0.11  3.88
 0.05 -5.72
 0.00  9.38
dG sterics, 13.512403183881993
dG -0.7150368708844148

and the dv/dl curves:

dVdL Waterbox

Again if you compare against the water curves + decomposition in the above reference, you can see that both steric and electrostatic contributions are incorrect. The steric contributions even have a oddly-shaped maximum when they should have a minimum.

Note: the definition of lambda in alchemy.py is (1-\lambda) what the above paper uses (the end-points are switched).

nickjbrowning avatar Jun 04 '20 11:06 nickjbrowning

Has anyone managed to test my code-sample yet?

nickjbrowning avatar Jun 09 '20 09:06 nickjbrowning

There's a fast and simple script that doesn't use openmmtools at http://openmm.org/tutorials/alchemical-free-energy/ - how about changing your script to use the same system and annihilate a single particle? One reason to do this is that the script on that page uses pymbar.

Just a quick comment; that example script has a bug in it and I have emailed John about it.

Around the beginning of the year, I was experimenting with the LJ example, alchemical-example.py, on http://openmm.org/tutorials/alchemical-free-energy/ . I was trying help a student with some examples of free energy calculations within OpenMM. The example would sometimes work, sometimes fail with the pymbar error:

 Failing with “pymbar.utils.ParameterError: Warning: Should have \sum_n W_nk = 1.

I had assumed that this was a sampling issue and/or some unknown bug in pymbar and this assumption was reinforced by this recent issue in Yank discussed here https://github.com/choderalab/yank/pull/1216 .

However, recently, I decided to revisit this and I discovered that sometimes, the system's potential energy would be massive. This was stochastic over many runs. I had assumed the single precision floating point representation was the problem in the beginning.

But.... digging in, I discovered a possible bug in the alchemical-example.py script:

    nbforce.setParticleParameters(index, charge*0, sigma, epsilon*0)

should be:

    nbforce.setParticleParameters(index, charge*0, sigma, epsilon)

This solves the issue.

mjw99 avatar Jun 09 '20 13:06 mjw99

Hi @Exarctus . I see here that you are using the 'direct-space' option to model PME. You are probably doing it because it is might not possible to compute the derivatives w.r.t. lambda_electrostatics with 'exact' (I'm not sure if OpenMM added the feature to compute parameter derivatives with offset parameters in NonbondedForces), but that ignores the reciprocal space of PME, and, unless that publication uses a similar method, I think it will have a non-neglibigle impact on the free energy profile. Direct space is intended to work in combination with a reweighting procedure at the end states. You might get better results with 'coulomb', although it is less tested.

Also, I'd check that the softcore formulation used by that paper is identical to that used in alchemy and you are using the same softcore parameters. By default, the softcore for electrostatics is turned off in alchemy.py, and I don't see it turned on in any part of your script (apologies if I missed it).

Finally, if the parameters/charges are different, you might get also a slightly different answer. Although, if they are approximately the same I wouldn't expect to affect the total trend. If you have the input files for that publication that would be best.

andrrizzi avatar Jun 12 '20 09:06 andrrizzi

@andrrizzi

thanks for the reply!

Yeah I wasn't quite sure which was the best approach - using alchemical_pme_treatment='exact' indeed doesn't provide derivatives wrt. lambda_electrostatics. I tried switching alchemical_pme_treatment='coulomb' using PME as the non-bonded method, and it gives the following pretty crazy values:

ELECTROSTATICS
lambda <dV/dl>
 1.00 -3003.52
 0.89 -2900.67
 0.78 -2673.00
 0.67 -2569.97
 0.56 -2319.89
 0.44 -2010.37
 0.33 -1697.57
 0.22 -1338.50
 0.11 -812.99
 0.00 -8.58
dG electrostatics, 1980.9994388786245

Also, I'd check that the softcore formulation used by that paper is identical to that used in alchemy and you are using the same softcore parameters. By default, the softcore for electrostatics is turned off in alchemy.py, and I don't see it turned on in any part of your script (apologies if I missed it).

In the paper they only use softcore electrostatics for the single-step procedure (openMM defaults softcore_beta = 0.0 so wouldn't work here off-the-bat). For the two-step procedure which is what I'm comparing against here, they use a linearly-scaled coulombic interaction (same as openMM), and a softcore lennard-jones which as far as I can tell is identical to that implemented in alchemy.py (albeit with 1-lambda in place of lambda):

reff_sterics = sigma*((softcore_alpha*(1.0-({0}))^softcore_b + (r/sigma)^softcore_c))^(1/softcore_c);

where alchemy.py defaults softcore_alpha, softcore_b and softcore_c to 0.5, 1.0, and 6.0 respectively.

but that ignores the reciprocal space of PME, and, unless that publication uses a similar method, I think it will have a non-neglibigle impact on the free energy profile. Direct space is intended to work in combination with a reweighting procedure at the end states.

In the paper, the author writes:

"In the particle‐mesh Ewald treatment of electrostatics, soft‐core potentials are used only for the direct sum part, because only short‐range interactions give rise to the “singularity” problem. The reciprocal contribution to electrostatic interactions is scaled linearly with λ."

I had assumed that alchemy.py automatically scales the reciprocal space contribution by lambda... Could you give an example of how I could successfully reweight the end states to circumvent this?

I'd also like to mention I've tried using a reaction field:

modeller.addSolvent(system_generator.forcefield, model='tip4pew', padding=12.0 * unit.angstroms)
system = system_generator.forcefield.createSystem(modeller.topology, nonbondedMethod=CutoffPeriodic, nonbondedCutoff=9.0 * unit.angstroms, constraints=HBonds)

alchemical_region = alchemy.AlchemicalRegion(alchemical_atoms=solute_indexes, annihilate_electrostatics=False, annihilate_sterics=False)
alchemical_system = factory.create_alchemical_system(system, alchemical_region)

and this also seems to produce erroneous dv/dl values:

ELECTROSTATICS
lambda <dV/dl>
 1.00 -650.19
 0.89 -719.52
 0.78 -651.64
....

Finally, there are some pretty stark differences between when one uses the GPU with the above code, vs the CPU. For a single lamba point, using PME and a direct-space alchemical treatment, the two different platforms give completely different dv/dl values:

For lambda_electrostatics = 0.5:

CPU

platform = openmm.Platform.getPlatformByName('CPU')
ELECTROSTATICS
lambda <dV/dl>
 0.50 -12.40

GPU

platform = openmm.Platform.getPlatformByName('OpenCL')
platform.setPropertyDefaultValue('Precision', 'mixed')
ELECTROSTATICS
lambda <dV/dl>
 0.50 -6.79

Both of the above are using reasonable sampling times of 0.5ns so sampling isn't an issue.

This is a little concerning to me!

nickjbrowning avatar Jun 12 '20 13:06 nickjbrowning

I had assumed that alchemy.py automatically scales the reciprocal space contribution by lambda... Could you give an example of how I could successfully reweight the end states to circumvent this?

Unfortunately, no. The reciprocal space is ignored completely in alchemy.py because it is impossible at the moment to implement a force using FFT with a custom force in OpenMM. Using reweighting can help only to correct the end states (we use the nonalchemical system to reweight the endpoints of the alchemical transformation) but it cannot be used to recover the whole free energy profile. Also, we found that this is robust only if the alchemical region is neutral. When it has a net charge, the overlap is very poor.

Finally, there are some pretty stark differences between when one uses the GPU with the above code, vs the CPU.

This is indeed weird. Are they different also when you compare single structures?

andrrizzi avatar Jun 18 '20 19:06 andrrizzi

Has there been any progress on this issue?

If not, could I follow the approach of perses and scale the reciprocal part of the electrostatics like what is done there (if I understand the code correctly)? For example, _create_reciprocal_space_force looks to do part of what is required to enable linear scaling of the reciprocal part of the electrostatics

chem-william avatar Dec 13 '23 09:12 chem-william