openmmforcefields icon indicating copy to clipboard operation
openmmforcefields copied to clipboard

Discrepancy in espaloma energies when using template generator, which calls regenerate_impropers

Open diogomart opened this issue 3 months ago • 4 comments

Hello,

I get different energies from OpenMM systems built with

  • espaloma.graphs.deploy.openmm_system_from_graph()
  • EspalomaTemplateGenerator

There are differences in relative energies between conformers of the same molecule. For a large small molecule (lenacapavir) the difference in relative energy between conformers was about 1.5 kJ/mol. For a few other molecules the difference was negligible (a few hundreds of a kJ/mol).

The only difference I could find in the code paths is a call to regenerate_impropers https://github.com/openmm/openmmforcefields/blob/839f225e00c58c50bad19f72d97a15964befed3f/openmmforcefields/generators/template_generators.py#L1893

With the template generator, which calls regenerate_impropers(), the number of impropers is halved, and the central atom is the first of the four atoms defining the dihedral. With espaloma.graphs.deploy.openmm_system_from_graph(), which does not call regenerate_impropers(), the central atom is the second atom.

See code attached, which can be run with python builder.py lenacapavir.sdf and prints the following

espaloma.__version__='0+untagged.16.g804d3cc.dirty'
openff.toolkit.__version__='0.14.5'
openmm.__version__='8.2'
openmmforcefields.__version__='0.14.1'
rdkit.__version__='2024.03.4'

espaloma system
len(bonds)=102
len(angles)=185
len(propers)=1620
len(impropers)=312
central_atoms={'b'}

generator system
len(bonds)=102
len(angles)=185
len(propers)=1620
len(impropers)=156
central_atoms={'a'}

energy diff (espaloma) 36.61758804321289 kJ/mol
energy diff (generator) 35.141313552856445 kJ/mol

espaloma-deploy-vs-generator.zip

diogomart avatar Oct 16 '25 17:10 diogomart

Can this be reproduced with a simpler molecule? It would make the system-to-system comparisons easier

mattwthompson avatar Oct 17 '25 14:10 mattwthompson

I see a difference of 0.3 kJ/mol for frag3.sdf.txt and 0.15 kJ/mol for frag4.sdf.txt. All other fragments of lenacapavir that I tried had negligible differences, like 0.0001 kJ/mol.

Differences seem to arise for nitrogens that have both aniline and sulfonamide character. I get a ~1.5 kJ/mol difference for aniline-sulfonamide.sdf.txt.

diogomart avatar Oct 17 '25 16:10 diogomart

I've finally been able to take a look into this. There are two things going on here.

First, the reason for half as many impropers coming out of openmmforcefields as espaloma is related to espaloma apparently normally using 6-fold impropers (6 improper terms per quadruple of atoms participating in any one improper) whereas OpenMM only supports up to 3-fold impropers with the scheme used by SMIRNOFF. This is discussed here (the picture shows why the central atom shows up in a different place) and here. Having impropers in different configurations like this will lead to slightly different energy values for configurations but shouldn't qualitatively affect their purpose of keeping things planar.

The history of all of this is slightly unclear to me, as this espaloma plugin for openmmforcefields predates my time working on this package, and I don't know why this never got implemented. Perhaps someone with more knowledge of espaloma's development could chime in here to clarify this. The behavior, if correct, would at least be good to document in openmmforcefields.

Second, however, the large energy differences you are finding on some systems are definitely due to a bug. For your aniline-sulfonamide.sdf example, I am getting the following impropers out of espaloma without regenerate_impropers:

   i    j    k    l    n            phi              k
------------------------------------------------------
   1    4    5   14    1   0.000000e+00   1.348143e+00
   1    4    5   14    2   0.000000e+00   4.817511e+00
   1    4   14    5    1   0.000000e+00   4.101055e-01
   1    4   14    5    2   0.000000e+00   1.393630e+00
   1    5    4   14    1   0.000000e+00   9.673239e-03
   1    5    4   14    2   3.141593e+00   7.079267e-03
   1   14    4    5    1   0.000000e+00   9.673239e-03
   1   14    4    5    2   3.141593e+00   7.079267e-03
   5    1    4   14    1   0.000000e+00   9.673239e-03
   5    1    4   14    2   3.141593e+00   7.079267e-03
   5    4    1   14    1   0.000000e+00   9.673239e-03
   5    4    1   14    2   3.141593e+00   7.079267e-03

and the following out of openmmforcefields using espaloma with regenerate_impropers:

   i    j    k    l    n            phi              k
------------------------------------------------------
   1   14    5    4    1   0.000000e+00   9.673239e-03
   1   14    5    4    2   3.141593e+00   7.079267e-03
   4    1    5   14    1   0.000000e+00   9.673239e-03
   4    1    5   14    2   3.141593e+00   7.079267e-03
   4   14    1    5    1   0.000000e+00   9.673239e-03
   4   14    1    5    2   3.141593e+00   7.079267e-03

You can see that most of the force constants k are small, but unlike all of the other impropers in the system, a few of them are large, and these are discarded in the conversion from 6-fold to 3-fold impropers.

I've verified that these are the values that come out of espaloma directly when a System is generated within openmmforcefields, so this is not an issue with openmmforcefields' extraction of a ForceField or caching. This seems like a bug in espaloma's regenerate_impropers. Someone more familiar with it should confirm or refute my suspicions though.

epretti avatar Nov 12 '25 23:11 epretti

There's this footnote in the espaloma paper

** Here, we use the threefold improper formulation used by the Open Force Field “Parsley” generation force fields, which avoids the ambiguity associated with selecting a single arbitrary improper torsion from a set of four atoms involved in the torsion

Maybe calling regenerate_impropers is the intended way of deploying espaloma, and the espaloma README and docs are misleading? @jchodera @yuanqing-wang

diogomart avatar Nov 14 '25 02:11 diogomart