openff-interchange
openff-interchange copied to clipboard
describing pi orbitals for rings
benzene_trouble.tar.gz Describe the bug
I'm currently trying to run simulations using the Smirnoff Forcefield in combination the virtual sites generated by the Astex ESP DNN but am running into some problems.
I want to add virtual sites above and below the ring to describe the pi electrons (please see benzene_astex.pdb.pqr). These virtual sites should be along the vector perpendicular to the plane of the ring. To do this I was planning to use the TrivalentLonePair class. However, when I try to do this the position of the virtual sites appears to be along some different vector entirely (see attached). Am I defining the sites incorrectly somehow? I can see there might be an issue when all the atoms lie in a plane. Adding a virtual site using the TrivalentLonePair works fine for me when defining a site on ammonia.
Furthermore, even though I've got two different types of sites (one above and one below the plane), and have defined each of these with different names (EP+ and EP-), I'm only seeing sites in one position. Confusingly, it is adding two virtual sites for each carbon, one for each name, they just are added in the same positions (please see trajectory.pdb)
To Reproduce Please run python vsite_example.py benzene.sdf benzene_vsites.offxml
Sorry, vsite_example.py is a bit long, but it's basically just taken directly from the example notebook provided on the openff-toolkit github; I figured this was the best way to minimize the chance of me introducing any errors.
I'm running using openff-toolkit 0.11.4 , I wouldn't have thought this should be an issue as I have not seen anything relevant in the bugfix reports in more recent releases.
Output The output is trajectory.pdb.
Computing environment (please complete the following information):
- Operating system : ubuntu 24.04 LTS
- Output of running
conda list: I'm actually not using a conda environment, but the output of pip list is attached and in pyflare_env. The reason for this is that the company I work for (Cresset) has it's own python build (pyflare), but I've attached the output of pyflare -m pip list, which should be roughly equivalent.
Additional context I can work round this issue by using MonovalentLonePair and assigning the parent atom to be the hydrogen, but this isn't a viable solution for all systems to which I want to add this kind of virtual site.
Many Thanks,
Peter
Hi Peter, I'll have a look. Are you able to upload the script/notebook you're using? I think it's missing from the tarball:
$ tree benzene_trouble
benzene_trouble
├── benzene.pdb.pqr
├── benzene.sdf
├── benzene_vsites.offxml
├── pyflare_env
└── trajectory.pdb
1 directory, 5 files
Very sorry about that, the script should be attached. Sorry, it's tarred up because I don't seem to be able to attach just a .py script. Many Thanks.
Let me first try to answer a few questions I see in your comments
Don't understand this use of colons ...
These are type annotations, they're used to indicate the type of a variable and do nothing else at runtime (in fact they're completely ignored by the interpreter). They're useful for some tooling and, I think, clarifying the types of particular variables. I especially like to include them when we're using similarly- or identically-named classes from two libraries, i.e. opemm.app.Topology vs. openff.toolkit.Topology. There's lots more to read if you're curious, but you can safely ignore them if you wish.
so do all virtual sites have zero mass?
Yes. I suppose it's not explicit in the specification, only the toolkit docs which are useful but less authoritative.
is True equivalent to 1 in sum
Yes, you've come across one of the (many) quirks of truthiness in Python. You're summing over an expression of booleans, which functions as adding up the number of True values:
ipdb> sum([False, False])
0
ipdb> sum([True, False, True])
2
ipdb> sum([True])
1
But back to the behavior you're observing. You're right to be confused at how the distances are a mix of positive and negative in the force field:
ipdb> pprint([parameter.distance for parameter in forcefield["VirtualSites"]])
[<Quantity(0.89, 'angstrom')>, <Quantity(-0.89, 'angstrom')>]
but not represented as such in the Interchange object:
ipdb> pprint(interchange["VirtualSites"].potentials.values())
dict_values([Potential(parameters={'distance': <Quantity(0.89, 'angstrom')>, 'outOfPlaneAngle': None, 'inPlaneAngle': None}, map_key=None), Potential(parameters={'distance': <Quantity(0.89, 'angstrom')>, 'outOfPlaneAngle': None, 'inPlaneAngle': None}, map_key=None)])
which is why you're seeing unexpected things in the OpenMM simulation. In some places we have a convention of letting distances always be (numerically) positive and reversing the order of atom indices to enforce with which direction the virtual site sticks out of the plane. The canonical example of this is a TIP5P water:
In [1]: from openff.toolkit import Molecule, ForceField
In [2]: out = ForceField(
...: "/Users/mattthompson/software/openff-interchange/openff/interchange/_tests/data/tip5p.offxml"
...: ).create_interchange(Molecule.from_smiles("O").to_topology())
In [3]: [
...: potential.parameters["distance"]
...: for potential in out["VirtualSites"].potentials.values()
...: ]
Out[3]: [0.15 <Unit('angstrom')>] # always a positive value
In [4]: [key.orientation_atom_indices for key in out["VirtualSites"].key_map]
Out[4]: [(0, 1, 2), (0, 2, 1)] # two different atom orderings, so point in both + and - directions
Your code is producing a case in which the distances are both positive (above) and the atom indices are not stored with forward and reverse directions:
ipdb> pprint(
[key.orientation_atom_indices for key in interchange["VirtualSites"].key_map.keys()]
)
[
│ (0, 1, 5, 11),
│ (0, 1, 5, 11),
│ (1, 0, 2, 6),
│ (1, 0, 2, 6),
│ (2, 1, 3, 8),
│ (2, 1, 3, 8),
│ (3, 2, 4, 9),
│ (3, 2, 4, 9),
│ (4, 3, 5, 10),
│ (4, 3, 5, 10),
│ (5, 4, 0, 7),
│ (5, 4, 0, 7)
]
This might be a bug, possibly with deduplication logic. In any case, the toolkit is doing its job well and Interchange is mangling something, so I'm transferring this to that repo. I'll update this issue when I have a better diagnosis and fix.
Hi. Thanks very much for this detailed response. I think I understand it, but was actually on study leave today and won't be back work until Tuesday, so will go back over it then and might get back to you.
Sorry about the questions in the script comments, I'd forgot I'd left them in there; they're sort of me talking to myself....
I had another look at this. The parameters with positive and negative distance are getting smooshed together because the SMIRKS patterns are the same.
ipdb> pprint(forcefield["VirtualSites"].parameters)
[
│ <VirtualSiteType with smirks: [#6:1](~[#6:2])(~[#6:3])(~[#1:4]) epsilon: 0.0 kilocalories_per_mole type: TrivalentLonePair match: once distance: 0.89 angstrom outOfPlaneAngle: None inPlaneAngle: None charge_increment1: 0.01 elementary_charge sigma: 0.1 angstrom name: EP+ >,
│ <VirtualSiteType with smirks: [#6:1](~[#6:2])(~[#6:3])(~[#1:4]) epsilon: 0.0 kilocalories_per_mole type: TrivalentLonePair match: once distance: -0.89 angstrom outOfPlaneAngle: None inPlaneAngle: None charge_increment1: 0.01 elementary_charge sigma: 0.1 angstrom name: EP- >
]
ipdb> [parameter.smirks for parameter in forcefield["VirtualSites"]]
['[#6:1](~[#6:2])(~[#6:3])(~[#1:4])', '[#6:1](~[#6:2])(~[#6:3])(~[#1:4])']
ipdb> {parameter.smirks for parameter in forcefield["VirtualSites"]}
{'[#6:1](~[#6:2])(~[#6:3])(~[#1:4])'}
When the toolkit runs substructure matching, it only finds one solution so it only provides on parameter to Interchange. For most virtual site types, the solution is to use match="all_permutations" but this is not supported for TrivalentLonePair and it wouldn't do exactly what you're after.
Right now I'm not sure how to implement your use case, which should be support one way or another; we haven't come across this particular combination of chemistry and geometry. We'll think about it and update this issue later.
Just a ping on this one - we're at a fairly advanced stage of building a force field based on openFF + lots of virtual sites, and the inability to specify an out-of-plane virtual site on an sp2 atom is becoming difficult - we currently have to set up the system with the OpenFF tools and then fiddle with the OpenMM state directly, which is less than optimal. Are there any plans currently to extend the VS code to handle this type of system?
One of the underlying issues (https://github.com/openforcefield/openff-toolkit/issues/1363) is unresolved and @j-wags would be the point person on it
I can work around it by defining different SMIRKS patterns to re-orient the reference frames; the physics here (at least) are surely garbage but the OpenMM object defines multiple virtual sites in reference frames such that they should be pointing in opposite directions with respect to a C-C-C-H quartet that defines them.
import openmm
import openmm.app
import openmm.unit
from openff.toolkit import ForceField, Molecule, Quantity, Topology
benzene = Molecule.from_smiles("c1ccccc1")
benzene.generate_conformers(n_conformers=1)
if False:
topology = Topology.from_molecules(
[
benzene,
benzene,
]
)
topology.box_vectors = Quantity([4, 4, 4], "nanometer")
positions = topology.get_positions()
positions[int(positions.shape[0] / 2) :, :] += Quantity([0, 0, 0.3], "nanometer")
topology.set_positions(positions)
else:
topology = benzene.to_topology()
forcefield = ForceField("openff-2.1.0.offxml")
virtual_site_parameters = forcefield.get_parameter_handler("VirtualSites")
for smirks, name in zip(
[
"[#6:1](~[#6:2])(~[#6:3])(~[#1:4])",
"[#6:1](~[#6:3])(~[#6:2])(~[#1:4])",
],
[
"EP+",
"EP-",
],
):
virtual_site_parameters.add_parameter(
parameter_kwargs={
"smirks": smirks,
"epsilon": Quantity(0.0, "kilocalories_per_mole"),
"type": "TrivalentLonePair",
"match": "once",
"distance": Quantity(0.4, "angstrom"),
"outOfPlaneAngle": None,
"inPlaneAngle": None,
"charge_increment1": Quantity(0.1, "elementary_charge"),
"sigma": Quantity(0.0, "angstrom"),
"name": name,
},
)
interchange = forcefield.create_interchange(topology=topology)
interchange.minimize()
# something's wrong with the initial positions, but
interchange.to_pdb("initial.pdb", include_virtual_sites=True)
# the virtual site parameters here look good
with open("system.xml", "w") as f:
f.write(
openmm.XmlSerializer.serialize(
interchange.to_openmm(
combine_nonbonded_forces=True,
),
),
)
simulation = interchange.to_openmm_simulation(
integrator=openmm.LangevinIntegrator(
300,
1,
0.001,
)
)
dcd_reporter = openmm.app.DCDReporter("trajectory.dcd", 10)
simulation.reporters.append(dcd_reporter)
simulation.runForClockTime(1.0 * openmm.unit.second)
# pymol initial.pdb trajectory.dcd or similar ...
Thanks for the reply - we'll take a look and see how your suggestions works for us!
Just a quick comment to @mark-cresset on this:
Just a ping on this one - we're at a fairly advanced stage of building a force field based on openFF + lots of virtual sites, and the inability to specify an out-of-plane virtual site on an sp2 atom is becoming difficult - we currently have to set up the system with the OpenFF tools and then fiddle with the OpenMM state directly, which is less than optimal. Are there any plans currently to extend the VS code to handle this type of system?
It sounds like this is potentially a request for an extension to the SMIRNOFF spec, particularly our virtual sites: https://docs.openforcefield.org/projects/toolkit/en/0.5.0/smirnoff.html#virtualsites-virtual-sites-for-off-atom-charges
We do not support anything not specified in the spec, and there are no extensions to the spec that would affect this currently under consideration (this is the repo where we do that: https://github.com/openforcefield/standards/blob/main/docs/standards/smirnoff.md ). If you have a specific spec change/extension you would like to propose, we could be willing to consider it. But we're certainly not planning on supporting anything not in the spec. (The process is that FIRST things get put into the spec after our committee works through them to ensure the specification is complete, clear, etc., and then AFTER they are in the spec they get potentially implemented.)
I haven't dug into all the science on this. But if a spec change is what's being proposed, then ... we'd need to know what the proposed change/extension is.
Looking at this with fresh eyes, I think we might be talking about different issues.
I think the issue that Peter and Mark are reporting is that the virtual sites are in plane with the benzene ring, when they should be out of plane.
Matt correctly points out that trivalent vsites are tricky for a variety of reasons, and maybe the root cause of the benzene issue is one of those, but the clear first step is to understand why the trajectories have the vsites in-plane.
I'll chat with Matt tomorrow morning and will update here!
I chatted with Matt and we came to the consensus that we thought we'd fixed the wrong-geometry bug and so we thought you were talking about something else, but instead it looks like the machinery we were using to test our trivalent vsite geometries was testing the wrong thing, so the bug found its way back in.
So, we're actively working on a fix for this, and in the longer term we're going to discuss whether we need to update/publish a new version of our spec to clarify the behavior of trivalent vsites better. Thanks for your patience on this @peterjohncherry and @mark-cresset - You're the first major users of this functionality and we want to get a fixed turned around for you quickly.
I'm going to go ahead and close this issue (though please let me know if I've missed some facet that's unresolved!)
There were two discussions here, and they were resolved as follows:
- Cresset wants a way to model vsites above and below a ring - This was resolved by adding the
OPENFF_UNSAFE_VSITESflag in https://github.com/openforcefield/openff-toolkit/pull/1887 - TrivalentLonePair vsites don't lie above and below the plane when the 4 atoms are coplanar - This is a valid problem but TrivalentLonePair vsites were always intended to be applied to pyramidal atoms, so it's not clear if this is a bug here or something that should be changed in the SMIRNOFF spec. Thus this is more of a standards question than an Interchange issue.