mbuild
mbuild copied to clipboard
Improper atom order incorrect in LAMMPS data writer for harmonic impropers?
Bug summary
It appears that the atom ordering for harmonic impropers is incorrect in the LAMMPS data writer. Here is the code that writes the impropers. Note that the atoms are written so that the atom1.idx is written in the third position.
# Dihedral data
if impropers:
data.write("\nImpropers\n\n")
for i, improper in enumerate(impropers):
data.write(
"{:d}\t{:d}\t{:d}\t{:d}\t{:d}\t{:d}\n".format(
i + 1,
improper_types[i],
improper[2],
improper[1],
improper[0],
improper[3],
)
)
However, LAMMPS specifies that the central atom should be first see docs here, and parmed
stores the central atom first see docs here.
Am I missing something, or are we reordering these atoms when we should not be.
I can't really remember how I came to that conclusion in an old PR
@ahy3nz thanks for hopping in here. I guess we need to make some simple test cases to throw through parmed
. The parmed docs are either wrong, or something has changed.
Unless someone beats me to the punch, I'm going to create a test case with formic acid. Its a small and simple molecule. I'll make up a fake improper if I have to and I'll create two XMLs; one with a harmonic improper and one with an amber-style improper. Then we can look at the parmed.Structure
and also write to LAMMPS, GROMACS, Cassandra, (and others?) and see what happens. 😬 😬 😬
Impropers are a complete and utter mess. Everybody does it a little differently, everybody adds their own quirks, nobody can agree on the right ordering. I can go on but we're all familiar with the idea.
I don't trust anything short of running an MM single-point energy and isolating just the improper contribution to the overall potential energy. Absolutely tedious but I don't think there's any shortcuts to be had here. LAMMPS and GROMACS make it feasible enough to decompose potential energy contributions (with some file parsing), and I'm assuming Cassandra does as well. CHARMM might, Amber probably does, but OpenMM doesn't.
In fact, OenMM's grouping of propers and impropers has probably led to some confusion in Foyer - are there impropers here? I don't know. ParmEd will probably find some, and plausibly be correct about them, but it's hard to verify them without understanding the source of the data (which I do not).
Impropers are a complete and utter mess.
Agreed, but we're stuck with them 👎 .
Here is a gist with an XML file that has an amber-style improper and a python script for formic acid. Please don't take it too seriously. Its a bit of a frankenstein file, but will work for this purpose. Am I being dumb or can we even specify harmonic impropers with foyer
?
~~All I've looked at so far is writing parmed
-> gmx
. The central atom (c
) appears to be written 3rd. Can anyone find a reliable source to say whether that is correct or not for gromacs? I didn't see it mentioned in the reference manual, although I feel like it must be there. Here and here is a start. This thread from the mailing list seems to say central atom should be first?~~
Update: For formic acid and with amber-style impropers parmed
appears to be storing the central atom third in the improper. This is consistent with the parmed
docs for amber-style impropers here.
import foyer
import mbuild
import parmed
mol = mbuild.load("C(=O)O", smiles=True)
ff = foyer.Forcefield("improper_amber.xml")
mol_ff = ff.apply(mol)
mol_ff.save("test.gro", overwrite=True)
mol_ff.save("test.top", overwrite=True)
impropers = [d for d in mol_ff.dihedrals if d.improper]
print(impropers[0].atom3.idx) # this atom is index 0, the central atom.
# Just to double check that loaded parmed from file gives the same
# result as using foyer -> parmed
mol_loaded = parmed.load_file("test.top", xyz="test.gro")
impropers = [d for d in mol_loaded.dihedrals if d.improper]
print(impropers[0].atom3.idx) # this atom is index 0, the central atom.
So for amber-style impropers, parmed
definitely storing central-atom-third. This is consistent with the docs.
The open question is then for harmonic impropers. The parmed docs say central atom first. I'll update when I learn more.
Update: I've done a simple test. Take a gro/top/mdp file. Remove the proper dihedrals. Copy the improper dihedral -- switch it from an "improper" to a "proper" -- type 4 to type 1. Keep the atom ordering the same in both, and compute the energy. The energy of the two dihedrals ("proper" or "improper") is the same when the central atom is 3rd. When the central atom is moved to the first position for the improper, the energy changes.
Files here.
Just to add more options for thinking about this... here is a zip with a minimal working example in LAMMPS. LAMMPS very clearly specifies central atom first. The energy it gives is: 0.0678461708 kJ/mol when the central atom (c, 1-indexed atom 1) is placed first. This more or less agrees with the "flipped" energies from GROMACS.
First, apologies if anyone is confused by this thread. I have removed a few earlier comments because I felt that they were unnecessarily complicating the matter.
Harmonic impropers: I am going to compare manually computed energies with LAMMPS and GROMACS.
parmed
preserves the harmonic improper order when reading/writing gmx
files. For example, if the topology has an improper with the atoms listed in order 1 4 3 2
, then the atom1.idx
, atom2.idx
, etc, in parmed
have the same order. You can easily confirm this with the python script attached as part of gromacs_debug.tar.gz
.
Next, if we trust the parmed docs, we know the central atom should be first. Then, we can calculate the energy of the improper. We can also flip the first and third atom and re-compute the energy.
Next, we can compute the energies of the original and flipped topologies with gmx
. We can compare the energies between our values and theirs.
Finally, we can compute the improper energy in LAMMPS, which clearly defines central atom first.
Summary:
LAMMPS, GROMACS, and the manual calculation show very similar energies when central atom is first. This suggests that the parmed
documentation is correct. It also suggests that our LAMMPS writer is incorrect. Please let me know if there is an error in my logic.
gromacs_files.tar.gz lammps_files.tar.gz
Here are the results from the manual calculation:
(1-indexed) first atom in parmed improper is: 1
Harmonic improper energy is 0.06784616933084593 kJ/mol
(1-indexed) first atom in parmed improper is: 3
Harmonic improper energy is 20.18619305350575 kJ/mol
From GROMACS with this topology:
[ dihedrals ]
; ai aj ak al funct c0 c1 c2 c3
1 4 3 2 2 10.0000000 4.6024000
the energy is: Improper Dih. 0.0723808 -- 0 0 (kJ/mol)
For the flipped topology:
[ dihedrals ]
; ai aj ak al funct c0 c1 c2 c3
3 4 1 2 2 10.0000000 4.6024000
the energy is: Improper Dih. 20.1864 -- 0 0 (kJ/mol)
.
The energy from lammps is: 0.0678461
for the following impropers:
Impropers
1 1 1 4 3 2
@ahy3nz I agree with most of the observations in your comment. However, I disagree with:
Parmed Structure: A parametrized Structure will list the central atom as the third atom for ALL impropers.
As far as I can tell, the CHARMM-style (harmonic) impropers are stored central-atom-first. I make this statement based upon reading/writing GROMACS files, the documentation, and a comparison of the computed energies between by-hand, LAMMPS, and GROMACS.
Similar test case as above but for Amber improper dihedrals. I don't get agreement in improper energies between gromacs and lammps unless I flip the 1st and 3rd atom of the improper in gromacs (flipped.top
).
I updated the above example. There is now a test_periodic.py
script that builds a system from scratch with mosdef and uses the resulting parmed structure to calculate the improper energies. Then a comparison is made with the top files.
Sounds like the lammpswriter probably needs to update the ordering then. In hindsight, I'm really curious what I had seen 3 years ago that made me incorrectly think the parmed convention was central-atom-third.
@ahy3nz in 2019 I came to the conclusion that parmed had the central atom first for harmonic impropers but third for periodic impropers. I didn't link to any part of the code or docs though :/
@ahy3nz I'm pretty convinced for the harmonic impropers.
I'm still struggling to figure out the amber-style impropers. GROMACS energies match with by-hand calculations (shout out to @rmatsum836) if you use central-atom-first. But the question still remains, does amber specify central-atom-third. And I lean towards yes on that. The main reason is this: for amber-style impropers, you should be able to use the same functional form as proper dihedrals, but just drop in the central atom as the third atom. If you do that in gmx
and compute the energies with funct 1
(periodic) vs. funct 4
(periodic improper) the energies are the same with the same atom ordering.
@lilyminium is correct. ParmEd correctly maintains the order of Amber impropers (central-atom third) so we should be able to keep the atom order from ParmEd dihedrals when writing out to LAMMPS.
I think this foyer test might be fishy then. The central atom is well-defined but the resultant parmed improper puts that central atom as the 3rd element in the pmd.improper object?
I think this foyer test might be fishy then. The central atom is well-defined but the resultant parmed improper puts that central atom as the 3rd element in the pmd.improper object?
Thanks for bringing this to my attention. This is why I opened up the thread rather than just dropping in a fix immediately. I'll let you know what I find.
@ahy3nz I took a look at the test. I agree--central atom definitely ends up third in the parmed.Structure
. Is something going wrong in the openmm
-> parmed
conversion then?
What is interesting is that the central atom (_CL
) is listed first in the XML, which I think correctly follows the OpenMM spec for impropers. But then something gets switched around by the time it gets to parmed
. The conversion doesn't seem to do any re-ordering.
It appears the atom ordering switching happens when foyer creates the openmm system, BEFORE any conversion-to-parmed happens. Specifically, when creating the customtorsionforce object via the customtorsionforcegenerator, there is some interesting improper-matching logic implemented in openmm.
Consequently, the order of the atoms in the improper changes depending on the presence of wildcards.
At the very least, I think this atom-ordering-improper issue stems in how openmm creates the customtorsionforces, and our writers try to be consistent with those, but this improper-with-wildcards is an interesting situation. Maybe the a test similar to test_charmm_improper
should be done with no wildcards in the custom torsion xml
Specifically, when creating the customtorsionforce object via the customtorsionforcegenerator, there is some interesting improper-matching logic implemented in openmm.
I may be misinterpreting the code, but it looks like for charmm
the ordering is central-atom-third if there are wildcards but central atom first if not? That seems very odd, but I may be misinterpreting the code. amber
is always central-atom-third, as you would expect.
That's what I had gathered from the code, central-atom changes if there are wildcards. I don't know why, and I don't know how that logic was determined.
If this openmm logic is the "right" way to treat these customtorsionforce elements in an XML, then this might mean one of two things?
-
The charmm_cooh.xml I used is actually a malformed XML element with a wildcard, and a correctly-written XML element will be processed by openmm to yield an improper that always puts central atom first?
-
If the charm_cooh.xml is correctly written, then the resultant writers will need to adjust the atom ordering of the parmed improper objects based on XML element (wildcard or not)?
We need to resolve this issue (only a year later). This is a pretty important one to anyone using these systems. I'm marking this as high priority, and bringing it up at the next developers meeting.