openff-toolkit
openff-toolkit copied to clipboard
.to_openmm() issue/converting between formats
I'm trying to parameterise a small ligand using Parsley, to be exported as GROMACS files. However, I cannot convert OpenFF Topology object into an OpenMM topology:
AttributeError Traceback (most recent call last)
<ipython-input-68-10a568baf967> in <module>
13
14 # create omm system with the off topology
---> 15 omm_system = ff.create_openmm_system(omm_topology)
16
17 # Convert OpenMM System to a ParmEd structure.
~/anaconda3/envs/openff/lib/python3.7/site-packages/openforcefield/typing/engines/smirnoff/forcefield.py in create_openmm_system(self, topology, **kwargs)
1110
1111 # Set periodic boundary conditions if specified
-> 1112 if topology.box_vectors is not None:
1113 system.setDefaultPeriodicBoxVectors(*topology.box_vectors)
1114
AttributeError: 'Topology' object has no attribute 'box_vectors'
Attaching the inputs + a Jupyter Notebook with the script. Conda installation, Ubuntu 18.04.
Thanks for getting in touch, @dlukauskis,
It looks like the attachments didn't work -- Could you try uploading them again?
Also, ff.create_openmm_system expects an Open Force Field Topology, not an OpenMM Topology. If you already have an OFF topology, you can just substitute it into that call. Otherwise, you can produce an OFF topology from an OMM topology using the openforcefield.topology.Topology.from_openmm function. Here is one example of that:
# Create OpenFF topology with 1 ethanol and 2 benzenes.
ethanol = Molecule.from_smiles('CCO')
benzene = Molecule.from_smiles('c1ccccc1')
off_topology = Topology.from_molecules(molecules=[ethanol, benzene, benzene])
# Convert to OpenMM Topology.
omm_topology = off_topology.to_openmm()
# Convert back to OpenFF Topology.
off_topology_copy = Topology.from_openmm(omm_topology, unique_molecules=[ethanol, benzene])
https://github.com/openforcefield/openforcefield/blob/master/openforcefield/tests/test_topology.py#L446-L465
Thanks for that, @j-wags. I've got past making parmed_structure, but now I'm getting a write issue:
AttributeError Traceback (most recent call last)
<ipython-input-4-16c2ff5159d2> in <module>
20 # Export GROMACS files.
21 parmed_structure.save('system.top', overwrite=True)
---> 22 parmed_structure.save('system.gro', overwrite=True)
~/anaconda3/envs/openff/lib/python3.7/site-packages/parmed/structure.py in save(self, fname, format, overwrite, **kwargs)
1483 s.write_psf(fname, **kwargs)
1484 elif format == 'GRO':
-> 1485 gromacs.GromacsGroFile.write(self, fname, **kwargs)
1486 elif format == 'MOL2':
1487 formats.Mol2File.write(self, fname, **kwargs)
~/anaconda3/envs/openff/lib/python3.7/site-packages/parmed/gromacs/gromacsgro.py in write(struct, dest, precision, nobox, combine)
295 _write_atom_line(
296 original_atom, atid % 100000,
--> 297 resid % 100000, has_vels, dest, precision)
298 new_molecule.add(original_atom)
299 last_found_atom = original_atom
~/anaconda3/envs/openff/lib/python3.7/site-packages/parmed/gromacs/gromacsgro.py in _write_atom_line(atom, atid, resid, has_vels, dest, precision)
236 dest.write('%5d%-5s%5s%5d' % (resid, atom.residue.name[:5],
237 atom.name[:5], atid))
--> 238 dest.write((crdfmt % (atom.xx/10))[:varwidth])
239 dest.write((crdfmt % (atom.xy/10))[:varwidth])
240 dest.write((crdfmt % (atom.xz/10))[:varwidth])
AttributeError: 'Atom' object has no attribute 'xx'
Sorry about the attachment, forgot it!
Ah, ParmEd is looking for coordinates on the atoms, but not finding them. You can provide those using the xyz keyword to parmed.openmm.load_topology:
parmed_structure = parmed.openmm.load_topology(omm_topology, omm_system, xyz=molecule.conformers[0])
Also, it looks like you're parametrizing octa-acid by subunit. However, this is being interpreted as a carboxylate, with -1 net charge:
for atom in molecule.atoms:
print(atom.formal_charge)
0 0 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0
@slochower had encountered this problem before, though I don't think we found an elegant way around it. He resorted to setting partial_charges on the molecule itself, and then using the charge_from_molecules kwarg to create_openmm_system to avoid having to run AM1-BCC on the host molecule during system creation.
Host systems can be more painful than average to set up, so please keep posting about issues you might encounter, and we can rope in @slochower for pointers if we need.
Ah, ParmEd is looking for coordinates on the atoms, but not finding them. You can provide those using the
xyzkeyword toparmed.openmm.load_topology:parmed_structure = parmed.openmm.load_topology(omm_topology, omm_system, xyz=molecule.conformers[0])
Adding this did produce an output, a decent looking .top file but a badly-off .gro. Here. Is this the fault of the .sdf itself or Parmed?
Also, it looks like you're parametrizing octa-acid by subunit. However, this is being interpreted as a carboxylate, with -1 net charge:
for atom in molecule.atoms: print(atom.formal_charge)0 0 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0
This is the OctaAcid guest, not the host itself. I thought I'd start small when using OpenFF for the first time.
@slochower had encountered this problem before, though I don't think we found an elegant way around it. He resorted to setting
partial_chargeson the molecule itself, and then using thecharge_from_moleculeskwarg tocreate_openmm_systemto avoid having to run AM1-BCC on the host molecule during system creation.
I'm entirely unable to de novo parameterise the organic hosts - Antechamber via Acpype fails to assign the partial charges on a host molecule, running out of memory after 3 hours... How did you get the partial charges in the first place? Parameterise one repeating unit in the polymer and then assume its the same throughout?
Which guest? This should be easy, I can get likely get you a working example.
Parameterizing hosts can be tricky, and I've not done this much myself. I think @slochower may do it by subunit (at least for charging) and then stitch them together. That said, the OpenEye toolkits are usually fast enough that they can charge the whole host, at least for octa acid (we've been doing it recently), so I would expect this to work OK. Indeed, though, antechamber doesn't work well for it -- the Antechamber sqm implementation seems to choke on molecules of this size. That doesn't help you, then, if you've not got an OpenEye license. If that's what you want, I might be able to charge it for you, or we may already have a charged copy around for the reference calculations we're doing. Are you looking at SAMPL7, or doing a historical SAMPL?
Which guest? This should be easy, I can get likely get you a working example.
In this specific case, its OA-G0 from SAMPL6 set.
Parameterizing hosts can be tricky, and I've not done this much myself. I think @slochower may do it by subunit (at least for charging) and then stitch them together. That said, the OpenEye toolkits are usually fast enough that they can charge the whole host, at least for octa acid (we've been doing it recently), so I would expect this to work OK. Indeed, though, antechamber doesn't work well for it -- the Antechamber
sqmimplementation seems to choke on molecules of this size. That doesn't help you, then, if you've not got an OpenEye license. If that's what you want, I might be able to charge it for you, or we may already have a charged copy around for the reference calculations we're doing. Are you looking at SAMPL7, or doing a historical SAMPL?
I've got the sdf and mol2 files for all of the guests and hosts in the test set, I was just looking for ways to use OpenFF without an OE license. It seems a lot easier to use it with OE and .mol2 files.
I'm approaching this to both teach myself how to use the OpenFF toolkit and comparing the Parsley results to GAFF2, using the small guest-host systems.
OK, so another option for you for the hosts (since Parsley just uses AM1-BCC charges and updates valence parameters) is to use the pre-charged systems we provided in the SAMPL6 repo, so load the charges from the input molecules rather than using the openforcefield toolkit to assign charges. Otherwise you're stuck needing either (a) an OpenEye license for host charge assignment, because antechamber chokes on the whole host, or (b) needing to break it up into subunits and parameterize each individually before stitching back together.
If you wanted to make your life REALLY easy you could also utilize the fully solvated/parameterized systems provided in the repo for the reference calculations (e.g. https://github.com/samplchallenges/SAMPL6/tree/master/host_guest/Reference) and just swap in Parsley parameters for the host and guest.
I'll try to get back to you with a worked example of parameterizing just g0. You're after output to GROMACS format, it looks like...
Apologies for the delay, I was traveling and encountered some unanticipated airline silliness. I quickly skimmed the thread, so I may have missed a few details, but here are my top level comments.
- I don't recall any problem generating partial atomic charges on OA using
antechamber, but I haven't looked at that host particularly since SAMPL5. (I'm currently sifting through old files, seeing if I can see exactly what we did for charges. Didn't find the right scripts in a quick look, I can loop back around later. For SAMPL5, I believe we used RESP charges.) - However, when
antechamberdoes take a long time, try adding the-plflag, e.g.,-pl 10. This option controls the "path length" thatantechamberuses to find symmetry in the molecule. In my experience, you can get away with quite good charges in a reasonable amount of time and memory by restricting the path length. - As @j-wags said, I usually pass charges directly using
charge_from_molecules. - I also agree with @davidlmobley that OpenEye tools can probably generate charges for OA.
- Generally speaking, happy to help with host-guest setup @dlukauskis although my Gromacs skills are rusty now.
@dlukauskis here's an example which works (except for note at bottom) for me with the RDKit backend (which, I assume, is what you're using) for the guest:
from openforcefield.topology import *
from openforcefield.typing.engines.smirnoff import ForceField
solute = Molecule.from_file('OA-G0.sdf')
off_topology = Topology.from_molecules(molecules=[solute])
ff = ForceField("openff-1.0.0.offxml")
system = ff.create_openmm_system(off_topology)
# Create OpenMM topology
omm_topology = off_topology.to_openmm()
import parmed
# Convert OpenMM System to a ParmEd structure.
parmed_structure = parmed.openmm.load_topology(omm_topology, system, solute.conformers[0])
# Export GROMACS files.
parmed_structure.save('system.top', overwrite=True)
parmed_structure.save('system.gro', overwrite=True)
One potential issue is that it looks like this results in a GROMACS topology file without atom names or a molecule name, which is (a) perhaps not surprising because SDF files don't have atom names, and (b) probably won't run without this being populated. That said, you should be able to add any string as an atom name and molecule name and have these work.
@j-wags is there an easy way to add atom names automatically so we won't have this issue? (Also, we should make sure we automatically populate atom and molecule/system names, otherwise we'll tend to have these kinds of issues on passing through ParmEd. I thought this was something we had an open issue on, or had already dealt with, but it seems not to be the case.)
I'm attaching my input file. OA-G0.sdf.zip
@j-wags @davidlmobley
# FIXME: OpenForceField does not provide residue names!
structure.residues[0].name = "RES"
# FIXME: OpenForceField does not provide atom names!
for index, atom in enumerate(structure.atoms):
atom.name = f"{atom.element_name}{index}"
Also, please check that the 1-4 terms are correct (e.g., SCEE and SCNB). These are not populated correctly for AMBER using ParmEd:
# FIXME: ParmEd does not assign scee and scnb to the dihedrals
# If this block is commented out, energies won't match between
# ParmEd and tleap -generated PRMTOPS! I guess this is a bug in
# ParmEd because the generated FRCMOD files are incorrect.
if with_fixes:
for dihedral in structure.dihedral_types:
dihedral.scee = 1.2
dihedral.scnb = 2.0
Oooh, good catch, @slochower -- I get:
[ defaults ]
; nbfunc comb-rule gen-pairs fudgeLJ fudgeQQ
1 2 no 1 0.83333
should be:
[ defaults ]
; nbfunc comb-rule gen-pairs fudgeLJ fudgeQQ
1 2 yes 0.500000 0.833333
or
[ defaults ]
; nbfunc comb-rule gen-pairs fudgeLJ fudgeQQ
1 2 no 0.500000 0.833333
I don't think this ParmEd issue is triggered in all situations -- e.g. @dfhahn has validated energies after conversion to GROMACS format and ensured they reproduce quite precisely -- but I think it's likely his cases are rescued because a portion of the system (in his case the protein) uses a different force field, or the input format is different, or both.
cc #104
* However, when `antechamber` does take a long time, try adding the `-pl` flag, e.g., `-pl 10`. This option controls the "path length" that `antechamber` uses to find symmetry in the molecule. In my experience, you can get away with quite good charges in a reasonable amount of time and memory by restricting the path length.
-pl 10 flag fixed the issue of parameterising the guests.
@davidlmobley the script you provided worked, but as you mentioned, the missing atom and residue names prevent it from running. I guess we can keep this issue open until that is resolved.
@dlukauskis you should be able to do a manual fix of the topology file to resolve those issues (the actual parameters should be correct), but yes, let's leave this open for now.
A number of things have changed in the past almost-three years:
- The OpenFF Toolkit is capable of passing residue information onto downstream formats, provided the input data contains residues and/or they are specified via the setters
- We are moving to use Interchange for interoperability tasks - see
ForceField.create_interchangeandInterchange.to_topas key methods
I'm sure we'll face more interoperability hurdles, but I'm closing this one as old. Please feel free to open an new issue on this or Interchange's tracker when something like this comes up again.