openff-toolkit icon indicating copy to clipboard operation
openff-toolkit copied to clipboard

.to_openmm() issue/converting between formats

Open dlukauskis opened this issue 6 years ago • 16 comments

I'm trying to parameterise a small ligand using Parsley, to be exported as GROMACS files. However, I cannot convert OpenFF Topology object into an OpenMM topology:

AttributeError                            Traceback (most recent call last)
<ipython-input-68-10a568baf967> in <module>
     13 
     14 # create omm system with the off topology
---> 15 omm_system = ff.create_openmm_system(omm_topology)
     16 
     17 # Convert OpenMM System to a ParmEd structure.

~/anaconda3/envs/openff/lib/python3.7/site-packages/openforcefield/typing/engines/smirnoff/forcefield.py in create_openmm_system(self, topology, **kwargs)
   1110 
   1111         # Set periodic boundary conditions if specified
-> 1112         if topology.box_vectors is not None:
   1113             system.setDefaultPeriodicBoxVectors(*topology.box_vectors)
   1114 

AttributeError: 'Topology' object has no attribute 'box_vectors'

Attaching the inputs + a Jupyter Notebook with the script. Conda installation, Ubuntu 18.04.

dlukauskis avatar Oct 21 '19 13:10 dlukauskis

Thanks for getting in touch, @dlukauskis,

It looks like the attachments didn't work -- Could you try uploading them again?

Also, ff.create_openmm_system expects an Open Force Field Topology, not an OpenMM Topology. If you already have an OFF topology, you can just substitute it into that call. Otherwise, you can produce an OFF topology from an OMM topology using the openforcefield.topology.Topology.from_openmm function. Here is one example of that:

# Create OpenFF topology with 1 ethanol and 2 benzenes.
ethanol = Molecule.from_smiles('CCO')
benzene = Molecule.from_smiles('c1ccccc1')
off_topology = Topology.from_molecules(molecules=[ethanol, benzene, benzene])

# Convert to OpenMM Topology.
omm_topology = off_topology.to_openmm()

# Convert back to OpenFF Topology.
off_topology_copy = Topology.from_openmm(omm_topology, unique_molecules=[ethanol, benzene])

https://github.com/openforcefield/openforcefield/blob/master/openforcefield/tests/test_topology.py#L446-L465

j-wags avatar Oct 21 '19 15:10 j-wags

Thanks for that, @j-wags. I've got past making parmed_structure, but now I'm getting a write issue:

AttributeError                            Traceback (most recent call last)
<ipython-input-4-16c2ff5159d2> in <module>
     20 # Export GROMACS files.
     21 parmed_structure.save('system.top', overwrite=True)
---> 22 parmed_structure.save('system.gro', overwrite=True)

~/anaconda3/envs/openff/lib/python3.7/site-packages/parmed/structure.py in save(self, fname, format, overwrite, **kwargs)
   1483                 s.write_psf(fname, **kwargs)
   1484             elif format == 'GRO':
-> 1485                 gromacs.GromacsGroFile.write(self, fname, **kwargs)
   1486             elif format == 'MOL2':
   1487                 formats.Mol2File.write(self, fname, **kwargs)

~/anaconda3/envs/openff/lib/python3.7/site-packages/parmed/gromacs/gromacsgro.py in write(struct, dest, precision, nobox, combine)
    295                                 _write_atom_line(
    296                                     original_atom, atid % 100000,
--> 297                                     resid % 100000, has_vels, dest, precision)
    298                                 new_molecule.add(original_atom)
    299                                 last_found_atom = original_atom

~/anaconda3/envs/openff/lib/python3.7/site-packages/parmed/gromacs/gromacsgro.py in _write_atom_line(atom, atid, resid, has_vels, dest, precision)
    236             dest.write('%5d%-5s%5s%5d' % (resid, atom.residue.name[:5],
    237                                           atom.name[:5], atid))
--> 238             dest.write((crdfmt % (atom.xx/10))[:varwidth])
    239             dest.write((crdfmt % (atom.xy/10))[:varwidth])
    240             dest.write((crdfmt % (atom.xz/10))[:varwidth])

AttributeError: 'Atom' object has no attribute 'xx'

Sorry about the attachment, forgot it!

dlukauskis avatar Oct 21 '19 15:10 dlukauskis

Ah, ParmEd is looking for coordinates on the atoms, but not finding them. You can provide those using the xyz keyword to parmed.openmm.load_topology:

parmed_structure = parmed.openmm.load_topology(omm_topology, omm_system, xyz=molecule.conformers[0])

Also, it looks like you're parametrizing octa-acid by subunit. However, this is being interpreted as a carboxylate, with -1 net charge:

for atom in molecule.atoms:
    print(atom.formal_charge)

0 0 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0

@slochower had encountered this problem before, though I don't think we found an elegant way around it. He resorted to setting partial_charges on the molecule itself, and then using the charge_from_molecules kwarg to create_openmm_system to avoid having to run AM1-BCC on the host molecule during system creation.

j-wags avatar Oct 21 '19 19:10 j-wags

Host systems can be more painful than average to set up, so please keep posting about issues you might encounter, and we can rope in @slochower for pointers if we need.

davidlmobley avatar Oct 21 '19 20:10 davidlmobley

Ah, ParmEd is looking for coordinates on the atoms, but not finding them. You can provide those using the xyz keyword to parmed.openmm.load_topology:

parmed_structure = parmed.openmm.load_topology(omm_topology, omm_system, xyz=molecule.conformers[0])

Adding this did produce an output, a decent looking .top file but a badly-off .gro. Here. Is this the fault of the .sdf itself or Parmed?

Also, it looks like you're parametrizing octa-acid by subunit. However, this is being interpreted as a carboxylate, with -1 net charge:

for atom in molecule.atoms:
    print(atom.formal_charge)

0 0 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0

This is the OctaAcid guest, not the host itself. I thought I'd start small when using OpenFF for the first time.

@slochower had encountered this problem before, though I don't think we found an elegant way around it. He resorted to setting partial_charges on the molecule itself, and then using the charge_from_molecules kwarg to create_openmm_system to avoid having to run AM1-BCC on the host molecule during system creation.

I'm entirely unable to de novo parameterise the organic hosts - Antechamber via Acpype fails to assign the partial charges on a host molecule, running out of memory after 3 hours... How did you get the partial charges in the first place? Parameterise one repeating unit in the polymer and then assume its the same throughout?

dlukauskis avatar Oct 21 '19 20:10 dlukauskis

Which guest? This should be easy, I can get likely get you a working example.

Parameterizing hosts can be tricky, and I've not done this much myself. I think @slochower may do it by subunit (at least for charging) and then stitch them together. That said, the OpenEye toolkits are usually fast enough that they can charge the whole host, at least for octa acid (we've been doing it recently), so I would expect this to work OK. Indeed, though, antechamber doesn't work well for it -- the Antechamber sqm implementation seems to choke on molecules of this size. That doesn't help you, then, if you've not got an OpenEye license. If that's what you want, I might be able to charge it for you, or we may already have a charged copy around for the reference calculations we're doing. Are you looking at SAMPL7, or doing a historical SAMPL?

davidlmobley avatar Oct 21 '19 20:10 davidlmobley

Which guest? This should be easy, I can get likely get you a working example.

In this specific case, its OA-G0 from SAMPL6 set.

Parameterizing hosts can be tricky, and I've not done this much myself. I think @slochower may do it by subunit (at least for charging) and then stitch them together. That said, the OpenEye toolkits are usually fast enough that they can charge the whole host, at least for octa acid (we've been doing it recently), so I would expect this to work OK. Indeed, though, antechamber doesn't work well for it -- the Antechamber sqm implementation seems to choke on molecules of this size. That doesn't help you, then, if you've not got an OpenEye license. If that's what you want, I might be able to charge it for you, or we may already have a charged copy around for the reference calculations we're doing. Are you looking at SAMPL7, or doing a historical SAMPL?

I've got the sdf and mol2 files for all of the guests and hosts in the test set, I was just looking for ways to use OpenFF without an OE license. It seems a lot easier to use it with OE and .mol2 files.

I'm approaching this to both teach myself how to use the OpenFF toolkit and comparing the Parsley results to GAFF2, using the small guest-host systems.

dlukauskis avatar Oct 21 '19 21:10 dlukauskis

OK, so another option for you for the hosts (since Parsley just uses AM1-BCC charges and updates valence parameters) is to use the pre-charged systems we provided in the SAMPL6 repo, so load the charges from the input molecules rather than using the openforcefield toolkit to assign charges. Otherwise you're stuck needing either (a) an OpenEye license for host charge assignment, because antechamber chokes on the whole host, or (b) needing to break it up into subunits and parameterize each individually before stitching back together.

If you wanted to make your life REALLY easy you could also utilize the fully solvated/parameterized systems provided in the repo for the reference calculations (e.g. https://github.com/samplchallenges/SAMPL6/tree/master/host_guest/Reference) and just swap in Parsley parameters for the host and guest.

I'll try to get back to you with a worked example of parameterizing just g0. You're after output to GROMACS format, it looks like...

davidlmobley avatar Oct 21 '19 21:10 davidlmobley

Apologies for the delay, I was traveling and encountered some unanticipated airline silliness. I quickly skimmed the thread, so I may have missed a few details, but here are my top level comments.

  • I don't recall any problem generating partial atomic charges on OA using antechamber, but I haven't looked at that host particularly since SAMPL5. (I'm currently sifting through old files, seeing if I can see exactly what we did for charges. Didn't find the right scripts in a quick look, I can loop back around later. For SAMPL5, I believe we used RESP charges.)
  • However, when antechamber does take a long time, try adding the -pl flag, e.g., -pl 10. This option controls the "path length" that antechamber uses to find symmetry in the molecule. In my experience, you can get away with quite good charges in a reasonable amount of time and memory by restricting the path length.
  • As @j-wags said, I usually pass charges directly using charge_from_molecules.
  • I also agree with @davidlmobley that OpenEye tools can probably generate charges for OA.
  • Generally speaking, happy to help with host-guest setup @dlukauskis although my Gromacs skills are rusty now.

slochower avatar Oct 21 '19 22:10 slochower

@dlukauskis here's an example which works (except for note at bottom) for me with the RDKit backend (which, I assume, is what you're using) for the guest:

from openforcefield.topology import *
from openforcefield.typing.engines.smirnoff import ForceField

solute = Molecule.from_file('OA-G0.sdf')
off_topology = Topology.from_molecules(molecules=[solute])

ff = ForceField("openff-1.0.0.offxml")
system = ff.create_openmm_system(off_topology)

# Create OpenMM topology
omm_topology = off_topology.to_openmm()

import parmed

# Convert OpenMM System to a ParmEd structure.
parmed_structure = parmed.openmm.load_topology(omm_topology, system, solute.conformers[0])
# Export GROMACS files.
parmed_structure.save('system.top', overwrite=True)
parmed_structure.save('system.gro', overwrite=True)

One potential issue is that it looks like this results in a GROMACS topology file without atom names or a molecule name, which is (a) perhaps not surprising because SDF files don't have atom names, and (b) probably won't run without this being populated. That said, you should be able to add any string as an atom name and molecule name and have these work.

@j-wags is there an easy way to add atom names automatically so we won't have this issue? (Also, we should make sure we automatically populate atom and molecule/system names, otherwise we'll tend to have these kinds of issues on passing through ParmEd. I thought this was something we had an open issue on, or had already dealt with, but it seems not to be the case.)

I'm attaching my input file. OA-G0.sdf.zip

davidlmobley avatar Oct 22 '19 22:10 davidlmobley

@j-wags @davidlmobley

    # FIXME: OpenForceField does not provide residue names!
    structure.residues[0].name = "RES"

    # FIXME: OpenForceField does not provide atom names!
    for index, atom in enumerate(structure.atoms):
        atom.name = f"{atom.element_name}{index}"

slochower avatar Oct 22 '19 23:10 slochower

Also, please check that the 1-4 terms are correct (e.g., SCEE and SCNB). These are not populated correctly for AMBER using ParmEd:

    # FIXME: ParmEd does not assign scee and scnb to the dihedrals
    # If this block is commented out, energies won't match between
    # ParmEd and tleap -generated PRMTOPS! I guess this is a bug in
    # ParmEd because the generated FRCMOD files are incorrect.
    if with_fixes:
        for dihedral in structure.dihedral_types:
            dihedral.scee = 1.2
            dihedral.scnb = 2.0

slochower avatar Oct 22 '19 23:10 slochower

Oooh, good catch, @slochower -- I get:

[ defaults ]
; nbfunc        comb-rule       gen-pairs       fudgeLJ fudgeQQ
1               2               no              1            0.83333

should be:

[ defaults ]
; nbfunc        comb-rule       gen-pairs       fudgeLJ fudgeQQ
     1 2      yes    0.500000 0.833333

or

[ defaults ]
; nbfunc        comb-rule       gen-pairs       fudgeLJ fudgeQQ
     1 2      no    0.500000 0.833333

I don't think this ParmEd issue is triggered in all situations -- e.g. @dfhahn has validated energies after conversion to GROMACS format and ensured they reproduce quite precisely -- but I think it's likely his cases are rescued because a portion of the system (in his case the protein) uses a different force field, or the input format is different, or both.

davidlmobley avatar Oct 22 '19 23:10 davidlmobley

cc #104

davidlmobley avatar Oct 22 '19 23:10 davidlmobley

* However, when `antechamber` does take a long time, try adding the `-pl` flag, e.g., `-pl 10`. This option controls the "path length" that `antechamber` uses to find symmetry in the molecule. In my experience, you can get away with quite good charges in a reasonable amount of time and memory by restricting the path length.

-pl 10 flag fixed the issue of parameterising the guests.

@davidlmobley the script you provided worked, but as you mentioned, the missing atom and residue names prevent it from running. I guess we can keep this issue open until that is resolved.

dlukauskis avatar Oct 23 '19 16:10 dlukauskis

@dlukauskis you should be able to do a manual fix of the topology file to resolve those issues (the actual parameters should be correct), but yes, let's leave this open for now.

davidlmobley avatar Oct 23 '19 16:10 davidlmobley

A number of things have changed in the past almost-three years:

  • The OpenFF Toolkit is capable of passing residue information onto downstream formats, provided the input data contains residues and/or they are specified via the setters
  • We are moving to use Interchange for interoperability tasks - see ForceField.create_interchange and Interchange.to_top as key methods

I'm sure we'll face more interoperability hurdles, but I'm closing this one as old. Please feel free to open an new issue on this or Interchange's tracker when something like this comes up again.

mattwthompson avatar Aug 31 '22 16:08 mattwthompson