openff-toolkit openFF LJ got lost when running the protein-ligand simulation with Gromacs

Describe the bug I'm trying to use https://github.com/openforcefield/openff-toolkit/blob/master/examples/using_smirnoff_with_amber_protein_forcefield/toluene_in_T4_lysozyme.ipynb to generate the Gromacs topology file and run the simulation with Gromacs. I noticed that if at the end of the notebook. I add two lines of code to generate the gromacs topology, the openFF LJ for the small molecule will get lost.

To Reproduce Run the notebook and add

complex_structure = toluene_structure + t4_structure
toluene_structure.save('ligand.top')
complex_structure.save('topol.top')

To the end of the notebook. If we check the atomtypes in the 'ligand.top', the atomtypes are correct

[ atomtypes ]
; name    at.num    mass    charge ptype  sigma      epsilon
C1             6  12.010780  0.00000000  A     0.34806469     0.36350306
C2             6  12.010780  0.00000000  A     0.33795318     0.45538912
H1             1   1.007947  0.00000000  A     0.25725815     0.06531786
H2             1   1.007947  0.00000000  A     0.26445434    0.066021356

If we check the atomtypes in 'topol.top', then the openFF atomtypes are lost.

[ atomtypes ]
; name    at.num    mass    charge ptype  sigma      epsilon
C1             6  12.010780  0.00000000  A     0.33996695      0.4577296
C2             6  12.010780  0.00000000  A     0.33996695       0.359824
H1             1   1.007947  0.00000000  A     0.10690785      0.0656888
H2             1   1.007947  0.00000000  A     0.19599772      0.0656888
N1             7  14.006720  0.00000000  A     0.32499985        0.71128
H3             1   1.007947  0.00000000  A     0.26495328      0.0656888
H4             1   1.007947  0.00000000  A      0.2471353      0.0656888
S1            16  32.065500  0.00000000  A     0.35635949          1.046
O1             8  15.999430  0.00000000  A     0.29599219        0.87864
H5             1   1.007947  0.00000000  A     0.25996425        0.06276
O2             8  15.999430  0.00000000  A     0.30664734      0.8803136
H6             1   1.007947  0.00000000  A              0              0
H7             1   1.007947  0.00000000  A     0.24214627        0.06276
H8             1   1.007947  0.00000000  A     0.25105526        0.06276
Cl1           17  35.453200  0.00000000  A     0.44010397         0.4184
Na1           11  22.989769  0.00000000  A     0.33283976     0.01158968
O3             8  15.999430  0.00000000  A     0.31779646     0.65214353
H9             1   1.007947  0.00000000  A              1              0

The toluene is using the protein FF atomtype with the same name.

May 04 '22 10:05 xiki-tempula

Thanks for the writeup @xiki-tempula - I've reproduced this and am looking into it now :-)

May 04 '22 15:05 j-wags

Ok, this appears to have the same root cause as https://github.com/ParmEd/ParmEd/issues/1197#issuecomment-929472918. The ParmEd developers have made the fix in the code but there hasn't been a release with that fix yet.

So, until a new release is made, we can add the following function to clean up the combined system:

def unique_atom_types(pmd_structure):

    omm_system = pmd_structure.createSystem(nonbondedMethod=app.NoCutoff,
                                            constraints=None,
                                            removeCMMotion=False,
                                            rigidWater=False)

    # # Copy the topology and positions
    topology = pmd_structure.topology
    positions = pmd_structure.positions

    def check_water(res):
        two_bonds = list(res.bonds())

        if len(two_bonds) == 2:

            waters = []

            for bond in two_bonds:

                elem0 = bond[0].element
                elem1 = bond[1].element

                if (elem0.atomic_number == 1 and elem1.atomic_number == 8) \
                        or (elem0.atomic_number == 8 and elem1.atomic_number == 1):
                    waters.append(True)

            if all(waters):
                return True

        else:
            return False

    atom_types_dic = {}
    count_id = 0

    for c in topology.chains():
        for r in c.residues():
            for a in r.atoms():
                if r.name + a.name in atom_types_dic:
                    a.id = atom_types_dic[r.name + a.name]
                    #print('a.id',a.id)
                else:
                    if check_water(r):
                        if a.element.atomic_number == 1:
                            a.id = 'HW'
                        else:
                            a.id = 'OW'
                        atom_types_dic[r.name + a.name] = a.id
                    elif r.name == 'LIG':
                        a.id = 'L' + str(count_id)
                    count_id += 1

    # Define a new parmed structure with the new unique atom types
    new_system_structure = parmed.openmm.load_topology(topology,
                                                       system=omm_system,
                                                       xyz=positions)
    # Re-set positions, velocities and box
    new_system_structure.positions = pmd_structure.positions
    new_system_structure.velocities = pmd_structure.velocities
    new_system_structure.box = pmd_structure.box

    return new_system_structure

Then when I run

complex_structure = toluene_structure + t4_structure
complex_structure = unique_atom_types(complex_structure)
toluene_structure.save('ligand.top', overwrite=True)
complex_structure.save('topol.top', overwrite=True)

the atom types in the combined structure look better:

[ atomtypes ]
; name    at.num    mass    charge ptype  sigma      epsilon
L0             6  12.010780  0.00000000  A     0.34806469     0.36350306
L1             6  12.010780  0.00000000  A     0.34806469     0.36350306
L2             6  12.010780  0.00000000  A     0.34806469     0.36350306
L3             6  12.010780  0.00000000  A     0.34806469     0.36350306
L4             6  12.010780  0.00000000  A     0.34806469     0.36350306
L5             6  12.010780  0.00000000  A     0.34806469     0.36350306
L6             6  12.010780  0.00000000  A     0.33795318     0.45538912
L7             1   1.007947  0.00000000  A     0.25725815     0.06531786
L8             1   1.007947  0.00000000  A     0.25725815     0.06531786
L9             1   1.007947  0.00000000  A     0.25725815     0.06531786
L10            1   1.007947  0.00000000  A     0.25725815     0.06531786
L11            1   1.007947  0.00000000  A     0.25725815     0.06531786
L12            1   1.007947  0.00000000  A     0.26445434    0.066021356
L13            1   1.007947  0.00000000  A     0.26445434    0.066021356
L14            1   1.007947  0.00000000  A     0.26445434    0.066021356
N1             7  14.006720  0.00000000  A     0.32499985        0.71128
H1             1   1.007947  0.00000000  A     0.10690785      0.0656888
C1             6  12.010780  0.00000000  A     0.33996695      0.4577296
H2             1   1.007947  0.00000000  A     0.19599772      0.0656888
H3             1   1.007947  0.00000000  A     0.26495328      0.0656888
H4             1   1.007947  0.00000000  A      0.2471353      0.0656888
S1            16  32.065500  0.00000000  A     0.35635949          1.046
C2             6  12.010780  0.00000000  A     0.33996695       0.359824
O1             8  15.999430  0.00000000  A     0.29599219        0.87864
H5             1   1.007947  0.00000000  A     0.25996425        0.06276
O2             8  15.999430  0.00000000  A     0.30664734      0.8803136
H6             1   1.007947  0.00000000  A              0              0
H7             1   1.007947  0.00000000  A     0.24214627        0.06276
H8             1   1.007947  0.00000000  A     0.25105526        0.06276
Cl1           17  35.453200  0.00000000  A     0.44010397         0.4184
Na1           11  22.989769  0.00000000  A     0.33283976     0.01158968
OW             8  15.999430  0.00000000  A     0.31779646     0.65214353
HW             1   1.007947  0.00000000  A              1              0

May 04 '22 16:05 j-wags

@j-wags Thanks. I will have a try. Could this handle two ligands? Or only one ligand with the residue name other than LIG?

May 04 '22 17:05 xiki-tempula

There's probably a way to handle it by tinkering with this line:

                    elif r.name == 'LIG':
                        a.id = 'L' + str(count_id)

but this might run into other ParmEd issues, I'm not sure. It might be worth contacting the ParmEd developers since these can be dangerous use cases.

Jun 01 '22 21:06 mattwthompson

This might be related to #1532 depending on which versions of ParmEd reproduce this.

Feb 02 '23 15:02 mattwthompson

openff-toolkit openff-toolkit copied to clipboard

openFF LJ got lost when running the protein-ligand simulation with Gromacs

openff-toolkit
openff-toolkit copied to clipboard