openff-toolkit icon indicating copy to clipboard operation
openff-toolkit copied to clipboard

openFF LJ got lost when running the protein-ligand simulation with Gromacs

Open xiki-tempula opened this issue 3 years ago • 4 comments

Describe the bug I'm trying to use https://github.com/openforcefield/openff-toolkit/blob/master/examples/using_smirnoff_with_amber_protein_forcefield/toluene_in_T4_lysozyme.ipynb to generate the Gromacs topology file and run the simulation with Gromacs. I noticed that if at the end of the notebook. I add two lines of code to generate the gromacs topology, the openFF LJ for the small molecule will get lost.

To Reproduce Run the notebook and add

complex_structure = toluene_structure + t4_structure
toluene_structure.save('ligand.top')
complex_structure.save('topol.top')

To the end of the notebook. If we check the atomtypes in the 'ligand.top', the atomtypes are correct

[ atomtypes ]
; name    at.num    mass    charge ptype  sigma      epsilon
C1             6  12.010780  0.00000000  A     0.34806469     0.36350306
C2             6  12.010780  0.00000000  A     0.33795318     0.45538912
H1             1   1.007947  0.00000000  A     0.25725815     0.06531786
H2             1   1.007947  0.00000000  A     0.26445434    0.066021356

If we check the atomtypes in 'topol.top', then the openFF atomtypes are lost.

[ atomtypes ]
; name    at.num    mass    charge ptype  sigma      epsilon
C1             6  12.010780  0.00000000  A     0.33996695      0.4577296
C2             6  12.010780  0.00000000  A     0.33996695       0.359824
H1             1   1.007947  0.00000000  A     0.10690785      0.0656888
H2             1   1.007947  0.00000000  A     0.19599772      0.0656888
N1             7  14.006720  0.00000000  A     0.32499985        0.71128
H3             1   1.007947  0.00000000  A     0.26495328      0.0656888
H4             1   1.007947  0.00000000  A      0.2471353      0.0656888
S1            16  32.065500  0.00000000  A     0.35635949          1.046
O1             8  15.999430  0.00000000  A     0.29599219        0.87864
H5             1   1.007947  0.00000000  A     0.25996425        0.06276
O2             8  15.999430  0.00000000  A     0.30664734      0.8803136
H6             1   1.007947  0.00000000  A              0              0
H7             1   1.007947  0.00000000  A     0.24214627        0.06276
H8             1   1.007947  0.00000000  A     0.25105526        0.06276
Cl1           17  35.453200  0.00000000  A     0.44010397         0.4184
Na1           11  22.989769  0.00000000  A     0.33283976     0.01158968
O3             8  15.999430  0.00000000  A     0.31779646     0.65214353
H9             1   1.007947  0.00000000  A              1              0

The toluene is using the protein FF atomtype with the same name.

xiki-tempula avatar May 04 '22 10:05 xiki-tempula

Thanks for the writeup @xiki-tempula - I've reproduced this and am looking into it now :-)

j-wags avatar May 04 '22 15:05 j-wags

Ok, this appears to have the same root cause as https://github.com/ParmEd/ParmEd/issues/1197#issuecomment-929472918. The ParmEd developers have made the fix in the code but there hasn't been a release with that fix yet.

So, until a new release is made, we can add the following function to clean up the combined system:

def unique_atom_types(pmd_structure):

    omm_system = pmd_structure.createSystem(nonbondedMethod=app.NoCutoff,
                                            constraints=None,
                                            removeCMMotion=False,
                                            rigidWater=False)

    # # Copy the topology and positions
    topology = pmd_structure.topology
    positions = pmd_structure.positions

    def check_water(res):
        two_bonds = list(res.bonds())

        if len(two_bonds) == 2:

            waters = []

            for bond in two_bonds:

                elem0 = bond[0].element
                elem1 = bond[1].element

                if (elem0.atomic_number == 1 and elem1.atomic_number == 8) \
                        or (elem0.atomic_number == 8 and elem1.atomic_number == 1):
                    waters.append(True)

            if all(waters):
                return True

        else:
            return False

    atom_types_dic = {}
    count_id = 0

    for c in topology.chains():
        for r in c.residues():
            for a in r.atoms():
                if r.name + a.name in atom_types_dic:
                    a.id = atom_types_dic[r.name + a.name]
                    #print('a.id',a.id)
                else:
                    if check_water(r):
                        if a.element.atomic_number == 1:
                            a.id = 'HW'
                        else:
                            a.id = 'OW'
                        atom_types_dic[r.name + a.name] = a.id
                    elif r.name == 'LIG':
                        a.id = 'L' + str(count_id)
                    count_id += 1

    # Define a new parmed structure with the new unique atom types
    new_system_structure = parmed.openmm.load_topology(topology,
                                                       system=omm_system,
                                                       xyz=positions)
    # Re-set positions, velocities and box
    new_system_structure.positions = pmd_structure.positions
    new_system_structure.velocities = pmd_structure.velocities
    new_system_structure.box = pmd_structure.box

    return new_system_structure

Then when I run

complex_structure = toluene_structure + t4_structure
complex_structure = unique_atom_types(complex_structure)
toluene_structure.save('ligand.top', overwrite=True)
complex_structure.save('topol.top', overwrite=True)

the atom types in the combined structure look better:

[ atomtypes ]
; name    at.num    mass    charge ptype  sigma      epsilon
L0             6  12.010780  0.00000000  A     0.34806469     0.36350306
L1             6  12.010780  0.00000000  A     0.34806469     0.36350306
L2             6  12.010780  0.00000000  A     0.34806469     0.36350306
L3             6  12.010780  0.00000000  A     0.34806469     0.36350306
L4             6  12.010780  0.00000000  A     0.34806469     0.36350306
L5             6  12.010780  0.00000000  A     0.34806469     0.36350306
L6             6  12.010780  0.00000000  A     0.33795318     0.45538912
L7             1   1.007947  0.00000000  A     0.25725815     0.06531786
L8             1   1.007947  0.00000000  A     0.25725815     0.06531786
L9             1   1.007947  0.00000000  A     0.25725815     0.06531786
L10            1   1.007947  0.00000000  A     0.25725815     0.06531786
L11            1   1.007947  0.00000000  A     0.25725815     0.06531786
L12            1   1.007947  0.00000000  A     0.26445434    0.066021356
L13            1   1.007947  0.00000000  A     0.26445434    0.066021356
L14            1   1.007947  0.00000000  A     0.26445434    0.066021356
N1             7  14.006720  0.00000000  A     0.32499985        0.71128
H1             1   1.007947  0.00000000  A     0.10690785      0.0656888
C1             6  12.010780  0.00000000  A     0.33996695      0.4577296
H2             1   1.007947  0.00000000  A     0.19599772      0.0656888
H3             1   1.007947  0.00000000  A     0.26495328      0.0656888
H4             1   1.007947  0.00000000  A      0.2471353      0.0656888
S1            16  32.065500  0.00000000  A     0.35635949          1.046
C2             6  12.010780  0.00000000  A     0.33996695       0.359824
O1             8  15.999430  0.00000000  A     0.29599219        0.87864
H5             1   1.007947  0.00000000  A     0.25996425        0.06276
O2             8  15.999430  0.00000000  A     0.30664734      0.8803136
H6             1   1.007947  0.00000000  A              0              0
H7             1   1.007947  0.00000000  A     0.24214627        0.06276
H8             1   1.007947  0.00000000  A     0.25105526        0.06276
Cl1           17  35.453200  0.00000000  A     0.44010397         0.4184
Na1           11  22.989769  0.00000000  A     0.33283976     0.01158968
OW             8  15.999430  0.00000000  A     0.31779646     0.65214353
HW             1   1.007947  0.00000000  A              1              0

j-wags avatar May 04 '22 16:05 j-wags

@j-wags Thanks. I will have a try. Could this handle two ligands? Or only one ligand with the residue name other than LIG?

xiki-tempula avatar May 04 '22 17:05 xiki-tempula

There's probably a way to handle it by tinkering with this line:

                    elif r.name == 'LIG':
                        a.id = 'L' + str(count_id)

but this might run into other ParmEd issues, I'm not sure. It might be worth contacting the ParmEd developers since these can be dangerous use cases.

mattwthompson avatar Jun 01 '22 21:06 mattwthompson

This might be related to #1532 depending on which versions of ParmEd reproduce this.

mattwthompson avatar Feb 02 '23 15:02 mattwthompson