openff-toolkit
                                
                                 openff-toolkit copied to clipboard
                                
                                    openff-toolkit copied to clipboard
                            
                            
                            
                        openFF LJ got lost when running the protein-ligand simulation with Gromacs
Describe the bug I'm trying to use https://github.com/openforcefield/openff-toolkit/blob/master/examples/using_smirnoff_with_amber_protein_forcefield/toluene_in_T4_lysozyme.ipynb to generate the Gromacs topology file and run the simulation with Gromacs. I noticed that if at the end of the notebook. I add two lines of code to generate the gromacs topology, the openFF LJ for the small molecule will get lost.
To Reproduce Run the notebook and add
complex_structure = toluene_structure + t4_structure
toluene_structure.save('ligand.top')
complex_structure.save('topol.top')
To the end of the notebook. If we check the atomtypes in the 'ligand.top', the atomtypes are correct
[ atomtypes ]
; name    at.num    mass    charge ptype  sigma      epsilon
C1             6  12.010780  0.00000000  A     0.34806469     0.36350306
C2             6  12.010780  0.00000000  A     0.33795318     0.45538912
H1             1   1.007947  0.00000000  A     0.25725815     0.06531786
H2             1   1.007947  0.00000000  A     0.26445434    0.066021356
If we check the atomtypes in 'topol.top', then the openFF atomtypes are lost.
[ atomtypes ]
; name    at.num    mass    charge ptype  sigma      epsilon
C1             6  12.010780  0.00000000  A     0.33996695      0.4577296
C2             6  12.010780  0.00000000  A     0.33996695       0.359824
H1             1   1.007947  0.00000000  A     0.10690785      0.0656888
H2             1   1.007947  0.00000000  A     0.19599772      0.0656888
N1             7  14.006720  0.00000000  A     0.32499985        0.71128
H3             1   1.007947  0.00000000  A     0.26495328      0.0656888
H4             1   1.007947  0.00000000  A      0.2471353      0.0656888
S1            16  32.065500  0.00000000  A     0.35635949          1.046
O1             8  15.999430  0.00000000  A     0.29599219        0.87864
H5             1   1.007947  0.00000000  A     0.25996425        0.06276
O2             8  15.999430  0.00000000  A     0.30664734      0.8803136
H6             1   1.007947  0.00000000  A              0              0
H7             1   1.007947  0.00000000  A     0.24214627        0.06276
H8             1   1.007947  0.00000000  A     0.25105526        0.06276
Cl1           17  35.453200  0.00000000  A     0.44010397         0.4184
Na1           11  22.989769  0.00000000  A     0.33283976     0.01158968
O3             8  15.999430  0.00000000  A     0.31779646     0.65214353
H9             1   1.007947  0.00000000  A              1              0
The toluene is using the protein FF atomtype with the same name.
Thanks for the writeup @xiki-tempula - I've reproduced this and am looking into it now :-)
Ok, this appears to have the same root cause as https://github.com/ParmEd/ParmEd/issues/1197#issuecomment-929472918. The ParmEd developers have made the fix in the code but there hasn't been a release with that fix yet.
So, until a new release is made, we can add the following function to clean up the combined system:
def unique_atom_types(pmd_structure):
    omm_system = pmd_structure.createSystem(nonbondedMethod=app.NoCutoff,
                                            constraints=None,
                                            removeCMMotion=False,
                                            rigidWater=False)
    # # Copy the topology and positions
    topology = pmd_structure.topology
    positions = pmd_structure.positions
    def check_water(res):
        two_bonds = list(res.bonds())
        if len(two_bonds) == 2:
            waters = []
            for bond in two_bonds:
                elem0 = bond[0].element
                elem1 = bond[1].element
                if (elem0.atomic_number == 1 and elem1.atomic_number == 8) \
                        or (elem0.atomic_number == 8 and elem1.atomic_number == 1):
                    waters.append(True)
            if all(waters):
                return True
        else:
            return False
    atom_types_dic = {}
    count_id = 0
    for c in topology.chains():
        for r in c.residues():
            for a in r.atoms():
                if r.name + a.name in atom_types_dic:
                    a.id = atom_types_dic[r.name + a.name]
                    #print('a.id',a.id)
                else:
                    if check_water(r):
                        if a.element.atomic_number == 1:
                            a.id = 'HW'
                        else:
                            a.id = 'OW'
                        atom_types_dic[r.name + a.name] = a.id
                    elif r.name == 'LIG':
                        a.id = 'L' + str(count_id)
                    count_id += 1
    # Define a new parmed structure with the new unique atom types
    new_system_structure = parmed.openmm.load_topology(topology,
                                                       system=omm_system,
                                                       xyz=positions)
    # Re-set positions, velocities and box
    new_system_structure.positions = pmd_structure.positions
    new_system_structure.velocities = pmd_structure.velocities
    new_system_structure.box = pmd_structure.box
    return new_system_structure
Then when I run
complex_structure = toluene_structure + t4_structure
complex_structure = unique_atom_types(complex_structure)
toluene_structure.save('ligand.top', overwrite=True)
complex_structure.save('topol.top', overwrite=True)
the atom types in the combined structure look better:
[ atomtypes ]
; name    at.num    mass    charge ptype  sigma      epsilon
L0             6  12.010780  0.00000000  A     0.34806469     0.36350306
L1             6  12.010780  0.00000000  A     0.34806469     0.36350306
L2             6  12.010780  0.00000000  A     0.34806469     0.36350306
L3             6  12.010780  0.00000000  A     0.34806469     0.36350306
L4             6  12.010780  0.00000000  A     0.34806469     0.36350306
L5             6  12.010780  0.00000000  A     0.34806469     0.36350306
L6             6  12.010780  0.00000000  A     0.33795318     0.45538912
L7             1   1.007947  0.00000000  A     0.25725815     0.06531786
L8             1   1.007947  0.00000000  A     0.25725815     0.06531786
L9             1   1.007947  0.00000000  A     0.25725815     0.06531786
L10            1   1.007947  0.00000000  A     0.25725815     0.06531786
L11            1   1.007947  0.00000000  A     0.25725815     0.06531786
L12            1   1.007947  0.00000000  A     0.26445434    0.066021356
L13            1   1.007947  0.00000000  A     0.26445434    0.066021356
L14            1   1.007947  0.00000000  A     0.26445434    0.066021356
N1             7  14.006720  0.00000000  A     0.32499985        0.71128
H1             1   1.007947  0.00000000  A     0.10690785      0.0656888
C1             6  12.010780  0.00000000  A     0.33996695      0.4577296
H2             1   1.007947  0.00000000  A     0.19599772      0.0656888
H3             1   1.007947  0.00000000  A     0.26495328      0.0656888
H4             1   1.007947  0.00000000  A      0.2471353      0.0656888
S1            16  32.065500  0.00000000  A     0.35635949          1.046
C2             6  12.010780  0.00000000  A     0.33996695       0.359824
O1             8  15.999430  0.00000000  A     0.29599219        0.87864
H5             1   1.007947  0.00000000  A     0.25996425        0.06276
O2             8  15.999430  0.00000000  A     0.30664734      0.8803136
H6             1   1.007947  0.00000000  A              0              0
H7             1   1.007947  0.00000000  A     0.24214627        0.06276
H8             1   1.007947  0.00000000  A     0.25105526        0.06276
Cl1           17  35.453200  0.00000000  A     0.44010397         0.4184
Na1           11  22.989769  0.00000000  A     0.33283976     0.01158968
OW             8  15.999430  0.00000000  A     0.31779646     0.65214353
HW             1   1.007947  0.00000000  A              1              0
@j-wags Thanks. I will have a try. Could this handle two ligands? Or only one ligand with the residue name other than LIG?
There's probably a way to handle it by tinkering with this line:
                    elif r.name == 'LIG':
                        a.id = 'L' + str(count_id)
but this might run into other ParmEd issues, I'm not sure. It might be worth contacting the ParmEd developers since these can be dangerous use cases.
This might be related to #1532 depending on which versions of ParmEd reproduce this.