BioSimSpace icon indicating copy to clipboard operation
BioSimSpace copied to clipboard

Crystal waters

Open lohedges opened this issue 5 years ago • 18 comments

As anticipated, it turns out that the presence of crystal waters are extremely important for obtaining the correct binding mode of certain ligands. BioSimSpace needs to be modified so that they can be handled during the setup stage:

  • Crystal waters typically appear as oxygen atoms. We need to add hydrogens (and possibly dummies) in the appropriate geometry for the water model of choice. (Not sure if tLEaP or gmx solvate can do this for you.)

  • The crystal waters will need to be present during the solvation step so that the additional waters are added around them.

lohedges avatar Oct 16 '18 06:10 lohedges

GROMACS will happily solvate around existing water molecules in a system, although it wont convert these molecules to match the topology or naming scheme of the chosen water model. This means that the residue name of any crystal water molecules will need to be changed to SOL before running gmx solvate, as well as changing the name of any hydrogen atoms to HW1/HW2 and oxygens to OW.

This is easy enough, although I'm not sure what to do when there are missing sites, e.g. if there are two hydrogens and one oxygen but we want to solvate with TIP4P. Perhaps there are standard templates that we can use (assuming we have the coordinates of the crystal oxygen atom).

lohedges avatar Oct 23 '18 09:10 lohedges

I must admit I have never used TIP4P water, so I don't think I have any useful recommendation here.

ppxasjsm avatar Oct 23 '18 10:10 ppxasjsm

as the water models are rigid it is trivial to convert TIP3P <--> TIP4P


Dr. Julien Michel, Senior Lecturer Room 263, School of Chemistry University of Edinburgh David Brewster road Edinburgh, EH9 3FJ United Kingdom phone: +44 (0)131 650 4797 http://www.julienmichel.net/

On Tue, Oct 23, 2018 at 11:38 AM ppxasjsm <[email protected]mailto:[email protected]> wrote:

I must admit I have never used TIP4P water, so I don't think I have any useful recommendation here.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/michellab/BioSimSpace/issues/32#issuecomment-432194214, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ALLd5FSlzKMXDCNV7828EVeyvL76IzDVks5unvGSgaJpZM4XdrOt.

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

jmichel80 avatar Oct 23 '18 10:10 jmichel80

I have code to detect water molecules in a Sire system. However, what would we expect for crystal waters loaded from a PDB file. Do these always appears as two hydrogens and an oxygen, or just the oxygen atom. If the latter, then we'll need some way of checking the lone oxygens, then converting them to the appropriate water model.

I've updated BioSimSpace.Solvent so that it can solvate systems that contain existing three-point water molecules, which is the case for the protein plus crystal water examples I have so far.

lohedges avatar Oct 23 '18 11:10 lohedges

Hi Lester,

Frequently we will only have an oxygen in the input, and have to guess the hydrogen atoms positions. leap just put them in a arbitrary orientation. MD equilibration may (or not) sort this out. There are a variety of tools available to do a smart job with water placement. In the short term, we can assume it is ok to proceed if all water molecules have at least 2 hydrogen + 1 oxygen, and raise an issue if that is not the case. We can discuss later whether we should attempt to deal with this situation automatically.

Julien


Dr. Julien Michel, Senior Lecturer Room 263, School of Chemistry University of Edinburgh David Brewster road Edinburgh, EH9 3FJ United Kingdom phone: +44 (0)131 650 4797 http://www.julienmichel.net/

On Tue, Oct 23, 2018 at 12:20 PM Lester Hedges <[email protected]mailto:[email protected]> wrote:

I have code to detect water molecules in a Sire system. However, what would we expect for crystal waters loaded from a PDB file. Do these always appears as two hydrogens and an oxygen, or just the oxygen atom. If the latter, then we'll need some way of checking the lone oxygens, then converting them to the appropriate water model.

I've updated BioSimSpace.Solvent so that it can solvate systems that contain existing three-point water molecules, which is the case for the protein plus crystal water examples I have so far.

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/michellab/BioSimSpace/issues/32#issuecomment-432206228, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ALLd5C--xoJeOY-EGTnq9YjB79YRactBks5unvtpgaJpZM4XdrOt.

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

jmichel80 avatar Oct 23 '18 11:10 jmichel80

Agree - we can add a BioSimSpace node at another time that wraps up different algorithms for adding hydrogens to crystal waters (which do typically appear as lone oxygen atoms in pdbs)

chryswoods avatar Oct 23 '18 14:10 chryswoods

I've updated BioSimSpace.Solvent so that it can solvate systems that contain existing three-point water molecules, which is the case for the protein plus crystal water examples I have so far.

Hi @lohedges , has it been fixed yet? Recently I'm solvating protein with BSS.Solvent.tip3p, it still failed for protein which had water in it.

Here is the input file I've used: pro.zip

kexul avatar Jul 01 '21 12:07 kexul

I'm afraid we haven't fully added this feature yet. We do support situations where there are existing water molecules in the system that are complete (not just a lone oxygen) and appear as separate molecules, i.e. not embedded in the protein topology. In this case we try to detect the water model and convert to the appropriate GROMACS format prior to solvating. It should warn you if the topology is incompatible with the one that you choose to solvate with.

Just checking your input files now.

lohedges avatar Jul 01 '21 12:07 lohedges

I managed to solvate it just fine:

import BioSimSpace as BSS

# Load the system.
system = BSS.IO.readMolecules("pro.*")

# Print the coordinates of the first water molecule.
system[1].coordinates()
[(-66.3700 A, -3.2200 A, -43.4600 A),
 (-65.4100 A, -3.2200 A, -43.4600 A),
 (-66.6100 A, -2.3000 A, -43.4600 A)]

# Now solvate the protein + water system.
solvated = BSS.Solvent.tip3p(molecule=system, box=3*[100*BSS.Units.Length.angstrom])

# Print the coordinates of the first water molecule.
solvated[1].coordinates()
[(-66.3700 A, -3.2200 A, -43.4600 A),
 (-65.4100 A, -3.2200 A, -43.4600 A),
 (-66.6100 A, -2.3000 A, -43.4600 A)]

As you can see, the coordinates of the water molecule are the same, i.e. we have solvated around the protein and original waters.

BSS.__version__
'2020.1.0+485.g0c47c04'

lohedges avatar Jul 01 '21 13:07 lohedges

Just to note that all of the water molecules in the original protein system seem to have identical z-coordinates. (Each molecule has a different z-coord, but within a molecule they are the same.) It looks like they are oriented in the x-y plane.

system[1].coordinates()
[(-66.3700 A, -3.2200 A, -43.4600 A),
 (-65.4100 A, -3.2200 A, -43.4600 A),
 (-66.6100 A, -2.3000 A, -43.4600 A)]

system[2].coordinates()
[(-63.1700 A, -0.7800 A, -47.9000 A),
 (-62.2200 A, -0.7800 A, -47.9000 A),
 (-63.4100 A, 0.1500 A, -47.9000 A)]

system[3].coordinates()
[(-57.4700 A, 13.7900 A, -39.0800 A),
 (-56.5200 A, 13.7900 A, -39.0800 A),
 (-57.7100 A, 14.7200 A, -39.0800 A)]

lohedges avatar Jul 01 '21 13:07 lohedges

Thanks for the quick reply! Sorry for my incomplete information, here is the code I've used:

import BioSimSpace as _BSS
lig1 = ['CHEMBL92812.prmtop', 'CHEMBL92812.inpcrd']
lig2 = ['CHEMBL93461.prmtop', 'CHEMBL93461.inpcrd']
lig1 = _BSS.IO.readMolecules(lig1)[0]
lig2 = _BSS.IO.readMolecules(lig2)[0]
pert = _BSS.Align.merge(lig1, lig2)
pert = pert.toSystem()
pro = _BSS.IO.readMolecules('pro.*')
sys = pro + pert 
_BSS.Solvent.tip3p(sys, box=3*[8*_BSS.Units.Length.nanometer], work_dir='test_case')

Here is the input files: pert.zip

The protein can be solvated alone by BSS.Solvent.tip3p, but it failed when combined with a ligand. Here is the content of genion.err:

Program:     gmx genion, version 2021.1
Source file: src/gromacs/gmxpreprocess/genion.cpp (line 594)

Fatal error:
The solvent group SOL is not continuous: index[375]=5586, index[376]=5623

kexul avatar Jul 01 '21 13:07 kexul

Ah, okay, that's good to know. In this case you can just swap the order of pro and pert when creating the complex, i.e.:

sys = pert + pro

That way all of the water molecules will be at the end of the topology file, so it will be continuous when gmx solvate adds more to the end. I can update the code to do this internally, but it might surprise the user if what's returned is suddenly re-ordered. (I can at least warn if the water molecules aren't at the end of the file, or indexed continually.)

Note that your requested box dimension is quite small. In order to neutralise the system gmx genion needs a sufficiently large box to perform it's calculation. A good rule of thumb is to work out an appropriately sized box from the axis-aligned bounding box of the protein or complex, e.g..:

# Find the dimensions of the protein.
box_min, box_max = protein.getAxisAlignedBoundingBox()

# Work out the box size from the difference in the coordinates.
box_size = [y - x for x, y in zip(box_min, box_max)]

# how much to pad each side of the protein (nonbonded cutoff = 10 A).
padding = 15 * BSS.Units.Length.angstrom

box_length = max(box_size) + 2*padding

cmplx = protein + ligand

solvated = BSS.Solvent.tip3p(molecule = cmplx.toSystem(), box=3*[box_length])

lohedges avatar Jul 01 '21 13:07 lohedges

Actually, it's still giving the non-continuous error. Will try to figure it out when I get a moment.

lohedges avatar Jul 01 '21 13:07 lohedges

It looks like the molecule ordering isn't preserved when creating the sys object, which is something that we should have fixed lately. Will try to debug.

lohedges avatar Jul 01 '21 13:07 lohedges

Thanks for the code which automatically setup box size! 😀

For your information, I could solvate the system successfully with protein which did not have water. pro_no_water.zip

kexul avatar Jul 01 '21 13:07 kexul

If you do the following then you can create a complex with the ligand first, then protein, then waters:

# You don't need to do the toSystem bit, since you've already done it, but I'm just
# highlighting that we first create a system, then add more molecules to it.
sys = pert.toSystem() + protein.getMolecules()

The waters will now be contiguous, so you should be able to solvate with an appropriately sized box. (Just trying to work out why the ordering is inconsistent otherwise.)

lohedges avatar Jul 01 '21 13:07 lohedges

sys = pert.toSystem() + protein.getMolecules()

Thanks, it worked perfectly!

kexul avatar Jul 01 '21 14:07 kexul

No problem. I've just pushed a couple of commits that fix these issues. Molecule ordering is now preserved when adding two System objects and existing water molecules are moved to the end of the topology prior to running gmx solvate. Your original script now works fine for me.

(Your box size was actually fine. Somehow I thought it was 8 angstroms, not nanometers. At least you learned a useful box size setup trick.)

lohedges avatar Jul 01 '21 14:07 lohedges