BioSimSpace icon indicating copy to clipboard operation
BioSimSpace copied to clipboard

[FEATURE] Open Force Field (OpenFF) support

Open lohedges opened this issue 4 years ago • 25 comments

This issue thread is used to discuss support for parameterisation using force fields from the Open Force Field Initiative. Development is underway on the feature_openff branch.

An overview of the current functionality:

  • We can dynamically generate function attributes within the BioSimSpace.Parameters namesapce using the force fields found in openforcefields.get_forcefield_dirs_paths(). This means that we can expose all available force fields to the user, regardless of what version on OpenFF they have installed. Sphinx documentation is also dynamically updated when built as part of our CI pipeline.

  • As a first pass protocol, I've adapted the approach of converting OpenFF parameters to AMBER format using ParmEd as described in this example notebook. Using this approach we can now do the following:

import BioSimSpace as BSS

# Load the PDB file.
cyclohexane = BSS.IO.readMolecules("demo/molecules/cyclohexane.pdb")[0]

# Parameterise the the "openff_unconstrained-1.0.0.offxml" force field.
cyclohexane = BSS.Parameters.openff_unconstrained_1_0_0(cyclohexane).getMolecule()

Note from the linked notebook that the ParmEd conversion is lossy:

ParmEd's Structure model is inspired by AMBER, and some information in an OpenMM System are not directly translatable into a Structure. In particular, as of today (4/2/2019), long-range interaction treatment method (e.g., PME, CutoffPeriodic) and parameters (e.g., cutoff and cutoff switching distance, PME error tolerance) are known to be lost during the conversion.

However, we could always store the missing information as a property of the parameterised molecule, since we are only using the ParmEd files as an intermediate. (That said, it does remind me of a point that has been raised in several workshops, i.e. how do we set an appropriate cutoff in our protocols when we don't know the forcefield, e.g. when we've read a fully parameterised system files from disk.)

In writing the implementation I've come across several issues. I'll document these below and we can decide to open issues on the openff-toolkit GitHub where appropriate:

  • When installed via conda, the openforcefield package pulls in ambertools as a dependency. I've previously had issues with this package breaking my environment (some unspecified dependency breaking Sire) so it would be nice if this package was optional.

  • Related to the above: Most of our users will have a custom AMBER installation so it would be good openff-toolkit respected the AMBERHOME environment variable, if set. At present it relies on the ambertools suite that is installed within the conda environment. In addition, it uses the find_executable function from distutils.spawn to find antechamber. This only works when the conda environment is activated, or when the bin directory of the environment is in the PATH. This means that it wouldn't work,with our binary package if the user ran something like $HOME/biosimspace.app/bin/python script.py, unless they did PATH=$HOME/biosimspace.app/bin:$PATH $HOME/biosimspace.app/bin/python script.py. (Or, alternatively, modified their PATH in, e.g., .bashrc.)

  • ~~OpenFF seems to rely on a well formatted reference topology, which is typically a PDB file. Importantly, it requires CONECT information, which we don't currently write with our PDB parser. As a workaround I'm using rdkit to read then write our PDB file, since it adds the missing CONECT information for us. Even with this in place I've still run across plenty of issues with PDB files that look okay. I think this mostly results from duplicate atom names. If so, it should be possible to come up with functionality that generates unique names, although we might need to take care if certain names have a special meaning, e.g. CA for alpha carbon.~~

The above is resolved by going via an intermediate SDF file generated by RDKit. See #181 for details.

  • ~~In addition to an OpenMM topology (generated from the PDB file), OpenFF also requires a list of unique molecules in the topology as SMILES strings in order to generate its own topology. I'm currently using rdkit to generate the SMILES strings for me. Conveniently, multi molecule SMILES strings are delimited by a period so I can split on this, then remove duplicates, to generate the list of unique molecules. This means that the code works whether on not the original PDB file contains one or more molecules. (The example PDB used by OpenFF actually contains two molecules, but it is read as one by BioSimSpace. So much for requiring well formatted files ;-) )~~

No longer need to use SMILE when going via SDF intermediate.

  • ~~The openff-toolkit uses anterchamber to compute charges when converting to an OpenMM system. At present this is run using an os.system call with no capture of stdout or stderr. This means that the full output from antechamber is printed whenever an OpenFF parameterisation function is called from BioSimSpace. Since BioSimSpace runs parameterisation via a background thread there is also no way to safely suppress stdout. The [development version][https://github.com/openforcefield/openff-toolkit) of the toolkit has switched to using subprocess to run antechamber, but it appears that there is still no option to capture/redirect the output.~~

openff-tooltkit now uses subprocess to run antechamber and captures output.

lohedges avatar Jan 14 '21 14:01 lohedges

I've now added functionality to uniquify atom names according to the constraints of the PDB format. This removes a series of warnings that I was seeing (which couldn't be supressed). Despite this, I'm still having problems with certain molecules so it can't be the only issue. I'll try to figure out if there is a common reason why things are failing.

lohedges avatar Jan 14 '21 16:01 lohedges

Bah, openforcefield pulls in ambertools, which pulls in libgcc, which breaks my Sire development environment so that I can no longer recompile. The environment is in a completely broken state (I can't just remove that one package) so I'm unable to do anything without trashing the environment and rebuilding everything from scratch (which takes roughly 30 mins to an hour.) This is a completely unsustainable way to develop software.

lohedges avatar Jan 19 '21 10:01 lohedges

Just an update to say that I've merged things across to devel so that users can play around with the current functionality.

It looks like there's been an update to the openff-toolkit since I wrote the initial code. This now uses subprocess to run antechamber in the background and captures the stdout/stderr, hence the final issue in my first post is now resolved.

lohedges avatar Jan 22 '21 09:01 lohedges

The bundled-ness of ambertools has caused us similar packaging issues before and we're trying to find ways to smooth them over. A small step in that direction was to split out the conda recipe into openff-toolkit-base, which does not depend on RDKit or AmberTools, and openff-toolkit, which depends on openff-toolkit-base, RDKit, and AmberTools. This allows some of our users that only use the OpenEye wrapper to not need to install RDKit or AmberTools, which got them around conflicts similar to your but with Psi4, which has its own build infrastructure that was not completely compatible with AmberTools.

If there's another antechamber executable found in the path, it's possible that AmberToolsToolkitWrapper may be able to function? This solution would openff-toolkit-base, installing RDKit on top of that, and relying antechamber being found without the ambertools package pulled down from conda-forge. It seems too good to be true; maybe I'm missing something obvious and this will break something.

https://github.com/openforcefield/openff-toolkit/blob/4641ec15bce760e6e3ad0e15ead70dcf3191c361/openforcefield/utils/toolkits.py#L4358-L4376

mattwthompson avatar Jan 29 '21 14:01 mattwthompson

Thanks for the information, that's useful to know. We normally search for antechamber in ${AMBERHOME}/bin rather than assuming that directory has been added to the path, which isn't always the case in my experience. I'll try your suggested approach and see if I can get it to work with our setup.

lohedges avatar Jan 29 '21 14:01 lohedges

Using openff-toolkit-base as you suggest works as long as ${AMBERHOME}/bin is in the path. I'll switch to using this package so that we can remove the ambertools dependency.

If antechamber isn't in the path, would it also be possible to check to see if AMBERHOME has been set and search within ${AMBERHOME}/bin? (I've run into a few situations where this is the case, so we've found that relying on AMBERHOME is more reliable than the PATH. It's also easier to set.)

lohedges avatar Feb 09 '21 11:02 lohedges

That's a TODO in our code:

https://github.com/openforcefield/openff-toolkit/blob/de8a4a545351301adfe424dff0d879b2dd13bc0b/openff/toolkit/utils/toolkits.py#L4437-L4442

So we should probably get to that. Might be tricky to properly test with the way we install things, but I'll think about it.

mattwthompson avatar Feb 09 '21 15:02 mattwthompson

It would be good to support SMILES input with OpenFF. We could achieve this by allowing a SMILES string as the molecule argument for all OpenFF parameterisation functions. If a string was passed to a non-OpenFF function, then an error would be raised. This would keep things consistent with our current framework meaning that no additional arguments would be required when parameterising with different toolkits.

lohedges avatar Apr 13 '21 13:04 lohedges

It would also be good to support searching for multiple antechamber executables until we find one that is >= 20.0. This would allow a user to install AmberTools20 from conda-forge and use it alongside an existing, e.g. Amber18 installation, i.e. they would use AmberTools20 for OpenFF support, while taking advantage of an existing license for using PMEMD.

lohedges avatar Apr 13 '21 13:04 lohedges

It might get messy if we need a molecule and a SMILES string, e.g. for the coordinates and the topology, although I guess we could ignore the SMILES string argument in any functions that don't use it.

lohedges avatar Apr 13 '21 13:04 lohedges

If it's a matter of just one molecule, it's somewhat straightforward to get it through the pipeline

smiles = "CCO"
mol = Molecule.from_smiles(smiles)
mol.generate_conformers(n_conformers=1)

# Proceed as already happens
top = mol.to_topology()
forcefield.create_openmm_system(top)

the Topology does not carry positions itself, like the openmm.System but those can be attached later by accessing mol.conformers[0].

You're right that it gets messy - multi-molecule setups aren't straightforward here, using custom coordinates would require a second argument, etc. Also expect sharp edges because of the information content of SMILES vs. an SDF file, i.e. a SMILES ambiguous stereochemistry would be a problem (there's an allow_undefined_stereo flag if this needs to be quashed)

mattwthompson avatar Apr 13 '21 14:04 mattwthompson

We now support parameterisation using both SMILES strings and BioSimSpace Molecule objects (i.e. created by reading a file). This works for all of the supported parameterisation engines, with the caveat that we go via an intermediate PDB file when using AmberTools.

It is now possible to do:

import BioSimSpace as BSS

# Load a methanol PDB file.
molecule = BSS.IO.readMolecules("demos/molecules/methanol.pdb")[0]

# Create a SMILES representation of the molecule.
smiles = "CO"

# Parameterise with OpenFF using the Molecule object.
off_mol = BSS.Parameters.openff_unconstrained_1_0_0(molecule).getMolecule()

# Parameterise with OpenFF using the SMILES string.
off_smiles = BSS.Parameters.openff_unconstrained_1_0_0(smiles).getMolecule()

# Parameterise with GAFF using the Molecule object.
gaff_mol = BSS.Parameters.gaff(molecule).getMolecule()

# Parameterise with GAFF using the SMILES string.
gaff_smiles = BSS.Parameters.gaff(smiles).getMolecule()

lohedges avatar Apr 14 '21 13:04 lohedges

It seems like the documentation needs update:

企业微信截图_16340206055047

kexul avatar Oct 12 '21 06:10 kexul

Thanks, the OpenFF functions are created dynamically at run-time depending on what force fields are available from the users install. In turn, the documentation is also auto generated. It looks like the template function has the wrong docstring, I'll take a look later.

lohedges avatar Oct 12 '21 07:10 lohedges

I wonder if BSS currently supports openFF? I tried to use BSS to Parameterise OpenFF and it failed so I was wondering if it has been implemented.

xiki-tempula avatar Mar 29 '22 10:03 xiki-tempula

Hello there,

Thanks for the question. Yes, we have supported OpenFF for around a year now. However, we currently only support the unconstrained versions of the force fields.

When you say it failed, what exactly are you referring to? Was the functionality simply unavailable, or did you get an error when trying to parametrise? If you could let us know your operating system and installation method (conda or source) along BioSimSpace version and error message that you are seeing, then we'd be happy to debug. You can find the BioSimSpace version as follows:

import BioSimSpace
print(BSS.__version__)

lohedges avatar Mar 29 '22 10:03 lohedges

@lohedges Hi, so I set up the env with

conda create -n biosimspace -c conda-forge -c michellab biosimspace
conda install -c conda-forge ambertools

Then I feed in a pdb file

    system = BSS.IO.readMolecules('lig.pdb')
    molecule = system[0]

    # Create a background process to parameterise the molecule.
    process = BSS.Parameters.openff_unconstrained_2_0_0(molecule)

    # Get the parameterised molecule from the process.
    molecule = process.getMolecule()

    # Get the intermediate files from the background process.
    process.getOutput(filename="param_eror.zip")

which gives me

Warning: importing 'simtk.openmm' is deprecated.  Import 'openmm' instead.
Traceback (most recent call last):
  File "/Users/zwu/miniconda3_x86/envs/biosimspace/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3361, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-fd0859cb3195>", line 1, in <cell line: 1>
    runfile('/Users/zwu/Documents/protein-ligand-benchmark/2020-02-07_tyk2/full_xtb/run.py', wdir='/Users/zwu/Documents/protein-ligand-benchmark/2020-02-07_tyk2/full_xtb')
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/zwu/Documents/protein-ligand-benchmark/2020-02-07_tyk2/full_xtb/run.py", line 20, in <module>
    molecule = process.getMolecule()
  File "/Users/zwu/miniconda3_x86/envs/biosimspace/lib/python3.9/site-packages/BioSimSpace/Parameters/_process.py", line 227, in getMolecule
    raise _ParameterisationError("Parameterisation failed! Last error: '%s'" % str(self._last_error)) from None
BioSimSpace._Exceptions._exceptions.ParameterisationError: Parameterisation failed! Last error: 'Unable to create OpenFF Molecule!'

The lig.pdb is

CRYST1   53.880   55.610   57.240  90.00  90.00  90.00 P 1            
HETATM    1  C0  lig     1      -1.960  21.550 -27.330  0.00  0.00           C  
HETATM    2  C1  lig     1      -1.290  22.410 -28.200  0.00  0.00           C  
HETATM    3  C2  lig     1      -1.960  22.950 -29.290  0.00  0.00           C  
HETATM    4  C3  lig     1      -3.300  22.620 -29.540  0.00  0.00           C  
HETATM    5  C4  lig     1      -3.970  21.760 -28.650  0.00  0.00           C  
HETATM    6  C5  lig     1      -3.290  21.230 -27.560  0.00  0.00           C  
HETATM    7 Cl6  lig     1      -5.640  21.350 -28.920  0.00  0.00          CL  
HETATM    8  C7  lig     1      -4.040  23.150 -30.710  0.00  0.00           C  
HETATM    9  O8  lig     1      -4.230  22.420 -31.670  0.00  0.00           O  
HETATM   10  N9  lig     1      -4.510  24.410 -30.650  0.00  0.00           N  
HETATM   11  C10 lig     1      -5.290  25.110 -31.580  0.00  0.00           C  
HETATM   12  C11 lig     1      -5.960  24.550 -32.680  0.00  0.00           C  
HETATM   13  C12 lig     1      -6.780  25.360 -33.460  0.00  0.00           C  
HETATM   14  N13 lig     1      -6.920  26.650 -33.180  0.00  0.00           N  
HETATM   15  C14 lig     1      -6.300  27.230 -32.160  0.00  0.00           C  
HETATM   16  C15 lig     1      -5.510  26.460 -31.310  0.00  0.00           C  
HETATM   17  N16 lig     1      -6.540  28.580 -31.880  0.00  0.00           N  
HETATM   18  C17 lig     1      -5.710  29.450 -31.230  0.00  0.00           C  
HETATM   19  O18 lig     1      -4.620  29.120 -30.810  0.00  0.00           O  
HETATM   20  C19 lig     1      -6.230  30.850 -31.030  0.00  0.00           C  
HETATM   21 Cl20 lig     1      -1.110  24.050 -30.330  0.00  0.00          CL  
HETATM   22  H21 lig     1      -6.760  31.250 -31.890  0.00  0.00           H  
HETATM   23  H22 lig     1      -1.440  21.130 -26.480  0.00  0.00           H  
HETATM   24  H23 lig     1      -0.260  22.660 -28.020  0.00  0.00           H  
HETATM   25  H24 lig     1      -3.810  20.560 -26.890  0.00  0.00           H  
HETATM   26  H25 lig     1      -4.320  24.980 -29.840  0.00  0.00           H  
HETATM   27  H26 lig     1      -5.840  23.500 -32.910  0.00  0.00           H  
HETATM   28  H27 lig     1      -7.310  24.940 -34.300  0.00  0.00           H  
HETATM   29  H28 lig     1      -5.060  26.910 -30.430  0.00  0.00           H  
HETATM   30  H29 lig     1      -7.400  29.010 -32.180  0.00  0.00           H  
HETATM   31  H30 lig     1      -6.890  30.870 -30.160  0.00  0.00           H  
HETATM   32  H31 lig     1      -5.460  32.770 -30.540  0.00  0.00           H  
HETATM   33  O32 lig     1      -5.090  31.810 -30.670  0.00  0.00           O  
END        

The print(BSS.version) is 2022.2.0 and OS is MacOS 12.3 with M1 chip.

xiki-tempula avatar Mar 29 '22 11:03 xiki-tempula

Thanks for the info. After making BioSimSpace output verbose error messages I see the following:

# Do this at the start of your script if you want more error output. A lot of
# the time this is very noisy and unhelpful, so it is disabled by default.
BSS.setVerbose(True)
ParameterisationError: Parameterisation failed! Last error: 'Unable to create OpenFF Molecule!: UndefinedStereochemistryError('Unable to make OFFMol from RDMol: Unable to make OFFMol from RDMol: RDMol has unspecified stereochemistry. RDMol name: Undefined chiral centers are:\n - Atom C (index 1)\n - Atom C (index 2)\n - Atom C (index 4)\n - Atom C (index 5)\n - Atom C (index 7)\n - Atom C (index 11)\n - Atom C (index 12)\n - Atom C (index 15)\n - Atom C (index 17)\n')'

It looks like the RDKit molecule created from your PDB can't be converted to an OpenFF molecule as the stereochemistry is undefined. This is an issue with using PDB and, unfortunately, we don't currently support SDF format. You could try using Mol2 format instead, or try passing a SMILES representation of the molecule.

Cheers.

lohedges avatar Mar 29 '22 11:03 lohedges

OpenFF does has an option allow_undefined_stereo=True when reading molecules, but this also doesn't work in this case. (It fails at a later stage though.) I experimented with this option when first adding OpenFF support and found that it caused more problems than it solved, i.e. it was easier to have the user specify the correct stereochemistry from the outset.

lohedges avatar Mar 29 '22 11:03 lohedges

Thanks. I will try to use antechamber to generate the input file.

xiki-tempula avatar Mar 29 '22 11:03 xiki-tempula

@lohedges Thank you. I have used antechamber -i lig.pdb -fi pdb -o ligand.sybyl.mol2 -fo mol2 -c bcc -s 2 -at sybyl to generate the mol2 file as input, but it seems it would still give me error.

Traceback (most recent call last):
  File "/Users/zwu/miniconda3_x86/envs/biosimspace/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3361, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-5b69c677995e>", line 1, in <cell line: 1>
    runfile('/Users/zwu/Documents/protein-ligand-benchmark/2020-02-07_tyk2/sage/run.py', wdir='/Users/zwu/Documents/protein-ligand-benchmark/2020-02-07_tyk2/sage')
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/zwu/Documents/protein-ligand-benchmark/2020-02-07_tyk2/sage/run.py", line 17, in <module>
    molecule = process.getMolecule()
  File "/Users/zwu/miniconda3_x86/envs/biosimspace/lib/python3.9/site-packages/BioSimSpace/Parameters/_process.py", line 227, in getMolecule
    raise _ParameterisationError("Parameterisation failed! Last error: '%s'" % str(self._last_error)) from None
BioSimSpace._Exceptions._exceptions.ParameterisationError: Parameterisation failed! Last error: 'Unable to create OpenFF Molecule!'

The mol2 file is

@<TRIPOS>MOLECULE
lig
   32    33     1     0     0
SMALL
bcc


@<TRIPOS>ATOM
      1 C0          -1.9600    21.5500   -27.3300 C.ar       1 lig      -0.096000
      2 C1          -1.2900    22.4100   -28.2000 C.ar       1 lig      -0.129000
      3 C2          -1.9600    22.9500   -29.2900 C.ar       1 lig       0.052400
      4 C3          -3.3000    22.6200   -29.5400 C.ar       1 lig      -0.143600
      5 C4          -3.9700    21.7600   -28.6500 C.ar       1 lig       0.052400
      6 C5          -3.2900    21.2300   -27.5600 C.ar       1 lig      -0.129000
      7 CL6         -5.6400    21.3500   -28.9200 Cl         1 lig      -0.060400
      8 C7          -4.0400    23.1500   -30.7100 C.2        1 lig       0.691700
      9 O8          -4.2300    22.4200   -31.6700 O.2        1 lig      -0.550100
     10 N9          -4.5100    24.4100   -30.6500 N.am       1 lig      -0.460100
     11 C10         -5.2900    25.1100   -31.5800 C.ar       1 lig       0.127600
     12 C11         -5.9600    24.5500   -32.6800 C.ar       1 lig      -0.304300
     13 C12         -6.7800    25.3600   -33.4600 C.ar       1 lig       0.445200
     14 N13         -6.9200    26.6500   -33.1800 N.ar       1 lig      -0.738000
     15 C14         -6.3000    27.2300   -32.1600 C.ar       1 lig       0.551200
     16 C15         -5.5100    26.4600   -31.3100 C.ar       1 lig      -0.323300
     17 N16         -6.5400    28.5800   -31.8800 N.am       1 lig      -0.553400
     18 C17         -5.7100    29.4500   -31.2300 C.2        1 lig       0.671100
     19 O18         -4.6200    29.1200   -30.8100 O.2        1 lig      -0.593100
     20 C19         -6.2300    30.8500   -31.0300 C.3        1 lig      -0.178100
     21 CL20        -1.1100    24.0500   -30.3300 Cl         1 lig      -0.060400
     22 H21         -7.3100    30.9400   -31.1600 H          1 lig       0.071033
     23 H22         -1.4400    21.1300   -26.4800 H          1 lig       0.148000
     24 H23         -0.2600    22.6600   -28.0200 H          1 lig       0.157000
     25 H24         -3.8100    20.5600   -26.8900 H          1 lig       0.157000
     26 H25         -4.3200    24.9800   -29.8400 H          1 lig       0.330500
     27 H26         -5.8400    23.5000   -32.9100 H          1 lig       0.183000
     28 H27         -7.3100    24.9400   -34.3000 H          1 lig       0.025100
     29 H28         -5.0600    26.9100   -30.4300 H          1 lig       0.180000
     30 H29         -7.4000    29.0100   -32.1800 H          1 lig       0.331500
     31 H30         -5.7200    31.5200   -31.7200 H          1 lig       0.071033
     32 H31         -5.9600    31.1900   -30.0400 H          1 lig       0.071033
@<TRIPOS>BOND
     1     1     2 ar  
     2     1     6 ar  
     3     1    23 1   
     4     2     3 ar  
     5     2    24 1   
     6     3     4 ar  
     7     3    21 1   
     8     4     5 ar  
     9     4     8 1   
    10     5     6 ar  
    11     5     7 1   
    12     6    25 1   
    13     8     9 2   
    14     8    10 am  
    15    10    11 1   
    16    10    26 1   
    17    11    12 ar  
    18    11    16 ar  
    19    12    13 ar  
    20    12    27 1   
    21    13    14 ar  
    22    13    28 1   
    23    14    15 ar  
    24    15    16 ar  
    25    15    17 1   
    26    16    29 1   
    27    17    18 am  
    28    17    30 1   
    29    18    19 2   
    30    18    20 1   
    31    20    22 1   
    32    20    31 1   
    33    20    32 1   
@<TRIPOS>SUBSTRUCTURE
     1 lig         1 TEMP              0 ****  ****    0 ROOT

I wonder if you mind help me, please? Thank you.

xiki-tempula avatar Mar 30 '22 10:03 xiki-tempula

Hi again,

I just realised that our RDKit to OpenFF conversion always goes via PDB format, so we will lose the Mol2 stereochemistry regardless. (Your error above is the same if you turn on verbose messages with BSS.setVerbose(True).)

With this in mind, I've modified my local version of BioSimSpace so that it will go via Mol2 format if that's what the original molecule was loaded from. However, even after doing this it still errors, but this time with:

Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_DIVIDE_BY_ZERO IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
/home/lester/sire.app/bin/wrapped_progs/antechamber: Fatal Error!
Unable to find sqm charges in file (sqm.out).
Verify the filename and the file contents.
/home/lester/sire.app/lib/python3.7/site-packages/openff/toolkit/typing/engines/smirnoff/parameters.py:4118: Warning: No registered toolkits can provide the capability "assign_partial_charges" for args "()" and kwargs "{'molecule': Molecule with name 'lig' and SMILES '[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[Cl-].[Cl-].[H+].[H+].[H+].[H+].[H+].[H+].[H+].[H+].[H+].[H+].[H+].[N-3].[N-3].[N-3].[O-2].[O-2]', 'partial_charge_method': 'am1bcc', 'use_conformers': None, 'strict_n_conformers': False, 'normalize_partial_charges': True, '_cls': <class 'openff.toolkit.topology.molecule.FrozenMolecule'>}"
Available toolkits are: [ToolkitWrapper around The RDKit version 2021.09.5, ToolkitWrapper around AmberTools version 21.0, ToolkitWrapper around Built-in Toolkit version None]
 ToolkitWrapper around The RDKit version 2021.09.5 <class 'openff.toolkit.utils.exceptions.ChargeMethodUnavailableError'> : partial_charge_method 'am1bcc' is not available from RDKitToolkitWrapper. Available charge methods are ['mmff94']
 ToolkitWrapper around AmberTools version 21.0 <class 'subprocess.CalledProcessError'> : Command '['antechamber', '-i', 'molecule.sdf', '-fi', 'sdf', '-o', 'charged.mol2', '-fo', 'mol2', '-pf', 'yes', '-dr', 'n', '-c', 'bcc', '-nc', '-4.0']' returned non-zero exit status 1.
 ToolkitWrapper around Built-in Toolkit version None <class 'openff.toolkit.utils.exceptions.ChargeMethodUnavailableError'> : Partial charge method "am1bcc"" is not supported by the Built-in toolkit. Available charge methods are ['zeros', 'formal_charge']

  warnings.warn(str(e), Warning)
---------------------------------------------------------------------------
ParameterisationError                     Traceback (most recent call last)
<ipython-input-5-5586526ef716> in <module>
----> 1 lig = BSS.Parameters.openff_unconstrained_2_0_0(lig).getMolecule()

~/Code/BioSimSpace/python/BioSimSpace/Parameters/_process.py in getMolecule(self)
    223         if self._is_error:
    224             if _isVerbose():
--> 225                 raise _ParameterisationError("Parameterisation failed! Last error: '%s'" % str(self._last_error))
    226             else:
    227                 raise _ParameterisationError("Parameterisation failed! Last error: '%s'" % str(self._last_error)) from None

ParameterisationError: Parameterisation failed! Last error: 'Unable to create OpenMM System!: UnassignedMoleculeChargeException('The following molecules did not have charges assigned by any ParameterHandler in the ForceField:\n[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[Cl-].[Cl-].[H+].[H+].[H+].[H+].[H+].[H+].[H+].[H+].[H+].[H+].[H+].[N-3].[N-3].[N-3].[O-2].[O-2]\n')'

I'll see if I can try some other formats.

lohedges avatar Mar 30 '22 11:03 lohedges

This is how I usually create the openFF format from mol2 file if it could be helpful.

try:
    from openmm.app import PDBFile
except ImportError:
    from simtk.openmm.app import PDBFile

import parmed
import MDAnalysis as mda

from openff.toolkit.topology import Molecule, Topology
from openff.toolkit.typing.engines.smirnoff import ForceField
from openff.toolkit.utils import get_data_file_path
    mol2_parm = mda.Universe('ligand.sybyl.mol2')
    mol2_parm.atoms.write('ligand.pdb')
    pdbfile = PDBFile('ligand.pdb')
    m = Molecule.from_file('ligand.sybyl.mol2',
                           allow_undefined_stereo=True)
    omm_topology = pdbfile.topology
    # Create the Open Forcefield topology
    off_topology = Topology.from_openmm(omm_topology, unique_molecules=[m])

    # Load the "Parsley" force field
    forcefield = ForceField('openff_unconstrained-2.0.0.offxml')
    omm_system = forcefield.create_openmm_system(off_topology)

    # Convert to a ParmEd structure
    parmed_structure = parmed.openmm.load_topology(omm_topology, omm_system,
                                                   pdbfile.positions)

xiki-tempula avatar Mar 30 '22 11:03 xiki-tempula

Does the molecule loaded here

    system = BSS.IO.readMolecules('lig.pdb')

have bond order and stereochemistry information? I assume no, since it's a PDB file, but I don't know what else is added into this object model.

I just realised that our RDKit to OpenFF conversion always goes via PDB format

In general, passing a well-formed (...ish) RDKit molecule to OpenFF is best done by Molecule.from_rdkit - alternatives being SDF files (from_file) or SMILES (from_smiles or from_mapped_smiles depending on the flavor). Probably with allow_undefined_stereo=True used in a few places.

mattwthompson avatar Mar 30 '22 11:03 mattwthompson

Thanks, both.

...have bond order and stereochemistry information? I assume no, since it's a PDB file, but I don't know what else is added into this object model.

No, and this is the general problem when using BioSimSpace to try to parameterise with OpenFF. Since we don't have an SDF reader, we assume that ligands are loaded from PDB or Mol2 format. This works fine when using AmberTools or pdb2gmx, but isn't robust when using OpenFF. As an alternative we also support passing a SMILES string directly. For this ligand, the following works:

# The following was generated by RDKit.
smiles = "CC(=O)Nc1cc(NC(=O)c2c(Cl)cccc2Cl)ccn1"

lig = BSS.Parameters.openff_unconstrained_2_0_0(smiles).getMolecule()

Perhaps I should have a fallback that converts internalyl to SMILES if the parameterisation fails.

Probably with allow_undefined_stereo=True used in a few places.

I've previously tried this option with limited success. Ideally we'd like the user to specify everything upfront to avoid any nasty surprises downstream. I guess I could add an option to handle this, though, or warn the user if it was applied when the initial conversion fails.

Cheers.

lohedges avatar Mar 30 '22 11:03 lohedges