BioSimSpace
BioSimSpace copied to clipboard
[FEATURE] Open Force Field (OpenFF) support
This issue thread is used to discuss support for parameterisation using force fields from the Open Force Field Initiative. Development is underway on the feature_openff branch.
An overview of the current functionality:
-
We can dynamically generate function attributes within the
BioSimSpace.Parametersnamesapce using the force fields found inopenforcefields.get_forcefield_dirs_paths(). This means that we can expose all available force fields to the user, regardless of what version on OpenFF they have installed. Sphinx documentation is also dynamically updated when built as part of our CI pipeline. -
As a first pass protocol, I've adapted the approach of converting OpenFF parameters to AMBER format using ParmEd as described in this example notebook. Using this approach we can now do the following:
import BioSimSpace as BSS
# Load the PDB file.
cyclohexane = BSS.IO.readMolecules("demo/molecules/cyclohexane.pdb")[0]
# Parameterise the the "openff_unconstrained-1.0.0.offxml" force field.
cyclohexane = BSS.Parameters.openff_unconstrained_1_0_0(cyclohexane).getMolecule()
Note from the linked notebook that the ParmEd conversion is lossy:
ParmEd's Structure model is inspired by AMBER, and some information in an OpenMM System are not directly translatable into a Structure. In particular, as of today (4/2/2019), long-range interaction treatment method (e.g., PME, CutoffPeriodic) and parameters (e.g., cutoff and cutoff switching distance, PME error tolerance) are known to be lost during the conversion.
However, we could always store the missing information as a property of the parameterised molecule, since we are only using the ParmEd files as an intermediate. (That said, it does remind me of a point that has been raised in several workshops, i.e. how do we set an appropriate cutoff in our protocols when we don't know the forcefield, e.g. when we've read a fully parameterised system files from disk.)
In writing the implementation I've come across several issues. I'll document these below and we can decide to open issues on the openff-toolkit GitHub where appropriate:
-
When installed via
conda, theopenforcefieldpackage pulls inambertoolsas a dependency. I've previously had issues with this package breaking my environment (some unspecified dependency breaking Sire) so it would be nice if this package was optional. -
Related to the above: Most of our users will have a custom AMBER installation so it would be good
openff-toolkitrespected theAMBERHOMEenvironment variable, if set. At present it relies on theambertoolssuite that is installed within thecondaenvironment. In addition, it uses thefind_executablefunction fromdistutils.spawnto findantechamber. This only works when thecondaenvironment is activated, or when thebindirectory of the environment is in thePATH. This means that it wouldn't work,with our binary package if the user ran something like$HOME/biosimspace.app/bin/python script.py, unless they didPATH=$HOME/biosimspace.app/bin:$PATH $HOME/biosimspace.app/bin/python script.py. (Or, alternatively, modified theirPATHin, e.g.,.bashrc.) -
~~OpenFF seems to rely on a well formatted reference topology, which is typically a PDB file. Importantly, it requires
CONECTinformation, which we don't currently write with our PDB parser. As a workaround I'm usingrdkitto read then write our PDB file, since it adds the missingCONECTinformation for us. Even with this in place I've still run across plenty of issues with PDB files that look okay. I think this mostly results from duplicate atom names. If so, it should be possible to come up with functionality that generates unique names, although we might need to take care if certain names have a special meaning, e.g.CAfor alpha carbon.~~
The above is resolved by going via an intermediate SDF file generated by RDKit. See #181 for details.
- ~~In addition to an OpenMM topology (generated from the PDB file), OpenFF also requires a list of unique molecules in the topology as SMILES strings in order to generate its own topology. I'm currently using
rdkitto generate the SMILES strings for me. Conveniently, multi molecule SMILES strings are delimited by a period so I can split on this, then remove duplicates, to generate the list of unique molecules. This means that the code works whether on not the original PDB file contains one or more molecules. (The example PDB used by OpenFF actually contains two molecules, but it is read as one by BioSimSpace. So much for requiring well formatted files ;-) )~~
No longer need to use SMILE when going via SDF intermediate.
- ~~The
openff-toolkitusesanterchamberto compute charges when converting to an OpenMM system. At present this is run using anos.systemcall with no capture ofstdoutorstderr. This means that the full output fromantechamberis printed whenever an OpenFF parameterisation function is called from BioSimSpace. Since BioSimSpace runs parameterisation via a background thread there is also no way to safely suppressstdout. The [development version][https://github.com/openforcefield/openff-toolkit) of the toolkit has switched to usingsubprocessto runantechamber, but it appears that there is still no option to capture/redirect the output.~~
openff-tooltkit now uses subprocess to run antechamber and captures output.
I've now added functionality to uniquify atom names according to the constraints of the PDB format. This removes a series of warnings that I was seeing (which couldn't be supressed). Despite this, I'm still having problems with certain molecules so it can't be the only issue. I'll try to figure out if there is a common reason why things are failing.
Bah, openforcefield pulls in ambertools, which pulls in libgcc, which breaks my Sire development environment so that I can no longer recompile. The environment is in a completely broken state (I can't just remove that one package) so I'm unable to do anything without trashing the environment and rebuilding everything from scratch (which takes roughly 30 mins to an hour.) This is a completely unsustainable way to develop software.
Just an update to say that I've merged things across to devel so that users can play around with the current functionality.
It looks like there's been an update to the openff-toolkit since I wrote the initial code. This now uses subprocess to run antechamber in the background and captures the stdout/stderr, hence the final issue in my first post is now resolved.
The bundled-ness of ambertools has caused us similar packaging issues before and we're trying to find ways to smooth them over. A small step in that direction was to split out the conda recipe into openff-toolkit-base, which does not depend on RDKit or AmberTools, and openff-toolkit, which depends on openff-toolkit-base, RDKit, and AmberTools. This allows some of our users that only use the OpenEye wrapper to not need to install RDKit or AmberTools, which got them around conflicts similar to your but with Psi4, which has its own build infrastructure that was not completely compatible with AmberTools.
If there's another antechamber executable found in the path, it's possible that AmberToolsToolkitWrapper may be able to function? This solution would openff-toolkit-base, installing RDKit on top of that, and relying antechamber being found without the ambertools package pulled down from conda-forge. It seems too good to be true; maybe I'm missing something obvious and this will break something.
https://github.com/openforcefield/openff-toolkit/blob/4641ec15bce760e6e3ad0e15ead70dcf3191c361/openforcefield/utils/toolkits.py#L4358-L4376
Thanks for the information, that's useful to know. We normally search for antechamber in ${AMBERHOME}/bin rather than assuming that directory has been added to the path, which isn't always the case in my experience. I'll try your suggested approach and see if I can get it to work with our setup.
Using openff-toolkit-base as you suggest works as long as ${AMBERHOME}/bin is in the path. I'll switch to using this package so that we can remove the ambertools dependency.
If antechamber isn't in the path, would it also be possible to check to see if AMBERHOME has been set and search within ${AMBERHOME}/bin? (I've run into a few situations where this is the case, so we've found that relying on AMBERHOME is more reliable than the PATH. It's also easier to set.)
That's a TODO in our code:
https://github.com/openforcefield/openff-toolkit/blob/de8a4a545351301adfe424dff0d879b2dd13bc0b/openff/toolkit/utils/toolkits.py#L4437-L4442
So we should probably get to that. Might be tricky to properly test with the way we install things, but I'll think about it.
It would be good to support SMILES input with OpenFF. We could achieve this by allowing a SMILES string as the molecule argument for all OpenFF parameterisation functions. If a string was passed to a non-OpenFF function, then an error would be raised. This would keep things consistent with our current framework meaning that no additional arguments would be required when parameterising with different toolkits.
It would also be good to support searching for multiple antechamber executables until we find one that is >= 20.0. This would allow a user to install AmberTools20 from conda-forge and use it alongside an existing, e.g. Amber18 installation, i.e. they would use AmberTools20 for OpenFF support, while taking advantage of an existing license for using PMEMD.
It might get messy if we need a molecule and a SMILES string, e.g. for the coordinates and the topology, although I guess we could ignore the SMILES string argument in any functions that don't use it.
If it's a matter of just one molecule, it's somewhat straightforward to get it through the pipeline
smiles = "CCO"
mol = Molecule.from_smiles(smiles)
mol.generate_conformers(n_conformers=1)
# Proceed as already happens
top = mol.to_topology()
forcefield.create_openmm_system(top)
the Topology does not carry positions itself, like the openmm.System but those can be attached later by accessing mol.conformers[0].
You're right that it gets messy - multi-molecule setups aren't straightforward here, using custom coordinates would require a second argument, etc. Also expect sharp edges because of the information content of SMILES vs. an SDF file, i.e. a SMILES ambiguous stereochemistry would be a problem (there's an allow_undefined_stereo flag if this needs to be quashed)
We now support parameterisation using both SMILES strings and BioSimSpace Molecule objects (i.e. created by reading a file). This works for all of the supported parameterisation engines, with the caveat that we go via an intermediate PDB file when using AmberTools.
It is now possible to do:
import BioSimSpace as BSS
# Load a methanol PDB file.
molecule = BSS.IO.readMolecules("demos/molecules/methanol.pdb")[0]
# Create a SMILES representation of the molecule.
smiles = "CO"
# Parameterise with OpenFF using the Molecule object.
off_mol = BSS.Parameters.openff_unconstrained_1_0_0(molecule).getMolecule()
# Parameterise with OpenFF using the SMILES string.
off_smiles = BSS.Parameters.openff_unconstrained_1_0_0(smiles).getMolecule()
# Parameterise with GAFF using the Molecule object.
gaff_mol = BSS.Parameters.gaff(molecule).getMolecule()
# Parameterise with GAFF using the SMILES string.
gaff_smiles = BSS.Parameters.gaff(smiles).getMolecule()
It seems like the documentation needs update:

Thanks, the OpenFF functions are created dynamically at run-time depending on what force fields are available from the users install. In turn, the documentation is also auto generated. It looks like the template function has the wrong docstring, I'll take a look later.
I wonder if BSS currently supports openFF? I tried to use BSS to Parameterise OpenFF and it failed so I was wondering if it has been implemented.
Hello there,
Thanks for the question. Yes, we have supported OpenFF for around a year now. However, we currently only support the unconstrained versions of the force fields.
When you say it failed, what exactly are you referring to? Was the functionality simply unavailable, or did you get an error when trying to parametrise? If you could let us know your operating system and installation method (conda or source) along BioSimSpace version and error message that you are seeing, then we'd be happy to debug. You can find the BioSimSpace version as follows:
import BioSimSpace
print(BSS.__version__)
@lohedges Hi, so I set up the env with
conda create -n biosimspace -c conda-forge -c michellab biosimspace
conda install -c conda-forge ambertools
Then I feed in a pdb file
system = BSS.IO.readMolecules('lig.pdb')
molecule = system[0]
# Create a background process to parameterise the molecule.
process = BSS.Parameters.openff_unconstrained_2_0_0(molecule)
# Get the parameterised molecule from the process.
molecule = process.getMolecule()
# Get the intermediate files from the background process.
process.getOutput(filename="param_eror.zip")
which gives me
Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.
Traceback (most recent call last):
File "/Users/zwu/miniconda3_x86/envs/biosimspace/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3361, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-2-fd0859cb3195>", line 1, in <cell line: 1>
runfile('/Users/zwu/Documents/protein-ligand-benchmark/2020-02-07_tyk2/full_xtb/run.py', wdir='/Users/zwu/Documents/protein-ligand-benchmark/2020-02-07_tyk2/full_xtb')
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/zwu/Documents/protein-ligand-benchmark/2020-02-07_tyk2/full_xtb/run.py", line 20, in <module>
molecule = process.getMolecule()
File "/Users/zwu/miniconda3_x86/envs/biosimspace/lib/python3.9/site-packages/BioSimSpace/Parameters/_process.py", line 227, in getMolecule
raise _ParameterisationError("Parameterisation failed! Last error: '%s'" % str(self._last_error)) from None
BioSimSpace._Exceptions._exceptions.ParameterisationError: Parameterisation failed! Last error: 'Unable to create OpenFF Molecule!'
The lig.pdb is
CRYST1 53.880 55.610 57.240 90.00 90.00 90.00 P 1
HETATM 1 C0 lig 1 -1.960 21.550 -27.330 0.00 0.00 C
HETATM 2 C1 lig 1 -1.290 22.410 -28.200 0.00 0.00 C
HETATM 3 C2 lig 1 -1.960 22.950 -29.290 0.00 0.00 C
HETATM 4 C3 lig 1 -3.300 22.620 -29.540 0.00 0.00 C
HETATM 5 C4 lig 1 -3.970 21.760 -28.650 0.00 0.00 C
HETATM 6 C5 lig 1 -3.290 21.230 -27.560 0.00 0.00 C
HETATM 7 Cl6 lig 1 -5.640 21.350 -28.920 0.00 0.00 CL
HETATM 8 C7 lig 1 -4.040 23.150 -30.710 0.00 0.00 C
HETATM 9 O8 lig 1 -4.230 22.420 -31.670 0.00 0.00 O
HETATM 10 N9 lig 1 -4.510 24.410 -30.650 0.00 0.00 N
HETATM 11 C10 lig 1 -5.290 25.110 -31.580 0.00 0.00 C
HETATM 12 C11 lig 1 -5.960 24.550 -32.680 0.00 0.00 C
HETATM 13 C12 lig 1 -6.780 25.360 -33.460 0.00 0.00 C
HETATM 14 N13 lig 1 -6.920 26.650 -33.180 0.00 0.00 N
HETATM 15 C14 lig 1 -6.300 27.230 -32.160 0.00 0.00 C
HETATM 16 C15 lig 1 -5.510 26.460 -31.310 0.00 0.00 C
HETATM 17 N16 lig 1 -6.540 28.580 -31.880 0.00 0.00 N
HETATM 18 C17 lig 1 -5.710 29.450 -31.230 0.00 0.00 C
HETATM 19 O18 lig 1 -4.620 29.120 -30.810 0.00 0.00 O
HETATM 20 C19 lig 1 -6.230 30.850 -31.030 0.00 0.00 C
HETATM 21 Cl20 lig 1 -1.110 24.050 -30.330 0.00 0.00 CL
HETATM 22 H21 lig 1 -6.760 31.250 -31.890 0.00 0.00 H
HETATM 23 H22 lig 1 -1.440 21.130 -26.480 0.00 0.00 H
HETATM 24 H23 lig 1 -0.260 22.660 -28.020 0.00 0.00 H
HETATM 25 H24 lig 1 -3.810 20.560 -26.890 0.00 0.00 H
HETATM 26 H25 lig 1 -4.320 24.980 -29.840 0.00 0.00 H
HETATM 27 H26 lig 1 -5.840 23.500 -32.910 0.00 0.00 H
HETATM 28 H27 lig 1 -7.310 24.940 -34.300 0.00 0.00 H
HETATM 29 H28 lig 1 -5.060 26.910 -30.430 0.00 0.00 H
HETATM 30 H29 lig 1 -7.400 29.010 -32.180 0.00 0.00 H
HETATM 31 H30 lig 1 -6.890 30.870 -30.160 0.00 0.00 H
HETATM 32 H31 lig 1 -5.460 32.770 -30.540 0.00 0.00 H
HETATM 33 O32 lig 1 -5.090 31.810 -30.670 0.00 0.00 O
END
The print(BSS.version) is 2022.2.0 and OS is MacOS 12.3 with M1 chip.
Thanks for the info. After making BioSimSpace output verbose error messages I see the following:
# Do this at the start of your script if you want more error output. A lot of
# the time this is very noisy and unhelpful, so it is disabled by default.
BSS.setVerbose(True)
ParameterisationError: Parameterisation failed! Last error: 'Unable to create OpenFF Molecule!: UndefinedStereochemistryError('Unable to make OFFMol from RDMol: Unable to make OFFMol from RDMol: RDMol has unspecified stereochemistry. RDMol name: Undefined chiral centers are:\n - Atom C (index 1)\n - Atom C (index 2)\n - Atom C (index 4)\n - Atom C (index 5)\n - Atom C (index 7)\n - Atom C (index 11)\n - Atom C (index 12)\n - Atom C (index 15)\n - Atom C (index 17)\n')'
It looks like the RDKit molecule created from your PDB can't be converted to an OpenFF molecule as the stereochemistry is undefined. This is an issue with using PDB and, unfortunately, we don't currently support SDF format. You could try using Mol2 format instead, or try passing a SMILES representation of the molecule.
Cheers.
OpenFF does has an option allow_undefined_stereo=True when reading molecules, but this also doesn't work in this case. (It fails at a later stage though.) I experimented with this option when first adding OpenFF support and found that it caused more problems than it solved, i.e. it was easier to have the user specify the correct stereochemistry from the outset.
Thanks. I will try to use antechamber to generate the input file.
@lohedges Thank you. I have used
antechamber -i lig.pdb -fi pdb -o ligand.sybyl.mol2 -fo mol2 -c bcc -s 2 -at sybyl to generate the mol2 file as input, but it seems it would still give me error.
Traceback (most recent call last):
File "/Users/zwu/miniconda3_x86/envs/biosimspace/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3361, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-2-5b69c677995e>", line 1, in <cell line: 1>
runfile('/Users/zwu/Documents/protein-ligand-benchmark/2020-02-07_tyk2/sage/run.py', wdir='/Users/zwu/Documents/protein-ligand-benchmark/2020-02-07_tyk2/sage')
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/zwu/Documents/protein-ligand-benchmark/2020-02-07_tyk2/sage/run.py", line 17, in <module>
molecule = process.getMolecule()
File "/Users/zwu/miniconda3_x86/envs/biosimspace/lib/python3.9/site-packages/BioSimSpace/Parameters/_process.py", line 227, in getMolecule
raise _ParameterisationError("Parameterisation failed! Last error: '%s'" % str(self._last_error)) from None
BioSimSpace._Exceptions._exceptions.ParameterisationError: Parameterisation failed! Last error: 'Unable to create OpenFF Molecule!'
The mol2 file is
@<TRIPOS>MOLECULE
lig
32 33 1 0 0
SMALL
bcc
@<TRIPOS>ATOM
1 C0 -1.9600 21.5500 -27.3300 C.ar 1 lig -0.096000
2 C1 -1.2900 22.4100 -28.2000 C.ar 1 lig -0.129000
3 C2 -1.9600 22.9500 -29.2900 C.ar 1 lig 0.052400
4 C3 -3.3000 22.6200 -29.5400 C.ar 1 lig -0.143600
5 C4 -3.9700 21.7600 -28.6500 C.ar 1 lig 0.052400
6 C5 -3.2900 21.2300 -27.5600 C.ar 1 lig -0.129000
7 CL6 -5.6400 21.3500 -28.9200 Cl 1 lig -0.060400
8 C7 -4.0400 23.1500 -30.7100 C.2 1 lig 0.691700
9 O8 -4.2300 22.4200 -31.6700 O.2 1 lig -0.550100
10 N9 -4.5100 24.4100 -30.6500 N.am 1 lig -0.460100
11 C10 -5.2900 25.1100 -31.5800 C.ar 1 lig 0.127600
12 C11 -5.9600 24.5500 -32.6800 C.ar 1 lig -0.304300
13 C12 -6.7800 25.3600 -33.4600 C.ar 1 lig 0.445200
14 N13 -6.9200 26.6500 -33.1800 N.ar 1 lig -0.738000
15 C14 -6.3000 27.2300 -32.1600 C.ar 1 lig 0.551200
16 C15 -5.5100 26.4600 -31.3100 C.ar 1 lig -0.323300
17 N16 -6.5400 28.5800 -31.8800 N.am 1 lig -0.553400
18 C17 -5.7100 29.4500 -31.2300 C.2 1 lig 0.671100
19 O18 -4.6200 29.1200 -30.8100 O.2 1 lig -0.593100
20 C19 -6.2300 30.8500 -31.0300 C.3 1 lig -0.178100
21 CL20 -1.1100 24.0500 -30.3300 Cl 1 lig -0.060400
22 H21 -7.3100 30.9400 -31.1600 H 1 lig 0.071033
23 H22 -1.4400 21.1300 -26.4800 H 1 lig 0.148000
24 H23 -0.2600 22.6600 -28.0200 H 1 lig 0.157000
25 H24 -3.8100 20.5600 -26.8900 H 1 lig 0.157000
26 H25 -4.3200 24.9800 -29.8400 H 1 lig 0.330500
27 H26 -5.8400 23.5000 -32.9100 H 1 lig 0.183000
28 H27 -7.3100 24.9400 -34.3000 H 1 lig 0.025100
29 H28 -5.0600 26.9100 -30.4300 H 1 lig 0.180000
30 H29 -7.4000 29.0100 -32.1800 H 1 lig 0.331500
31 H30 -5.7200 31.5200 -31.7200 H 1 lig 0.071033
32 H31 -5.9600 31.1900 -30.0400 H 1 lig 0.071033
@<TRIPOS>BOND
1 1 2 ar
2 1 6 ar
3 1 23 1
4 2 3 ar
5 2 24 1
6 3 4 ar
7 3 21 1
8 4 5 ar
9 4 8 1
10 5 6 ar
11 5 7 1
12 6 25 1
13 8 9 2
14 8 10 am
15 10 11 1
16 10 26 1
17 11 12 ar
18 11 16 ar
19 12 13 ar
20 12 27 1
21 13 14 ar
22 13 28 1
23 14 15 ar
24 15 16 ar
25 15 17 1
26 16 29 1
27 17 18 am
28 17 30 1
29 18 19 2
30 18 20 1
31 20 22 1
32 20 31 1
33 20 32 1
@<TRIPOS>SUBSTRUCTURE
1 lig 1 TEMP 0 **** **** 0 ROOT
I wonder if you mind help me, please? Thank you.
Hi again,
I just realised that our RDKit to OpenFF conversion always goes via PDB format, so we will lose the Mol2 stereochemistry regardless. (Your error above is the same if you turn on verbose messages with BSS.setVerbose(True).)
With this in mind, I've modified my local version of BioSimSpace so that it will go via Mol2 format if that's what the original molecule was loaded from. However, even after doing this it still errors, but this time with:
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_DIVIDE_BY_ZERO IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
/home/lester/sire.app/bin/wrapped_progs/antechamber: Fatal Error!
Unable to find sqm charges in file (sqm.out).
Verify the filename and the file contents.
/home/lester/sire.app/lib/python3.7/site-packages/openff/toolkit/typing/engines/smirnoff/parameters.py:4118: Warning: No registered toolkits can provide the capability "assign_partial_charges" for args "()" and kwargs "{'molecule': Molecule with name 'lig' and SMILES '[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[Cl-].[Cl-].[H+].[H+].[H+].[H+].[H+].[H+].[H+].[H+].[H+].[H+].[H+].[N-3].[N-3].[N-3].[O-2].[O-2]', 'partial_charge_method': 'am1bcc', 'use_conformers': None, 'strict_n_conformers': False, 'normalize_partial_charges': True, '_cls': <class 'openff.toolkit.topology.molecule.FrozenMolecule'>}"
Available toolkits are: [ToolkitWrapper around The RDKit version 2021.09.5, ToolkitWrapper around AmberTools version 21.0, ToolkitWrapper around Built-in Toolkit version None]
ToolkitWrapper around The RDKit version 2021.09.5 <class 'openff.toolkit.utils.exceptions.ChargeMethodUnavailableError'> : partial_charge_method 'am1bcc' is not available from RDKitToolkitWrapper. Available charge methods are ['mmff94']
ToolkitWrapper around AmberTools version 21.0 <class 'subprocess.CalledProcessError'> : Command '['antechamber', '-i', 'molecule.sdf', '-fi', 'sdf', '-o', 'charged.mol2', '-fo', 'mol2', '-pf', 'yes', '-dr', 'n', '-c', 'bcc', '-nc', '-4.0']' returned non-zero exit status 1.
ToolkitWrapper around Built-in Toolkit version None <class 'openff.toolkit.utils.exceptions.ChargeMethodUnavailableError'> : Partial charge method "am1bcc"" is not supported by the Built-in toolkit. Available charge methods are ['zeros', 'formal_charge']
warnings.warn(str(e), Warning)
---------------------------------------------------------------------------
ParameterisationError Traceback (most recent call last)
<ipython-input-5-5586526ef716> in <module>
----> 1 lig = BSS.Parameters.openff_unconstrained_2_0_0(lig).getMolecule()
~/Code/BioSimSpace/python/BioSimSpace/Parameters/_process.py in getMolecule(self)
223 if self._is_error:
224 if _isVerbose():
--> 225 raise _ParameterisationError("Parameterisation failed! Last error: '%s'" % str(self._last_error))
226 else:
227 raise _ParameterisationError("Parameterisation failed! Last error: '%s'" % str(self._last_error)) from None
ParameterisationError: Parameterisation failed! Last error: 'Unable to create OpenMM System!: UnassignedMoleculeChargeException('The following molecules did not have charges assigned by any ParameterHandler in the ForceField:\n[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[Cl-].[Cl-].[H+].[H+].[H+].[H+].[H+].[H+].[H+].[H+].[H+].[H+].[H+].[N-3].[N-3].[N-3].[O-2].[O-2]\n')'
I'll see if I can try some other formats.
This is how I usually create the openFF format from mol2 file if it could be helpful.
try:
from openmm.app import PDBFile
except ImportError:
from simtk.openmm.app import PDBFile
import parmed
import MDAnalysis as mda
from openff.toolkit.topology import Molecule, Topology
from openff.toolkit.typing.engines.smirnoff import ForceField
from openff.toolkit.utils import get_data_file_path
mol2_parm = mda.Universe('ligand.sybyl.mol2')
mol2_parm.atoms.write('ligand.pdb')
pdbfile = PDBFile('ligand.pdb')
m = Molecule.from_file('ligand.sybyl.mol2',
allow_undefined_stereo=True)
omm_topology = pdbfile.topology
# Create the Open Forcefield topology
off_topology = Topology.from_openmm(omm_topology, unique_molecules=[m])
# Load the "Parsley" force field
forcefield = ForceField('openff_unconstrained-2.0.0.offxml')
omm_system = forcefield.create_openmm_system(off_topology)
# Convert to a ParmEd structure
parmed_structure = parmed.openmm.load_topology(omm_topology, omm_system,
pdbfile.positions)
Does the molecule loaded here
system = BSS.IO.readMolecules('lig.pdb')
have bond order and stereochemistry information? I assume no, since it's a PDB file, but I don't know what else is added into this object model.
I just realised that our RDKit to OpenFF conversion always goes via PDB format
In general, passing a well-formed (...ish) RDKit molecule to OpenFF is best done by Molecule.from_rdkit - alternatives being SDF files (from_file) or SMILES (from_smiles or from_mapped_smiles depending on the flavor). Probably with allow_undefined_stereo=True used in a few places.
Thanks, both.
...have bond order and stereochemistry information? I assume no, since it's a PDB file, but I don't know what else is added into this object model.
No, and this is the general problem when using BioSimSpace to try to parameterise with OpenFF. Since we don't have an SDF reader, we assume that ligands are loaded from PDB or Mol2 format. This works fine when using AmberTools or pdb2gmx, but isn't robust when using OpenFF. As an alternative we also support passing a SMILES string directly. For this ligand, the following works:
# The following was generated by RDKit.
smiles = "CC(=O)Nc1cc(NC(=O)c2c(Cl)cccc2Cl)ccn1"
lig = BSS.Parameters.openff_unconstrained_2_0_0(smiles).getMolecule()
Perhaps I should have a fallback that converts internalyl to SMILES if the parameterisation fails.
Probably with allow_undefined_stereo=True used in a few places.
I've previously tried this option with limited success. Ideally we'd like the user to specify everything upfront to avoid any nasty surprises downstream. I guess I could add an option to handle this, though, or warn the user if it was applied when the initial conversion fails.
Cheers.