openff-toolkit
openff-toolkit copied to clipboard
Canonical order before conformer generation
Fixes #926
- [x] Tag issue being addressed
- [ ] Add tests
- [ ] Update docstrings/documentation, if applicable
- [ ] Lint codebase
- [ ] Update changelog
I noticed conformer generation behavior changed significantly between 2020.09 and 2021.03 versions of rdkit. This can be checked using the following snippet
for i in range(5,15):
smiles = 'C' * i
mol_oe = Molecule.from_smiles(smiles)
mol_rdk = Molecule.from_smiles(smiles)
mol_oe.generate_conformers(n_conformers=100,
toolkit_registry=OpenEyeToolkitWrapper())
mol_rdk.generate_conformers(n_conformers=100,
toolkit_registry=RDKitToolkitWrapper())
print(i, mol_oe.n_conformers, mol_rdk.n_conformers)
For rdkit 2020.09 versions the output is similar to the following:
Columns are: # carbons, OE # conformers, RDKIT # conformers
5 1 41
6 2 55
7 3 80
8 5 88
9 10 92
10 12 96
11 26 99
12 51 100
13 100 100
14 100 100
whereas using 2021.03 versions it is
5 1 1
6 2 1
7 3 1
8 5 1
9 10 1
10 12 1
11 26 2
12 51 2
13 100 2
14 100 2
I don't get significant differences between older behavior (without ordering atoms) with the new one (ordering before generation). Just differences due to rdkit version.
Codecov Report
Merging #942 (12e5a27) into master (a0bcb3b) will decrease coverage by
10.39%
. The diff coverage is50.00%
.
Wow, that's a dramatic reduction in number of conformers. I wonder if it's because of the changes in the 2021_03_1 release, namely:
The conformer generator now uses symmetry by default when doing RMS pruning. This can be disabled using the useSymmetryForPruning flag in EmbedParameters. (#3813)
I think the above change in default behavior is good, but I'm nervous about the now-even-bigger difference between the backends. Is there some way to check that these are both doing the "right" thing (maybe RDKit is just way less "creative" about trying diverse geometry)?
Some places to look might be:
- Whether we're using OpenEye to generate conformers in a symmetry-aware way (like RDKit now does)
- Whether both wrappers are using all-atom or heavy-atom RMSD
Probably related to this change: https://github.com/rdkit/rdkit/pull/3813/files#diff-fca50d1b074a5063b8fac48070bc842fea837c6a3aceb9715690112d76f407feR428-R432
It's confusing why the original molecule WAS passing tests before, but isn't now. It turns out that ONE ordering of CCCCCCCCCN
makes two conformers, but the ANOTHER ordering only makes one:
from openff.toolkit.utils.toolkits import RDKitToolkitWrapper
from simtk.openmm import unit
toolkit_wrapper = RDKitToolkitWrapper()
smiles = "CCCCCCCCCN"
molecule = toolkit_wrapper.from_smiles(smiles)
molecule.generate_conformers(
rms_cutoff=1 * unit.angstrom,
n_conformers=100,
toolkit_registry=toolkit_wrapper,
)
print(molecule.n_conformers)
2
But if we put it in the opposite order (just writing the SMILES backwards) only one conformer is made
from openff.toolkit.utils.toolkits import RDKitToolkitWrapper
from simtk.openmm import unit
toolkit_wrapper = RDKitToolkitWrapper()
smiles = "NCCCCCCCCC"
molecule = toolkit_wrapper.from_smiles(smiles)
molecule.generate_conformers(
rms_cutoff=1 * unit.angstrom,
n_conformers=100,
toolkit_registry=toolkit_wrapper,
)
print(molecule.n_conformers)
1
Ok, so my best guess right now is that RDKit is using heavy-atom RMSD, while OpenEye is using all-atom. In this case, it's probably best to switch both of them to be heavy-atom, and then ensure that we get roughly comparable numbers of conformers out of each (ideally within a factor of 5?)
~Does the bug still get fixed if you just canonically reorder the atoms inside the Molecule before pushing to the toolkits?~ e.g. https://github.com/openforcefield/openff-toolkit/blob/684e25a27cade6090d53af6c97fa53485b4c3de2/openff/toolkit/topology/molecule.py#L3436-L3438
Actually looking more closely, partial charge assignment calls the generate_conformers
directly within the toolkit so this solution is better.
Closing as this has gone stale (not at all a reflection of the quality of the idea or work).
If we do want this change merged at some point, it should be straightforward enough to move these changes onto a more recent base branch.