Fixes #926

[x] Tag issue being addressed
[ ] Add tests
[ ] Update docstrings/documentation, if applicable
[ ] Lint codebase
[ ] Update changelog

May 19 '21 04:05 ijpulidos

I noticed conformer generation behavior changed significantly between 2020.09 and 2021.03 versions of rdkit. This can be checked using the following snippet

for i in range(5,15):
    smiles = 'C' * i
    mol_oe = Molecule.from_smiles(smiles)
    mol_rdk = Molecule.from_smiles(smiles)
    mol_oe.generate_conformers(n_conformers=100,
                               toolkit_registry=OpenEyeToolkitWrapper())
    mol_rdk.generate_conformers(n_conformers=100,
                               toolkit_registry=RDKitToolkitWrapper())
    print(i, mol_oe.n_conformers, mol_rdk.n_conformers)

For rdkit 2020.09 versions the output is similar to the following:

Columns are: # carbons, OE # conformers, RDKIT # conformers

whereas using 2021.03 versions it is

I don't get significant differences between older behavior (without ordering atoms) with the new one (ordering before generation). Just differences due to rdkit version.

May 19 '21 04:05 ijpulidos

Codecov Report

Merging #942 (12e5a27) into master (a0bcb3b) will decrease coverage by 10.39%. The diff coverage is 50.00%.

May 19 '21 04:05 codecov[bot]

Wow, that's a dramatic reduction in number of conformers. I wonder if it's because of the changes in the 2021_03_1 release, namely:

The conformer generator now uses symmetry by default when doing RMS pruning. This can be disabled using the useSymmetryForPruning flag in EmbedParameters. (#3813)

I think the above change in default behavior is good, but I'm nervous about the now-even-bigger difference between the backends. Is there some way to check that these are both doing the "right" thing (maybe RDKit is just way less "creative" about trying diverse geometry)?

Some places to look might be:

Whether we're using OpenEye to generate conformers in a symmetry-aware way (like RDKit now does)
Whether both wrappers are using all-atom or heavy-atom RMSD

May 20 '21 18:05 j-wags

Probably related to this change: https://github.com/rdkit/rdkit/pull/3813/files#diff-fca50d1b074a5063b8fac48070bc842fea837c6a3aceb9715690112d76f407feR428-R432

It's confusing why the original molecule WAS passing tests before, but isn't now. It turns out that ONE ordering of CCCCCCCCCN makes two conformers, but the ANOTHER ordering only makes one:

from openff.toolkit.utils.toolkits import RDKitToolkitWrapper
from simtk.openmm import unit
toolkit_wrapper = RDKitToolkitWrapper()
smiles = "CCCCCCCCCN"
molecule = toolkit_wrapper.from_smiles(smiles)
molecule.generate_conformers(
    rms_cutoff=1 * unit.angstrom,
    n_conformers=100,
    toolkit_registry=toolkit_wrapper,
)
print(molecule.n_conformers)

2

But if we put it in the opposite order (just writing the SMILES backwards) only one conformer is made

from openff.toolkit.utils.toolkits import RDKitToolkitWrapper
from simtk.openmm import unit
toolkit_wrapper = RDKitToolkitWrapper()
smiles = "NCCCCCCCCC"
molecule = toolkit_wrapper.from_smiles(smiles)
molecule.generate_conformers(
    rms_cutoff=1 * unit.angstrom,
    n_conformers=100,
    toolkit_registry=toolkit_wrapper,
)
print(molecule.n_conformers)

1

May 20 '21 18:05 j-wags

Ok, so my best guess right now is that RDKit is using heavy-atom RMSD, while OpenEye is using all-atom. In this case, it's probably best to switch both of them to be heavy-atom, and then ensure that we get roughly comparable numbers of conformers out of each (ideally within a factor of 5?)

May 20 '21 18:05 j-wags

~Does the bug still get fixed if you just canonically reorder the atoms inside the Molecule before pushing to the toolkits?~ e.g. https://github.com/openforcefield/openff-toolkit/blob/684e25a27cade6090d53af6c97fa53485b4c3de2/openff/toolkit/topology/molecule.py#L3436-L3438

Actually looking more closely, partial charge assignment calls the generate_conformers directly within the toolkit so this solution is better.

Jun 14 '21 15:06 lilyminium

Closing as this has gone stale (not at all a reflection of the quality of the idea or work).

If we do want this change merged at some point, it should be straightforward enough to move these changes onto a more recent base branch.

Oct 18 '22 16:10 mattwthompson

openff-toolkit
openff-toolkit copied to clipboard

Canonical order before conformer generation

Codecov Report

openff-toolkit openff-toolkit copied to clipboard

Canonical order before conformer generation

Codecov Report

openff-toolkit
openff-toolkit copied to clipboard