openff-toolkit icon indicating copy to clipboard operation
openff-toolkit copied to clipboard

Canonical order before conformer generation

Open ijpulidos opened this issue 3 years ago • 6 comments

Fixes #926

ijpulidos avatar May 19 '21 04:05 ijpulidos

I noticed conformer generation behavior changed significantly between 2020.09 and 2021.03 versions of rdkit. This can be checked using the following snippet

for i in range(5,15):
    smiles = 'C' * i
    mol_oe = Molecule.from_smiles(smiles)
    mol_rdk = Molecule.from_smiles(smiles)
    mol_oe.generate_conformers(n_conformers=100,
                               toolkit_registry=OpenEyeToolkitWrapper())
    mol_rdk.generate_conformers(n_conformers=100,
                               toolkit_registry=RDKitToolkitWrapper())
    print(i, mol_oe.n_conformers, mol_rdk.n_conformers)

For rdkit 2020.09 versions the output is similar to the following:

Columns are: # carbons, OE # conformers, RDKIT # conformers

5 1 41
6 2 55
7 3 80
8 5 88
9 10 92
10 12 96
11 26 99
12 51 100
13 100 100
14 100 100

whereas using 2021.03 versions it is

5 1 1
6 2 1
7 3 1
8 5 1
9 10 1
10 12 1
11 26 2
12 51 2
13 100 2
14 100 2

I don't get significant differences between older behavior (without ordering atoms) with the new one (ordering before generation). Just differences due to rdkit version.

ijpulidos avatar May 19 '21 04:05 ijpulidos

Codecov Report

Merging #942 (12e5a27) into master (a0bcb3b) will decrease coverage by 10.39%. The diff coverage is 50.00%.

codecov[bot] avatar May 19 '21 04:05 codecov[bot]

Wow, that's a dramatic reduction in number of conformers. I wonder if it's because of the changes in the 2021_03_1 release, namely:

The conformer generator now uses symmetry by default when doing RMS pruning. This can be disabled using the useSymmetryForPruning flag in EmbedParameters. (#3813)

I think the above change in default behavior is good, but I'm nervous about the now-even-bigger difference between the backends. Is there some way to check that these are both doing the "right" thing (maybe RDKit is just way less "creative" about trying diverse geometry)?

Some places to look might be:

  • Whether we're using OpenEye to generate conformers in a symmetry-aware way (like RDKit now does)
  • Whether both wrappers are using all-atom or heavy-atom RMSD

j-wags avatar May 20 '21 18:05 j-wags

Probably related to this change: https://github.com/rdkit/rdkit/pull/3813/files#diff-fca50d1b074a5063b8fac48070bc842fea837c6a3aceb9715690112d76f407feR428-R432

It's confusing why the original molecule WAS passing tests before, but isn't now. It turns out that ONE ordering of CCCCCCCCCN makes two conformers, but the ANOTHER ordering only makes one:

from openff.toolkit.utils.toolkits import RDKitToolkitWrapper
from simtk.openmm import unit
toolkit_wrapper = RDKitToolkitWrapper()
smiles = "CCCCCCCCCN"
molecule = toolkit_wrapper.from_smiles(smiles)
molecule.generate_conformers(
    rms_cutoff=1 * unit.angstrom,
    n_conformers=100,
    toolkit_registry=toolkit_wrapper,
)
print(molecule.n_conformers)

2

But if we put it in the opposite order (just writing the SMILES backwards) only one conformer is made

from openff.toolkit.utils.toolkits import RDKitToolkitWrapper
from simtk.openmm import unit
toolkit_wrapper = RDKitToolkitWrapper()
smiles = "NCCCCCCCCC"
molecule = toolkit_wrapper.from_smiles(smiles)
molecule.generate_conformers(
    rms_cutoff=1 * unit.angstrom,
    n_conformers=100,
    toolkit_registry=toolkit_wrapper,
)
print(molecule.n_conformers)

1

j-wags avatar May 20 '21 18:05 j-wags

Ok, so my best guess right now is that RDKit is using heavy-atom RMSD, while OpenEye is using all-atom. In this case, it's probably best to switch both of them to be heavy-atom, and then ensure that we get roughly comparable numbers of conformers out of each (ideally within a factor of 5?)

j-wags avatar May 20 '21 18:05 j-wags

~Does the bug still get fixed if you just canonically reorder the atoms inside the Molecule before pushing to the toolkits?~ e.g. https://github.com/openforcefield/openff-toolkit/blob/684e25a27cade6090d53af6c97fa53485b4c3de2/openff/toolkit/topology/molecule.py#L3436-L3438

Actually looking more closely, partial charge assignment calls the generate_conformers directly within the toolkit so this solution is better.

lilyminium avatar Jun 14 '21 15:06 lilyminium

Closing as this has gone stale (not at all a reflection of the quality of the idea or work).

If we do want this change merged at some point, it should be straightforward enough to move these changes onto a more recent base branch.

mattwthompson avatar Oct 18 '22 16:10 mattwthompson