
Canonically order molecule before conformer generation

Open SimonBoothroyd opened this issue 3 years ago • 1 comments

Is your feature request related to a problem? Please describe.

The atom ordering of a molecule can affect which conformers are generated for it using OE (and probably also RDKit). This can then lead to different charges and WBOs being produced by the same toolkit for the same molecule.

The example below shows that the number of conformers generated for the same molecule can change significantly depending on whether Omega canonically orders its atoms first:

from openeye import oechem, oeomega

oe_molecule = oechem.OEMol()
oechem.OESmilesToMol(
    oe_molecule, "CC(C)(C)c1sc(c2ccnc(N)n2)c(n1)c3cccc(N[S](=O)(=O)c4c(F)cccc4F)c3F"
)
omega = oeomega.OEOmega()
omega.SetMaxConfs(800)
omega.SetEnergyWindow(15.0)
omega.SetRMSThreshold(1.0)
omega.SetCanonOrder(False)  # keep the input atom order
omega.SetSampleHydrogens(True)
omega(oe_molecule)
print(oe_molecule.NumConfs())
print(oechem.OEMolToSmiles(oe_molecule))

>> 156
>> CC(C)(C)c1nc(c(s1)c2ccnc(n2)N)c3cccc(c3F)NS(=O)(=O)c4c(cccc4F)F

oe_molecule = oechem.OEMol()
oechem.OESmilesToMol(
    oe_molecule, "CC(C)(C)c1sc(c2ccnc(N)n2)c(n1)c3cccc(N[S](=O)(=O)c4c(F)cccc4F)c3F"
)
omega = oeomega.OEOmega()
omega.SetMaxConfs(800)
omega.SetEnergyWindow(15.0)
omega.SetRMSThreshold(1.0)
omega.SetCanonOrder(True)  # canonically reorder atoms before generating conformers
omega.SetSampleHydrogens(True)
omega(oe_molecule)
print(oe_molecule.NumConfs())
print(oechem.OEMolToSmiles(oe_molecule))

>> 255
>> CC(C)(C)c1nc(c(s1)c2ccnc(n2)N)c3cccc(c3F)NS(=O)(=O)c4c(cccc4F)F

Describe the solution you'd like

To increase consistency, it would be good to canonically order the molecule prior to conformer generation or, in the case of OE, to set omega.SetCanonOrder(True).
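
For the RDKit side, a minimal sketch of what "canonically order, then generate conformers" could look like is given below. It is only an illustration, not how the toolkit does or would implement it: the helper names `canonically_ordered` and `generate_conformers`, the ethanol test SMILES, and the conformer count and random seed are all made up for this example. It assumes RDKit is installed and relies on `Chem.CanonicalRankAtoms` and `Chem.RenumberAtoms` to put the atoms into an order that does not depend on how the input was written.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def canonically_ordered(mol):
    """Return a copy of `mol` with atoms renumbered by RDKit's canonical rank.

    Hypothetical helper written for this example.
    """
    # With breakTies=True every atom gets a unique rank.
    ranks = list(Chem.CanonicalRankAtoms(mol, breakTies=True))
    # RenumberAtoms expects new_order[new_index] == old_index,
    # so sort the old indices by their canonical rank.
    new_order = sorted(range(mol.GetNumAtoms()), key=lambda idx: ranks[idx])
    return Chem.RenumberAtoms(mol, new_order)


def generate_conformers(smiles, n_confs=10, seed=42):
    """Build a molecule, canonically order it, then embed conformers.

    Hypothetical helper; n_confs and seed are arbitrary illustrative values.
    """
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    mol = canonically_ordered(mol)
    AllChem.EmbedMultipleConfs(mol, numConfs=n_confs, randomSeed=seed)
    return mol


# The same molecule (ethanol) written with two different atom orders.
mol_a = generate_conformers("OCC")
mol_b = generate_conformers("CCO")

symbols_a = [atom.GetSymbol() for atom in mol_a.GetAtoms()]
symbols_b = [atom.GetSymbol() for atom in mol_b.GetAtoms()]
print(symbols_a == symbols_b)
```

After reordering, both inputs should present their atoms to the embedding step in the same order, which removes the input ordering as a source of variation. Whether the resulting conformer ensembles then match exactly still depends on the embedding settings and the RDKit version.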

Describe alternatives you've considered

Canonically order the molecule manually, but this isn't ideal in a lot of cases.


SimonBoothroyd, May 06 '21 09:05

Just made a script to show the problem also happens with RDKit, fwiw. Inspired by @SimonBoothroyd's previous code.

https://gist.github.com/ijpulidos/7b0b9ac7d3e4a1692a1dee2825da3b98

ijpulidos, May 14 '21 00:05