openff-toolkit
Canonically order molecule before conformer generation
Is your feature request related to a problem? Please describe.
The atom ordering of a molecule can affect which conformers OpenEye (OE) generates for it (and probably RDKit behaves the same way). This in turn can lead to the same toolkit (TK) producing different partial charges and Wiberg bond orders (WBOs) for the same molecule.
The example below shows that the number of conformers generated for the same molecule can change significantly depending on the atom ordering Omega works with (here toggled with `SetCanonOrder`):
```python
from openeye import oechem, oeomega

# Run 1: canonical ordering disabled
oe_molecule = oechem.OEMol()
oechem.OESmilesToMol(
    oe_molecule, "CC(C)(C)c1sc(c2ccnc(N)n2)c(n1)c3cccc(N[S](=O)(=O)c4c(F)cccc4F)c3F"
)
omega = oeomega.OEOmega()
omega.SetMaxConfs(800)
omega.SetEnergyWindow(15.0)
omega.SetRMSThreshold(1.0)
omega.SetCanonOrder(False)
omega.SetSampleHydrogens(True)
omega(oe_molecule)
print(oe_molecule.NumConfs())
print(oechem.OEMolToSmiles(oe_molecule))
# >> 156
# >> CC(C)(C)c1nc(c(s1)c2ccnc(n2)N)c3cccc(c3F)NS(=O)(=O)c4c(cccc4F)F

# Run 2: identical settings, except that canonical ordering is enabled
oe_molecule = oechem.OEMol()
oechem.OESmilesToMol(
    oe_molecule, "CC(C)(C)c1sc(c2ccnc(N)n2)c(n1)c3cccc(N[S](=O)(=O)c4c(F)cccc4F)c3F"
)
omega = oeomega.OEOmega()
omega.SetMaxConfs(800)
omega.SetEnergyWindow(15.0)
omega.SetRMSThreshold(1.0)
omega.SetCanonOrder(True)
omega.SetSampleHydrogens(True)
omega(oe_molecule)
print(oe_molecule.NumConfs())
print(oechem.OEMolToSmiles(oe_molecule))
# >> 255
# >> CC(C)(C)c1nc(c(s1)c2ccnc(n2)N)c3cccc(c3F)NS(=O)(=O)c4c(cccc4F)F
```
Describe the solution you'd like
To increase consistency, it would be good to canonically order the molecule prior to conformer generation or, in the case of OE, to set `omega.SetCanonOrder(True)`.
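For RDKit, one way to canonically order a molecule before conformer generation is to renumber its atoms by RDKit's own canonical atom ranking and only then embed. The sketch below assumes RDKit with ETKDGv3 available; `canonically_ordered` and `generate_conformers` are hypothetical helper names, the seed is arbitrary, and this is not how openff-toolkit currently generates conformers.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def canonically_ordered(mol: Chem.Mol) -> Chem.Mol:
    """Hypothetical helper: return a copy of `mol` renumbered into RDKit's canonical order."""
    # ranks[i] is the canonical rank of atom i; breakTies=True makes every rank unique
    ranks = list(Chem.CanonicalRankAtoms(mol, breakTies=True))
    # RenumberAtoms expects new_order[k] = old index of the atom that becomes atom k
    new_order = sorted(range(mol.GetNumAtoms()), key=lambda idx: ranks[idx])
    return Chem.RenumberAtoms(mol, new_order)


def generate_conformers(smiles: str, n_confs: int = 10, seed: int = 0xF00D):
    """Hypothetical helper: canonically order first, then embed with a fixed seed."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    mol = canonically_ordered(mol)  # input atom order no longer matters from here on
    params = AllChem.ETKDGv3()
    params.randomSeed = seed  # fixed seed so repeated runs are reproducible
    conf_ids = list(AllChem.EmbedMultipleConfs(mol, n_confs, params))
    return mol, conf_ids
```

Since the ranking depends only on the molecular graph, `generate_conformers("CCO")` and `generate_conformers("OCC")` should see the same atom order, so the fixed seed drives the embedding identically for both inputs.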
Describe alternatives you've considered
Canonically ordering the molecule manually beforehand, but this isn't ideal in many cases.
Additional context
Just made a script to show that the problem also happens with RDKit, fwiw. Inspired by @SimonBoothroyd's previous code.
https://gist.github.com/ijpulidos/7b0b9ac7d3e4a1692a1dee2825da3b98
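A minimal way to probe the RDKit side is to shuffle the atom order of the same molecule, embed each copy with the same seed and RMS pruning, and compare the resulting conformer counts. This is a sketch under assumptions, not the script from the gist above: `count_conformers` and `shuffled` are hypothetical helpers, and the seeds, `n_confs` and the 0.5 Å pruning threshold are arbitrary illustrative values. No particular counts are claimed here.

```python
import random

from rdkit import Chem
from rdkit.Chem import AllChem


def count_conformers(mol: Chem.Mol, seed: int = 42, n_confs: int = 10, prune_rms: float = 0.5) -> int:
    """Hypothetical helper: embed a copy of `mol` and return how many conformers survive pruning."""
    params = AllChem.ETKDGv3()
    params.randomSeed = seed  # same seed for every atom ordering
    params.pruneRmsThresh = prune_rms  # drop conformers closer than this RMSD (in Angstrom)
    return len(AllChem.EmbedMultipleConfs(Chem.Mol(mol), n_confs, params))


def shuffled(mol: Chem.Mol, rng: random.Random) -> Chem.Mol:
    """Hypothetical helper: same molecule, randomly permuted atom indices."""
    order = list(range(mol.GetNumAtoms()))
    rng.shuffle(order)
    return Chem.RenumberAtoms(mol, order)


if __name__ == "__main__":
    smiles = "CC(C)(C)c1sc(c2ccnc(N)n2)c(n1)c3cccc(N[S](=O)(=O)c4c(F)cccc4F)c3F"
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    rng = random.Random(0)
    # If conformer generation were order-independent, these counts would all match
    print([count_conformers(shuffled(mol, rng)) for _ in range(3)])
```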