coordgenlibs
coordgenlibs copied to clipboard
Really slow performance with some macrocycles
We found an issue when using coordgen with rdkit. When activating coordgen, it is extremely slow to work with such chiral macrocycles. Removing the chirality from the atoms makes it fast again.
(roughly 100 times slower for the given molecule).
from rdkit.Chem import rdDepictor
from rdkit import Chem
### This is slow
rdDepictor.SetPreferCoordGen(True)
mol = Chem.MolToMolBlock(Chem.MolFromSmiles("C[C@@H]1CCCCCCCCC(=O)OCCN[C@H](C)CCCCCCCCC(=O)OCCN[C@H](C)CCCCCCCCC(=O)OCCN1"))
### This is fast
rdDepictor.SetPreferCoordGen(False)
mol = Chem.MolToMolBlock(Chem.MolFromSmiles("C[C@@H]1CCCCCCCCC(=O)OCCN[C@H](C)CCCCCCCCC(=O)OCCN[C@H](C)CCCCCCCCC(=O)OCCN1"))
This is a crossposted issue with rdkit https://github.com/rdkit/rdkit/issues/5813
Thanks for the report! this is very interesting
I'm still trying to figure out the impact of chirality, it seems that in some cases it may be even slower without chirality.
Ran some better benchmarks, for that molecule at least, it seems the chirality has no impact.
However, the more flexible it is the more time it takes.
This takes twice the time as the previous molecule: CC1CCCCCCCCCOCCNCCCCCCCCCCOCCNC(C)CCCCCCCCC(=O)OCCN1
CC1CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC1 takes 2m on my machine CC1CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC1 takes 2s
I think I'll throw a profiler at that
Minimal example that doesn't require rdkit
BOOST_AUTO_TEST_CASE(SlowMacrocycle)
{
auto mol = "CC1CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC1"_smiles;
BOOST_TEST(mol->getBonds()[0]->getBondOrder() == 1);
sketcherMinimizer minimizer;
minimizer.initialize(mol.get());
minimizer.runGenerateCoordinates();
const auto& atoms = minimizer.getAtoms();
sketcherMinimizerAtom* center = atoms.at(0);
BOOST_REQUIRE_EQUAL(center->getAtomicNumber(), 6);
}
The slowdown occurs in file: .../coordgen/CoordgenMacrocycleBuilder.cpp At line 682: if (checkedMacrocycles > MAX_MACROCYCLES) { break; } MAX_MACROCYCLES is set to 40, and it takes a long time to get there for the bad mol. OR, the acceptableScore calculated and checked just above that could be the issue.
It is calculated as: numberOfAtoms * SUBSTITUTED_ATOM_RESTRAINT / 2
and SUBSTITUTED_ATOM_RESTRAINT is 10
Just FYI
Ok, the general slowness on macrocycles is a known issue. We're also tracking performance issues here: https://github.com/schrodinger/coordgenlibs/issues/39 and in Schrödinger's internal bug tracker. We have efforts underway to sidestep this, so we'll probably incorporate these as test cases. That project is somewhat long-term, though.
Your team is welcome to submit a patch if you have suggestions, of course!