coordgenlibs

Really slow performance with some macrocycles

Status: Open · bjonnh-work opened this issue 2 years ago · 8 comments

We found a performance problem when using coordgen through RDKit. With coordgen enabled, generating 2D coordinates for chiral macrocycles like the one below is extremely slow (roughly 100 times slower for the given molecule). Removing the chirality from the atoms makes it fast again.

```python
from rdkit import Chem
from rdkit.Chem import rdDepictor

SMILES = "C[C@@H]1CCCCCCCCC(=O)OCCN[C@H](C)CCCCCCCCC(=O)OCCN[C@H](C)CCCCCCCCC(=O)OCCN1"

# This is slow: the freshly parsed molecule has no conformer, so
# MolToMolBlock generates 2D coordinates, here with coordgen.
rdDepictor.SetPreferCoordGen(True)
molblock = Chem.MolToMolBlock(Chem.MolFromSmiles(SMILES))

# This is fast: same molecule, RDKit's built-in 2D depiction code.
rdDepictor.SetPreferCoordGen(False)
molblock = Chem.MolToMolBlock(Chem.MolFromSmiles(SMILES))
```
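To put a number on the "roughly 100 times" figure and to separate the coordgen effect from the chirality effect, the four combinations (coordgen on/off, stereo kept/removed) can be timed with a small harness. This is a sketch, not part of the original report: it assumes RDKit is installed, calls `rdDepictor.Compute2DCoords` directly instead of going through `MolToMolBlock`, and uses `Chem.RemoveStereochemistry` to produce the achiral variant. The helpers `best_of` and `depict_seconds` are hypothetical names introduced here.

```python
import time
from typing import Callable

# Macrocycle from the report above.
SMILES = "C[C@@H]1CCCCCCCCC(=O)OCCN[C@H](C)CCCCCCCCC(=O)OCCN[C@H](C)CCCCCCCCC(=O)OCCN1"


def best_of(fn: Callable[[], object], repeats: int = 3) -> float:
    """Return the fastest wall-clock time (seconds) over `repeats` calls of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def depict_seconds(smiles: str, prefer_coordgen: bool, keep_stereo: bool,
                   repeats: int = 3) -> float:
    """Hypothetical helper: time 2D coordinate generation for one molecule (needs RDKit)."""
    from rdkit import Chem
    from rdkit.Chem import rdDepictor

    mol = Chem.MolFromSmiles(smiles)
    if not keep_stereo:
        Chem.RemoveStereochemistry(mol)  # strips the @/@@ tags in place
    # SetPreferCoordGen is a global RDKit setting, so set it right before timing.
    rdDepictor.SetPreferCoordGen(prefer_coordgen)
    # Compute2DCoords clears existing conformers by default, so each repeat redoes the layout.
    return best_of(lambda: rdDepictor.Compute2DCoords(mol), repeats)


if __name__ == "__main__":
    for coordgen in (False, True):
        for stereo in (True, False):
            secs = depict_seconds(SMILES, coordgen, stereo)
            print(f"coordgen={coordgen!s:5} stereo={stereo!s:5} {secs:.3f} s")
```

Taking the best of several runs reduces noise from other processes; for the multi-minute cases discussed below, `repeats=1` is more practical.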

This issue is cross-posted to RDKit: https://github.com/rdkit/rdkit/issues/5813

bjonnh-work, Nov 29 '22

Thanks for the report! This is very interesting.

d-b-w, Nov 29 '22

I'm still trying to figure out the impact of chirality; in some cases it seems to be even slower without it.

bjonnh-work, Nov 29 '22

I ran some better benchmarks: for that molecule at least, chirality seems to have no impact.

bjonnh-work, Nov 29 '22

However, the more flexible the macrocycle, the longer it takes. This one takes twice as long as the previous molecule: CC1CCCCCCCCCOCCNCCCCCCCCCCOCCNC(C)CCCCCCCCC(=O)OCCN1

bjonnh-work, Nov 29 '22

On my machine, CC1CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC1 takes 2 min, while the smaller ring CC1CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC1 takes 2 s.
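The ring-size dependence can be mapped by generating the same methyl-substituted carbocycle at several sizes and timing each one. A minimal sketch, assuming RDKit is installed: `methyl_macrocycle` and `coordgen_seconds` are hypothetical helpers, the first building SMILES of the form `CC1CCC...C1` with `ring_size` ring carbons plus the methyl, the second timing a single `rdDepictor.Compute2DCoords` call with coordgen enabled.

```python
import time


def methyl_macrocycle(ring_size: int) -> str:
    """Hypothetical helper: SMILES of a methylcycloalkane, e.g. 3 -> 'CC1CC1'."""
    if ring_size < 3:
        raise ValueError("a ring needs at least 3 atoms")
    # 'CC1' opens the ring on the first ring carbon; the final 'C1' closes it.
    return "CC1" + "C" * (ring_size - 2) + "C1"


def coordgen_seconds(smiles: str) -> float:
    """Hypothetical helper: wall-clock seconds for one coordgen 2D layout (needs RDKit)."""
    from rdkit import Chem
    from rdkit.Chem import rdDepictor

    rdDepictor.SetPreferCoordGen(True)
    mol = Chem.MolFromSmiles(smiles)
    start = time.perf_counter()
    rdDepictor.Compute2DCoords(mol)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Larger rings can take minutes each, so extend the range with care.
    for n in range(20, 41, 4):
        print(n, f"{coordgen_seconds(methyl_macrocycle(n)):.2f} s")
```

Plotting these timings against `n` would show whether the growth is a smooth power law or a jump at some ring size, which helps narrow down where a profiler should look.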

I'll try running a profiler on it.

bjonnh-work, Nov 29 '22

Minimal example that doesn't require RDKit, written as a coordgenlibs Boost unit test:

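```cpp
// Boost.Test case for the coordgenlibs test suite; `_smiles` is the
// SMILES-parsing literal defined in the coordgenlibs test sources.
BOOST_AUTO_TEST_CASE(SlowMacrocycle)
{
    // Large methyl-substituted carbocycle that triggers the slowdown.
    auto mol = "CC1CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC1"_smiles;
    // Sanity check on the parsed molecule: the first bond is single.
    BOOST_TEST(mol->getBonds()[0]->getBondOrder() == 1);
    sketcherMinimizer minimizer;
    minimizer.initialize(mol.get());
    minimizer.runGenerateCoordinates();  // the slow call
    const auto& atoms = minimizer.getAtoms();
    sketcherMinimizerAtom* center = atoms.at(0);
    BOOST_REQUIRE_EQUAL(center->getAtomicNumber(), 6);
}
```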

bjonnh-work, Nov 29 '22

The slowdown occurs in `.../coordgen/CoordgenMacrocycleBuilder.cpp`, at line 682: `if (checkedMacrocycles > MAX_MACROCYCLES) { break; }`. `MAX_MACROCYCLES` is set to 40, and for the problematic molecule it takes a long time to reach that limit. Alternatively, the issue could be the `acceptableScore` that is calculated and checked just above that line. It is calculated as `numberOfAtoms * SUBSTITUTED_ATOM_RESTRAINT / 2`, where `SUBSTITUTED_ATOM_RESTRAINT` is 10.

Just FYI.
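The two stopping conditions described above can be illustrated with a simplified model of the search loop. This is a hypothetical sketch, not the actual code in `CoordgenMacrocycleBuilder.cpp`: it assumes that lower scores are better, that the loop stops as soon as a candidate scores below `acceptableScore`, and that the counter is incremented before the `> MAX_MACROCYCLES` check. Under those assumptions, a molecule for which no candidate reaches the threshold pays for `MAX_MACROCYCLES + 1` full scorings.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")

SUBSTITUTED_ATOM_RESTRAINT = 10  # value quoted in the comment above
MAX_MACROCYCLES = 40             # value quoted in the comment above


def acceptable_score(number_of_atoms: int) -> float:
    # numberOfAtoms * SUBSTITUTED_ATOM_RESTRAINT / 2, i.e. 5 per atom
    return number_of_atoms * SUBSTITUTED_ATOM_RESTRAINT / 2


def search_macrocycle(
    candidates: Iterable[T],
    score: Callable[[T], float],
    number_of_atoms: int,
) -> Tuple[Optional[T], float, int]:
    """Hypothetical model: return (best candidate, its score, candidates scored)."""
    threshold = acceptable_score(number_of_atoms)
    best: Optional[T] = None
    best_score = float("inf")
    checked = 0
    for candidate in candidates:
        s = score(candidate)  # assumed to be the expensive step
        checked += 1
        if s < best_score:
            best, best_score = candidate, s
        if s < threshold:              # good enough: stop early
            break
        if checked > MAX_MACROCYCLES:  # give up after the cap
            break
    return best, best_score, checked
```

For example, `search_macrocycle(range(100), lambda c: 1e9, 60)` returns `(0, 1000000000.0, 41)`: no candidate beats the threshold of 300.0, so the loop only stops at the cap. In this model the total cost is the number of scored candidates times the cost of one scoring, so a slowdown can come either from missing the threshold or from each scoring getting more expensive as the ring grows.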

tadhurst-cdd, Nov 30 '22

OK, the general slowness on macrocycles is a known issue. We're also tracking performance issues in https://github.com/schrodinger/coordgenlibs/issues/39 and in Schrödinger's internal bug tracker. We have efforts underway to sidestep this, and we'll probably incorporate these molecules as test cases. That project is somewhat long-term, though.

Your team is welcome to submit a patch if you have suggestions, of course!

d-b-w, Dec 02 '22