django-rdkit

MolField hassubstruct performance for complex molecules

Open ivannnnnnnnnn opened this issue 2 years ago • 0 comments

Hello! Sorry if this question is slightly off topic, and for my English. I hope the django-rdkit community can help me with a problem.

I am using django-rdkit to store mol objects. My database holds 10 million molecules. With this amount of data, I have trouble selecting the molecules that contain a given target molecule as a substructure when the target molecule is complex.

For example, selecting molecules with hassubstruct=c1ccccc1 is fast. But selecting molecules with hassubstruct=COc1cccnc1C1=CCN(C(=O)OC(C)(C)C)CC1 gives a very slow query.

Maybe someone has run into the same problem and can recommend how to improve performance.

My second question is which algorithm the RDKit cartridge uses for the hassubstruct (@>) operation. Maybe someone knows of articles about this, or can explain it. I'm asking because I think there might be ways to optimize search speed with data mining. For example, I do not use the exact lookup to find a specific molecule; instead I store the SMILES in a separate field of the same model and search on that field. Perhaps the substructure search could be simplified in a similar way.
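To illustrate the kind of simplification I have in mind, here is a toy sketch in plain Python of a cheap prefilter that runs before the expensive substructure check. The element-count screen, the regular expression, and the helper names (element_counts, may_contain, screen) are my own assumptions for illustration only. This is not how the RDKit cartridge works, and every row that survives the screen would still need the real hassubstruct check.

```python
import re
from collections import Counter

# One atom per match: either a bracket atom such as [nH] or [13C@@H],
# or an unbracketed atom from the SMILES organic subset.
_ATOM = re.compile(r"\[\d*([A-Za-z][a-z]?)[^\]]*\]|(Cl|Br|[BCNOPSFI]|[bcnops])")


def element_counts(smiles: str) -> Counter:
    """Count heavy atoms per element, ignoring aromaticity and hydrogens."""
    counts = Counter()
    for match in _ATOM.finditer(smiles):
        symbol = (match.group(1) or match.group(2)).capitalize()
        if symbol != "H":
            counts[symbol] += 1
    return counts


def may_contain(molecule: str, query: str) -> bool:
    """Cheap necessary condition for a substructure match.

    True only means the molecule has at least as many atoms of every
    element as the query; it does NOT prove that the query is contained.
    """
    mol, qry = element_counts(molecule), element_counts(query)
    return all(mol[element] >= n for element, n in qry.items())


def screen(database: list[str], query: str) -> list[str]:
    """Return the rows that pass the screen and still need an exact check."""
    return [smiles for smiles in database if may_contain(smiles, query)]


if __name__ == "__main__":
    # Hypothetical four-row "database" of SMILES strings.
    database = [
        "c1ccccc1",
        "c1ccncc1",
        "COc1ccccc1",
        "COc1cccnc1C1=CCN(C(=O)OC(C)(C)C)CC1",
    ]
    for query in ("c1ccccc1", "COc1cccnc1C1=CCN(C(=O)OC(C)(C)C)CC1"):
        print(query, "->", screen(database, query))
```

In this toy data the complex query is narrowed down to a single candidate, while the benzene query keeps three of the four rows, including one that has enough carbon atoms but no benzene ring. So a screen like this only reduces the number of exact checks; it never replaces them.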

ivannnnnnnnnn, Aug 31 '22 16:08