django-rdkit
MolField hassubstruct performance for complex molecules
Hello! Sorry for the possibly off-topic question and for my English, but maybe the django-rdkit community can help me with my problem.
I am using django-rdkit to store mol objects. My database holds 10 million molecules. With this amount of data I have trouble selecting the molecules that contain a given target molecule as a substructure when the target molecule is complex.
For example, when I select molecules with hassubstruct=c1ccccc1, the query is fast.
But when I select molecules with hassubstruct=COc1cccnc1C1=CCN(C(=O)OC(C)(C)C)CC1, the query is very slow.
Maybe someone has had the same problem and can recommend how to improve performance.
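To make the two cases concrete, here is a minimal sketch of the SQL that a hassubstruct lookup corresponds to, using the cartridge's `@>` operator. The table name `compound` and the column name `molecule` are hypothetical placeholders, not my real schema, and the helper names are made up for illustration.

```python
# Sketch only: builds parameterized SQL for a substructure search.
# "compound" and "molecule" are hypothetical table/column names.

SIMPLE_QUERY = "c1ccccc1"
COMPLEX_QUERY = "COc1cccnc1C1=CCN(C(=O)OC(C)(C)C)CC1"


def substructure_sql(table, column):
    """Return SQL selecting rows whose mol column contains the query structure."""
    # %s is the placeholder style used by PostgreSQL drivers such as psycopg.
    return f"SELECT id FROM {table} WHERE {column} @> %s::mol"


def build_query(smiles, table="compound", column="molecule"):
    """Return (sql, params) ready to pass to cursor.execute(sql, params)."""
    return substructure_sql(table, column), [smiles]


fast_sql, fast_params = build_query(SIMPLE_QUERY)
slow_sql, slow_params = build_query(COMPLEX_QUERY)
print(fast_sql, fast_params)
print(slow_sql, slow_params)
```

Running either statement with `EXPLAIN ANALYZE` in front of it should show whether PostgreSQL uses an index on the mol column for that query, which may help to narrow down where the time goes.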
My next question is which algorithm the RDKit cartridge uses for the hassubstruct (@>) operation. Maybe someone knows of articles about this, or can explain it. I'm asking because I think there might be ways to optimize search speed with data mining. For example, I do not use the exact lookup to search for a specific molecule; instead I store SMILES in a separate field in the same model and search on that field. Perhaps the substructure search can be simplified in a similar way.
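As an illustration of the kind of simplification I have in mind, here is a rough sketch of a cheap prescreen based on element counts. It is not how the cartridge works; it only relies on the fact that a molecule can contain a query as a substructure only if it has at least as many atoms of each element as the query. The function names are hypothetical, and the parser is deliberately naive: it assumes plain SMILES without bracket atoms.

```python
import re
from collections import Counter

# Organic-subset atoms only; two-letter symbols must be tried first.
_ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def element_counts(smiles):
    """Count heavy atoms per element in a bracket-free SMILES string."""
    if "[" in smiles:
        raise ValueError("bracket atoms are not handled in this sketch")
    # Aromatic atoms are lowercase in SMILES; capitalize() maps c -> C.
    return Counter(token.capitalize() for token in _ATOM.findall(smiles))


def may_contain(candidate_smiles, query_smiles):
    """Necessary (not sufficient) condition for a substructure match."""
    candidate = element_counts(candidate_smiles)
    query = element_counts(query_smiles)
    return all(candidate[element] >= n for element, n in query.items())


complex_mol = "COc1cccnc1C1=CCN(C(=O)OC(C)(C)C)CC1"
print(element_counts(complex_mol))
print(may_contain(complex_mol, "c1ccccc1"))  # benzene passes the count test
print(may_contain("c1ccccc1", complex_mol))  # benzene is too small to contain it
```

In the database this would mean storing the counts in ordinary integer columns and filtering on them before applying hassubstruct. Candidates that pass still need the real substructure check: in the example, benzene passes the count test against the complex molecule even though that molecule contains a pyridine ring and no benzene ring. I do not know whether this helps in practice, since the cartridge's index may already do similar screening.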