rd_filters
Mismatching pattern due to RDKit aromaticity model
I started experimenting with the different filter sets and found that some of them reject many compounds, so I began investigating individual cases. One example is the Filter82_pyridinium rule ([c,n]1[c,n][c,n][c,n][c,n]n(C)1) from the Inpharmatica set.
RDKit aromatizes some compounds, such as the one in the example below, even with the AROMATICITY_SIMPLE model. The compound then matches the SMARTS pattern, which I consider a false positive.
The question is whether this pattern is expected to remove all such compounds, or whether it should apply only to compounds with a charged nitrogen ([c,n]1[c,n][c,n][c,n][c,n][n+](C)1)?
Is there another workaround? Or is this more of an RDKit aromaticity model issue?
from rdkit import Chem
smi = 'COC1=C2N(C)C(=O)C3=C(OC(C)(C)C=C3)C2=CC=C1'
m = Chem.MolFromSmiles(smi, sanitize=False)
Chem.SanitizeMol(m, Chem.SANITIZE_ALL ^ Chem.SANITIZE_SETAROMATICITY)
Chem.SetAromaticity(m, Chem.AROMATICITY_SIMPLE)
print(Chem.MolToSmiles(m))
sma = '[c,n]1[c,n][c,n][c,n][c,n][n](C)1'
pat = Chem.MolFromSmarts(sma)
print(m.GetSubstructMatch(pat))
Output:
COc1cccc2c3c(c(=O)n(C)c12)C=CC(C)(C)O3
(3, 16, 9, 8, 6, 4, 5)
The patterns were taken directly from ChEMBL with a few tweaks to make them work with the RDKit. One day, when I get some time, I'll do some curation. I'd be happy to accept PRs from others who can improve the pattern.
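As a starting point for such a curation, here is a minimal sketch comparing the rule as quoted in the question with the charged-nitrogen variant the question proposes. It is not the curated rd_filters rule. The helper `matches`, the constant names, and the N-methylpyridinium test SMILES are assumptions added for illustration, and the sketch uses RDKit's default sanitization rather than the AROMATICITY_SIMPLE setup from the question.

```python
from rdkit import Chem

# Rule as quoted in the question (matches any aromatic N-alkyl nitrogen in a six-membered ring)
ORIGINAL_RULE = "[c,n]1[c,n][c,n][c,n][c,n]n(C)1"
# Variant proposed in the question: require a formal positive charge on the nitrogen
CHARGED_RULE = "[c,n]1[c,n][c,n][c,n][c,n][n+](C)1"


def matches(smiles, smarts):
    """Hypothetical helper: True if the SMARTS matches the molecule built from SMILES."""
    mol = Chem.MolFromSmiles(smiles)  # default sanitization and aromaticity model
    pat = Chem.MolFromSmarts(smarts)
    return mol.HasSubstructMatch(pat)


examples = {
    "compound from the question": "COC1=C2N(C)C(=O)C3=C(OC(C)(C)C=C3)C2=CC=C1",
    "N-methylpyridinium (illustrative)": "C[n+]1ccccc1",
}

for name, smi in examples.items():
    print(name, matches(smi, ORIGINAL_RULE), matches(smi, CHARGED_RULE))
```

The compound from the question carries no formal charge, so the `[n+]` variant cannot match it whichever aromaticity model is used, while a genuine pyridinium cation still matches. Whether restricting the rule to charged nitrogens reflects the intent of the original Inpharmatica filter remains an open question.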