Postgres cartridge - confusing similarity search results
foo=# SELECT bingo.getversion() ;
getversion
-----------------
1.7.9.0 linux64
(1 row)
foo=# SELECT version();
version
----------------------------------------------------------------------------------------------
PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.8.1-6) 4.8.1, 64-bit
(1 row)
Input data: two SMILES strings for caffeine from different sources, one in Kekulé form and one in aromatic form.
Both give the same InChI string:
foo=# SELECT bingo.inchi('CN1C=NC2=C1C(=O)N(C)C(=O)N2C', '') = bingo.inchi('Cn1cnc2c1c(=O)n(C)c(=O)n2C', '');
?column?
----------
t
(1 row)
Exact search with the 'MAS' option treats both representations as identical:
foo=# SELECT 'Cn1cnc2c1c(=O)n(C)c(=O)n2C' @ ('CN1C=NC2=C1C(=O)N(C)C(=O)N2C', 'MAS') :: bingo.exact;
?column?
----------
t
(1 row)
But a similarity search on the same pair returns an extremely low Tanimoto coefficient:
foo=# SELECT bingo.getsimilarity('Cn1cnc2c1c(=O)n(C)c(=O)n2C', 'CN1C=NC2=C1C(=O)N(C)C(=O)N2C', 'tanimoto');
getsimilarity
---------------
0.21875
(1 row)
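For reference, the Tanimoto coefficient is the number of fingerprint bits set in both molecules divided by the number of bits set in either one. Below is a minimal sketch of that arithmetic in plain Python. The bit indices are hypothetical and chosen only so that 7 shared bits out of 32 distinct bits reproduces the value above; I am not claiming these are the fingerprints Bingo actually computes, and other bit counts with the same ratio would give the same score.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = bits_a | bits_b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(bits_a & bits_b) / len(union)


# Hypothetical fingerprints (NOT real Bingo output), picked to show the ratio.
fp_kekule = set(range(0, 20))     # bits 0..19
fp_aromatic = set(range(13, 32))  # bits 13..31, so bits 13..19 are shared

score = tanimoto(fp_kekule, fp_aromatic)
print(score)
```

A score this low means most of the set bits differ between the two fingerprints, which is what you would expect if the two inputs are being fingerprinted as different bond patterns rather than as the same structure.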
I assume this is due to the way aromaticity is handled. Rendering both inputs shows how each one is interpreted:
from indigo import Indigo
from indigo_renderer import IndigoRenderer

# Set up Indigo and a renderer that writes 200x250 PNGs on a white background.
indigo = Indigo()
renderer = IndigoRenderer(indigo)
indigo.setOption("render-output-format", "png")
indigo.setOption("render-image-size", 200, 250)
indigo.setOption("render-background-color", 1.0, 1.0, 1.0)

# Kekule form: explicit alternating single and double bonds.
m1 = indigo.loadMolecule('CN1C=NC2=C1C(=O)N(C)C(=O)N2C')
renderer.renderToFile(m1, "caffeine_m1.png")

# Aromatic form: lowercase atom symbols.
m2 = indigo.loadMolecule('Cn1cnc2c1c(=O)n(C)c(=O)n2C')
renderer.renderToFile(m2, "caffeine_m2.png")
