knime-rdkit icon indicating copy to clipboard operation
knime-rdkit copied to clipboard

RDKit Rooted Fingerprints Have Wrong Results in KNIME

Open manuelschwarze opened this issue 1 year ago • 0 comments

We have discovered that rooted fingerprints in KNIME have are different from Python and pure Java. The reason we found is that there is a bug how the atom list for the rooting feature is created before it passed to the RDKit native code.

Paolo Tosco's Java code created it like this, as example with atom index 17:

UInt_Vect atomList = new UInt_Vect(1);
atomList.set(0, atom);

This is correct. It results in [ 17 ]

I had created it many years ago like this:

atomList = new UInt_Vect(1);
atomList.add(atom);

This is incorrect. It results in [ 0, 17 ], where "0" could also be something that is not well-defined.

In addition to this code, other buggy code also related to the same feature was discovered in InputDataObject class, which processes again such UInt_Vect objects and has the same bug.

Consequences of this bug:

InputDataInfo.getRDKitIntegerVector(DataRow row) – Always was double the size that it should be, and the first half was filled with 0 (or undefined).

RDKit Highlighting Node – Should have been leading to always coloring of atom at index 0, but for some reason this did not happen, hence no problem that a user would actually see RDKit Highlighting Atoms Node (deprecated) – Influence on SVG creation, but no problem that a user would actually see InputDataInfo.getRDKitUIntegerVector (DataRow row) – Always was double the size that it should be, and the first half was filled with 0 (or undefined).

RDKit Fingerprint Nodes – Rooted fingerprints (Morgan, FeatMorgan, AtomPair, Torsion, RDKit, Layered, always included the atom indexes 0 (or undefined) AbstractFingerprintNodeModel.$AbstractRDKitCellFactory.process(…) – When a single atom was selected for rooting, it was always using atom at index 0 and the selected one, so it used 2 atoms instead of a single one, every fingerprint calculated with rooted atoms has in the moment wrong results: There are more bits set in the fingerprint than it would have without the additional badly added atom index.

Tracked at Novartis as KNIME-1794

manuelschwarze avatar Dec 01 '23 22:12 manuelschwarze