
Open sources Schrödinger's implementation of "molhash"

Open d-b-w opened this issue 2 years ago • 0 comments

Generate a unique hash code for a molecule based on chemistry. If two molecules are chemically "the same", they should have the same hash.

Used by Schrödinger's LiveDesign to determine whether two molecules are the same. LiveDesign modifies the molecule before computing the molhash, in steps roughly equivalent to those available in rdkit.Chem.MolStandardize.

Using molhash adds value beyond using SMILES because it:

  • Ignores CXSMILES features that are not chemically meaningful (e.g. atom map numbers and coordinates)
  • Canonicalizes enhanced stereochemistry groups. For example C[C@H](O)CC |&1:1| and C[C@@H](O)CC |&1:1| have the same molhash
  • Canonicalizes S group data (for example, polymer data)
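The enhanced stereochemistry bullet can be illustrated with a toy, string-level sketch. This is not the molhash algorithm. It assumes a single `&` (racemic) group that covers every stereocenter in the SMILES, so that swapping `@` and `@@` gives the other, equivalent spelling of the same mixture. The helper names `invert_all_centers` and `toy_racemic_key` are hypothetical and exist only for this illustration.

```python
import hashlib


def invert_all_centers(smiles: str) -> str:
    """Swap every '@' and '@@' chirality tag, giving the mirror-image SMILES."""
    placeholder = "\0"  # cannot occur in a SMILES string
    return (
        smiles.replace("@@", placeholder)
        .replace("@", "@@")
        .replace(placeholder, "@")
    )


def toy_racemic_key(cxsmiles: str) -> str:
    """Hash a CXSMILES whose single '&' group covers all stereocenters.

    ASSUMPTION: both enantiomer spellings describe the same racemic
    mixture, so we pick the lexicographically smaller one as the
    representative before hashing.
    """
    smiles, _, extension = cxsmiles.partition(" ")
    representative = min(smiles, invert_all_centers(smiles))
    return hashlib.sha1(f"{representative} {extension}".encode()).hexdigest()


a = toy_racemic_key("C[C@H](O)CC |&1:1|")
b = toy_racemic_key("C[C@@H](O)CC |&1:1|")
print(a == b)
```

A real implementation has to work on the molecular graph rather than on the string, and has to handle several groups, `or` groups, and stereocenters outside any group.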

There are two hash schemes: the default, and one in which tautomers are considered equivalent.
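One way to picture the two schemes is as two selections over the same set of per-molecule "layers", hashed together. The sketch below uses only the standard library; the layer names, the scheme tuples, and the hand-written layer values are all hypothetical placeholders, not output from the code in this pull request.

```python
import hashlib
import json

# ASSUMPTION: hypothetical layer names chosen for illustration only.
DEFAULT_SCHEME = ("formula", "smiles", "sgroup_data")
TAUTOMER_INSENSITIVE_SCHEME = ("formula", "tautomer_key", "sgroup_data")


def hash_layers(layers: dict, scheme: tuple) -> str:
    """Hash only the layers named by the scheme, in a fixed order."""
    selected = [[name, layers[name]] for name in scheme]
    return hashlib.sha1(json.dumps(selected).encode()).hexdigest()


# Hand-written placeholder layers for two tautomers of the same compound.
hydroxypyridine = {
    "formula": "C5H5NO",
    "smiles": "Oc1ccccn1",
    "tautomer_key": "<shared tautomer key>",  # placeholder value
    "sgroup_data": "",
}
pyridone = {
    "formula": "C5H5NO",
    "smiles": "O=c1cccc[nH]1",
    "tautomer_key": "<shared tautomer key>",  # placeholder value
    "sgroup_data": "",
}

same_by_default = (
    hash_layers(hydroxypyridine, DEFAULT_SCHEME)
    == hash_layers(pyridone, DEFAULT_SCHEME)
)
same_ignoring_tautomers = (
    hash_layers(hydroxypyridine, TAUTOMER_INSENSITIVE_SCHEME)
    == hash_layers(pyridone, TAUTOMER_INSENSITIVE_SCHEME)
)
print(same_by_default, same_ignoring_tautomers)
```

Under these assumptions the two tautomers differ in the default scheme and collide in the tautomer-insensitive one, which is the behavior the description asks for.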

Schrödinger has been using this for a year or two, so this is really the work of many people:

Co-authored-by: Chris Von Bargen [email protected]
Co-authored-by: Greg Landrum [email protected]
Co-authored-by: Hussein Faara [email protected]
Co-authored-by: Rachel Walker [email protected]
Co-authored-by: Ric [email protected]


d-b-w, Jun 08 '22 23:06