rdkit
Open-sources Schrödinger's implementation of "molhash"
Generate a unique hash code for a molecule based on chemistry. If two molecules are chemically "the same", they should have the same hash.
Used by Schrödinger's LiveDesign to determine if two molecules are the same. LiveDesign makes changes to the molecule before computing the molhash, somewhat equivalent to the steps available in rdkit.Chem.MolStandardize.
Using molhash adds value beyond using SMILES because it:
- Ignores CXSMILES features that are not chemically meaningful (e.g. atom map numbers and coordinates)
- Canonicalizes enhanced stereochemistry groups. For example, C[C@H](O)CC |&1:1| and C[C@@H](O)CC |&1:1| have the same molhash
- Canonicalizes S group data (for example, polymer data)
There are two hash schemes: the default, and one in which tautomers are considered equivalent.
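To make the difference between the two schemes concrete, here is a minimal stdlib-only sketch. It is not the code in this PR: the names `HashScheme`, `mol_hash`, and the layer names (`formula`, `canonical_smiles`, `tautomer_key`, `sgroup_data`) are hypothetical, and the layer values are hand-written placeholders rather than anything computed from a molecule. The assumption is only that each scheme hashes a different selection of per-molecule strings, with the tautomer-insensitive scheme swapping the SMILES for a key shared by all tautomers.

```python
import hashlib
import json
from enum import Enum


class HashScheme(Enum):
    # Hypothetical scheme names; each value lists the layers fed into the hash.
    DEFAULT = ("formula", "canonical_smiles", "sgroup_data")
    TAUTOMER_INSENSITIVE = ("formula", "tautomer_key", "sgroup_data")


def mol_hash(layers, scheme=HashScheme.DEFAULT):
    """Hash only the layers selected by the given scheme."""
    selected = {name: layers[name] for name in scheme.value}
    # sort_keys makes the serialized payload independent of dict ordering
    payload = json.dumps(selected, sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


# Placeholder layers for two tautomers (2-hydroxypyridine and 2-pyridone).
# The SMILES are illustrative and the tautomer key is a made-up stand-in
# for a value that would be identical across tautomers.
hydroxypyridine = {
    "formula": "C5H5NO",
    "canonical_smiles": "Oc1ccccn1",
    "tautomer_key": "<shared-tautomer-key>",
    "sgroup_data": "[]",
}
pyridone = {
    "formula": "C5H5NO",
    "canonical_smiles": "O=c1cccc[nH]1",
    "tautomer_key": "<shared-tautomer-key>",
    "sgroup_data": "[]",
}

# Default scheme: the SMILES layer differs, so the hashes differ.
print(mol_hash(hydroxypyridine) == mol_hash(pyridone))
# Tautomer-insensitive scheme: the SMILES layer is not used, so they match.
print(
    mol_hash(hydroxypyridine, HashScheme.TAUTOMER_INSENSITIVE)
    == mol_hash(pyridone, HashScheme.TAUTOMER_INSENSITIVE)
)
```

In this sketch the scheme is just a choice of which strings go into the digest, so adding another notion of "the same" would mean adding another selection rather than a new hashing routine.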
Schrödinger has been using this for a year or two, so this is really the work of many people:
Co-authored-by: Chris Von Bargen [email protected]
Co-authored-by: Greg Landrum [email protected]
Co-authored-by: Hussein Faara [email protected]
Co-authored-by: Rachel Walker [email protected]
Co-authored-by: Ric [email protected]