
Add protonation functions

Open jonwzheng opened this issue 8 months ago • 4 comments

Motivation or Problem

This PR adds several helper functions related to protonation & ionization.

Description of Changes

New "big" functions:

  • uncharge_mol(mol, method): Takes a charged molecule (ion or zwitterion) and returns its uncharged form. Two uncharging algorithms are available; by default both are tried, starting with the rdkit algorithm and falling back to the other one if it fails.
  • is_symmetric_to_substructure(mol, substructure): Checks whether a mol is symmetric with respect to a provided substructure, e.g. returns True when comparing ethylene glycol to an "OH" substructure.
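
For readers who want to see the "try rdkit first, then fall back" idea from uncharge_mol in isolation, here is a minimal sketch. It is not the code in this PR. The names uncharge_sketch, _uncharge_rdkit and _uncharge_nocharge, the method="all" default and the ValueError on failure are all assumptions made for illustration. The first method uses RDKit's rdMolStandardize.Uncharger; the second follows the RDKit Cookbook neutralization recipe (after Noel O'Boyle's "nocharge") and assumes hydrogens are implicit, so it will not reproduce the explicit-hydrogen behavior described under "Other notes".

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# SMARTS taken from the RDKit Cookbook neutralization recipe.
_CHARGED = Chem.MolFromSmarts(
    "[+1!h0!$([*]~[-1,-2,-3,-4]),-1!$([*]~[+1,+2,+3,+4])]"
)


def _uncharge_rdkit(mol):
    # RDKit's built-in uncharger; returns a new molecule.
    return rdMolStandardize.Uncharger().uncharge(mol)


def _uncharge_nocharge(mol):
    # Cookbook-style neutralization; assumes implicit hydrogens.
    mol = Chem.Mol(mol)  # work on a copy so the input is left untouched
    for (idx,) in mol.GetSubstructMatches(_CHARGED):
        atom = mol.GetAtomWithIdx(idx)
        charge = atom.GetFormalCharge()
        n_h = atom.GetTotalNumHs()
        atom.SetFormalCharge(0)
        atom.SetNumExplicitHs(n_h - charge)
        atom.UpdatePropertyCache()
    return mol


def _has_charged_atoms(mol):
    return any(a.GetFormalCharge() != 0 for a in mol.GetAtoms())


def uncharge_sketch(mol, method="all"):
    """Hypothetical stand-in for uncharge_mol: try each method in order."""
    methods = {"rdkit": _uncharge_rdkit, "nocharge": _uncharge_nocharge}
    order = ["rdkit", "nocharge"] if method == "all" else [method]
    for name in order:
        try:
            candidate = methods[name](mol)
        except Exception:
            continue  # this method failed, move on to the next one
        if not _has_charged_atoms(candidate):
            return candidate
    # Assumption: signal failure loudly instead of returning a charged mol.
    raise ValueError("could not neutralize molecule")
```

Usage would look like uncharge_sketch(Chem.MolFromSmiles("[NH3+]CC(=O)[O-]")) for a zwitterion. Checking that no atom is left charged after each attempt is one possible definition of "the method failed"; the PR may use a different criterion.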

Helper functions:

  • protonate_at_site(mol, site): Adds a proton to a mol at a given atom idx and adjusts formal charges
  • deprotonate_at_site(mol, site): Removes a proton from a mol at a given atom idx and adjusts formal charges
  • is_implicit(mol): Infers whether a molecule is an implicit or explicit mol object
  • find_symmetry_classes(mol): Provides a set of symmetry classes for the atoms in a mol object, based on code by Greg Landrum.
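
To make the symmetry helpers more concrete, here is a rough sketch of how they could work. It is not the implementation in this PR. It assumes that symmetry classes come from RDKit's Chem.CanonicalRankAtoms with breakTies=False (the approach commonly shown by Greg Landrum), and it assumes that "symmetric to a substructure" means every match of the substructure lands on symmetry-equivalent atoms. The function names with the _sketch suffix are hypothetical.

```python
from rdkit import Chem


def find_symmetry_classes_sketch(mol):
    """Group atom indices that share a canonical rank (ties left unbroken)."""
    ranks = list(Chem.CanonicalRankAtoms(mol, breakTies=False))
    classes = {}
    for idx, rank in enumerate(ranks):
        classes.setdefault(rank, set()).add(idx)
    return {frozenset(members) for members in classes.values()}


def is_symmetric_to_substructure_sketch(mol, substructure):
    """True if all matches of the substructure are symmetry-equivalent."""
    matches = mol.GetSubstructMatches(substructure)
    if not matches:
        return False  # assumption: no match counts as "not symmetric"
    ranks = list(Chem.CanonicalRankAtoms(mol, breakTies=False))
    # The i-th pattern atom must fall in the same class in every match.
    signatures = {tuple(ranks[i] for i in match) for match in matches}
    return len(signatures) == 1


ethylene_glycol = Chem.MolFromSmiles("OCCO")
hydroxyl = Chem.MolFromSmarts("[OX2H]")
glycol_is_symmetric = is_symmetric_to_substructure_sketch(ethylene_glycol, hydroxyl)
```

With this definition the two hydroxyl oxygens of ethylene glycol share a class, while the primary and secondary hydroxyls of 1,2-propanediol ("CC(O)CO") do not.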

Testing

I included pytest modules for uncharge_mol and is_symmetric_to_substructure.

Other notes

The two uncharging methods have different behaviors regarding explicit hydrogens.

Chem.MolToSmiles(uncharge_mol(mol_from_smiles("[C:1]([C:2]([C:3]([C:4](=[O:5])[O-:6])([H:12])[H:13])([H:10])[H:11])([H:7])([H:8])[H:9]"), method="rdkit"))
>> '[CH3:1][CH2:2][CH2:3][C:4](=[O:5])[OH:6]'

vs.

Chem.MolToSmiles(uncharge_mol(mol_from_smiles("[C:1]([C:2]([C:3]([C:4](=[O:5])[O-:6])([H:12])[H:13])([H:10])[H:11])([H:7])([H:8])[H:9]"), method="nocharge"))
>> '[H][O:6][C:4]([C:3]([C:2]([C:1]([H:7])([H:8])[H:9])([H:10])[H:11])([H:12])[H:13])=[O:5]'

Is the desired behavior to re-number?

Both methods still return the same SMILES if you pass the result through mol_to_smiles, though.
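
As a quick way to check that the two outputs above describe the same molecule, the sketch below normalizes both strings with plain RDKit. It is only a stand-in for mol_to_smiles, whose options are not shown here; normalized_smiles is a hypothetical helper that clears atom map numbers and removes explicit hydrogens before writing canonical SMILES.

```python
from rdkit import Chem


def normalized_smiles(smiles):
    """Canonical SMILES with atom maps cleared and explicit Hs removed."""
    mol = Chem.MolFromSmiles(smiles)
    for atom in mol.GetAtoms():
        atom.SetAtomMapNum(0)  # drop atom map numbers
    return Chem.MolToSmiles(Chem.RemoveHs(mol))


# The two outputs quoted above, for method="rdkit" and method="nocharge".
rdkit_out = "[CH3:1][CH2:2][CH2:3][C:4](=[O:5])[OH:6]"
nocharge_out = (
    "[H][O:6][C:4]([C:3]([C:2]([C:1]([H:7])([H:8])[H:9])"
    "([H:10])[H:11])([H:12])[H:13])=[O:5]"
)

same_molecule = normalized_smiles(rdkit_out) == normalized_smiles(nocharge_out)
```

If same_molecule is True, the difference between the two methods is limited to hydrogen representation and atom ordering rather than chemistry.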

jonwzheng, Jun 04 '24 17:06