Isoelectronic model breaks uncharging `[N+2]`
A small example:
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize
mb = """
     RDKit          2D

  0  0  0  0  0  0  0  0  0  0999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 7 7 0 0 0
M  V30 BEGIN ATOM
M  V30 1 O -0.000000 5.505314 0.000000 0
M  V30 2 N -0.000000 4.008578 0.000000 0 CHG=2
M  V30 3 C 0.000000 0.999538 0.000000 0
M  V30 4 C -1.298419 3.248684 0.000000 0
M  V30 5 C 1.298419 3.248684 0.000000 0
M  V30 6 C -1.298419 1.744164 0.000000 0
M  V30 7 C 1.298419 1.744164 0.000000 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 1 1 2
M  V30 2 1 2 4
M  V30 3 1 2 5
M  V30 4 1 3 6
M  V30 5 1 3 7
M  V30 6 1 4 6
M  V30 7 1 5 7
M  V30 END BOND
M  V30 END CTAB
M  END
$$$$
"""
m = Chem.MolFromMolBlock(mb)
m.Debug()
# The +2 charge on the N atom is completely bogus:
# Atoms:
# 0 8 O chg: 0 deg: 1 exp: 1 imp: 1 hyb: SP3
# 1 7 N chg: 2 deg: 3 exp: 3 imp: 0 hyb: SP2
# 2 6 C chg: 0 deg: 2 exp: 2 imp: 2 hyb: SP3
# 3 6 C chg: 0 deg: 2 exp: 2 imp: 2 hyb: SP3
# 4 6 C chg: 0 deg: 2 exp: 2 imp: 2 hyb: SP3
# 5 6 C chg: 0 deg: 2 exp: 2 imp: 2 hyb: SP3
# 6 6 C chg: 0 deg: 2 exp: 2 imp: 2 hyb: SP3
# Bonds:
# 0 0->1 order: 1
# 1 1->3 order: 1
# 2 1->4 order: 1
# 3 2->5 order: 1
# 4 2->6 order: 1
# 5 3->5 order: 1
# 6 4->6 order: 1
uncharger = rdMolStandardize.Uncharger(canonicalOrder=True)
uncharger.uncharge(m).Debug()
# The uncharger does not remove the charge:
# Atoms:
# 0 8 O chg: 0 deg: 1 exp: 1 imp: 1 hyb: SP3
# 1 7 N chg: 2 deg: 3 exp: 3 imp: 0 hyb: SP2
# 2 6 C chg: 0 deg: 2 exp: 2 imp: 2 hyb: SP3
# 3 6 C chg: 0 deg: 2 exp: 2 imp: 2 hyb: SP3
# 4 6 C chg: 0 deg: 2 exp: 2 imp: 2 hyb: SP3
# 5 6 C chg: 0 deg: 2 exp: 2 imp: 2 hyb: SP3
# 6 6 C chg: 0 deg: 2 exp: 2 imp: 2 hyb: SP3
# Bonds:
# 0 0->1 order: 1
# 1 1->3 order: 1
# 2 1->4 order: 1
# 3 2->5 order: 1
# 4 2->6 order: 1
# 5 3->5 order: 1
# 6 4->6 order: 1
If I revert the isoelectronic model commit, the first Debug() shows the N atom as:
1 7 N chg: 2 deg: 3 exp: 3 imp: 2 hyb: SP3
Note the 2 implicit Hs, which can be removed by the uncharger, neutralizing the N, as shown by the second call to Debug():
Atoms:
0 8 O chg: 0 deg: 1 exp: 1 imp: 1 hyb: SP3
1 7 N chg: 0 deg: 3 exp: 3 imp: 0 hyb: SP3
2 6 C chg: 0 deg: 2 exp: 2 imp: 2 hyb: SP3
3 6 C chg: 0 deg: 2 exp: 2 imp: 2 hyb: SP3
4 6 C chg: 0 deg: 2 exp: 2 imp: 2 hyb: SP3
5 6 C chg: 0 deg: 2 exp: 2 imp: 2 hyb: SP3
6 6 C chg: 0 deg: 2 exp: 2 imp: 2 hyb: SP3
Bonds:
0 0->1 order: 1
1 1->3 order: 1
2 1->4 order: 1
3 2->5 order: 1
4 2->6 order: 1
5 3->5 order: 1
6 4->6 order: 1
As you've pointed out, the difference here is that the old valence calculation rules assigned 2 Hs to the N, and those were then removed by the uncharger. The new valence calculation assigns no Hs to the N, so the molecule cannot be neutralized (this is the same as what the current release does if you give it the SMILES `O[N+2]1CCCCC1`).
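To make the difference concrete, here is a toy calculation of the implicit H count on the N under the two rules. It is only a sketch, not RDKit's implementation: the `DEFAULT_VALENCE` table and both function names are made up for illustration, the "old" rule is assumed to be "neutral valence plus formal charge" (which reproduces the 2 Hs shown above), and the "new" rule is assumed to use the valence of the neutral element with the same electron count. It only covers non-negative charges.

```python
# Toy model only (hypothetical names, not RDKit code).
# Default valences of a few neutral elements, keyed by atomic number.
DEFAULT_VALENCE = {5: 3, 6: 4, 7: 3, 8: 2}  # B, C, N, O


def implicit_hs_charge_shift(atomic_num, charge, explicit_valence):
    """Assumed 'old' rule: allowed valence = neutral valence + formal charge."""
    allowed = DEFAULT_VALENCE[atomic_num] + charge
    return max(allowed - explicit_valence, 0)


def implicit_hs_isoelectronic(atomic_num, charge, explicit_valence):
    """Assumed 'new' rule: borrow the valence of the neutral element
    that has the same number of electrons (atomic_num - charge)."""
    allowed = DEFAULT_VALENCE[atomic_num - charge]
    return max(allowed - explicit_valence, 0)


# Ring nitrogen from the example: atomic number 7, three explicit single bonds.
for charge in (0, 1, 2):
    old_hs = implicit_hs_charge_shift(7, charge, 3)
    new_hs = implicit_hs_isoelectronic(7, charge, 3)
    print(f"N charge +{charge}: old rule {old_hs} H, isoelectronic rule {new_hs} H")
```

Under these assumptions the two rules agree for neutral N and for `[N+]`, and only diverge at `[N+2]`, where the isoelectronic rule treats the atom like boron (valence 3) and leaves no room for implicit Hs.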
I believe that the behavior of master, where no implicit Hs are assigned to the N, is correct. For what it's worth, MarvinSketch, Indigo (as tested with Ketcher), and the InChI code all agree with master that the chemical formula of your molecule is C5H11NO.
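For anyone who wants to check the hydrogen count on their own build, a minimal sketch using the SMILES form of the molecule (assuming RDKit is installed; the variable names are just for illustration):

```python
from rdkit import Chem

# Same molecule as the mol block above, written as SMILES.
mol = Chem.MolFromSmiles("O[N+2]1CCCCC1")

# Hydrogens attached to each heavy atom (implicit plus explicit counts).
h_per_atom = [(atom.GetSymbol(), atom.GetTotalNumHs()) for atom in mol.GetAtoms()]
total_h = sum(count for _, count in h_per_atom)

# Atom index 1 is the charged nitrogen.
n_atom = mol.GetAtomWithIdx(1)

print(h_per_atom)
print("total H:", total_h)
print("N charge:", n_atom.GetFormalCharge(), "N Hs:", n_atom.GetTotalNumHs())
```

The five CH2 groups contribute 10 Hs and the hydroxyl contributes 1, so a total of 11 corresponds to C5H11NO with no H on the nitrogen.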
I agree that the current master behavior is correct.
The molecule
RDKit 2D
0 0 0 0 0 0 0 0 0 0999 V3000
M V30 BEGIN CTAB
M V30 COUNTS 7 7 0 0 0
M V30 BEGIN ATOM
M V30 1 O -0.000000 5.505314 0.000000 0
M V30 2 N -0.000000 4.008578 0.000000 0 CHG=2
M V30 3 C 0.000000 0.999538 0.000000 0
M V30 4 C -1.298419 3.248684 0.000000 0
M V30 5 C 1.298419 3.248684 0.000000 0
M V30 6 C -1.298419 1.744164 0.000000 0
M V30 7 C 1.298419 1.744164 0.000000 0
M V30 END ATOM
M V30 BEGIN BOND
M V30 1 1 1 2
M V30 2 1 2 4
M V30 3 1 2 5
M V30 4 1 3 6
M V30 5 1 3 7
M V30 6 1 4 6
M V30 7 1 5 7
M V30 END BOND
M V30 END CTAB
M END
should not be assigned two implicit Hs, since the valence of N supports at most one.
Therefore, the molecule above, `O[N+2]1CCCCC1`, is effectively a di-ionized N-hydroxypiperidine, which should be left untouched by the Uncharger, just as would happen for the radical, mono-ionized species `O[N+1]1CCCCC1` with no implicit H on the nitrogen. The Uncharger is only supposed to deal with charges arising from protonation/deprotonation.
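A small sketch of that distinction, assuming RDKit is installed. The protonated SMILES `O[NH+]1CCCCC1` and the helper `net_charge_after_uncharge` are introduced here only for illustration and do not appear earlier in this thread:

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

uncharger = rdMolStandardize.Uncharger()


def net_charge_after_uncharge(smiles):
    """Parse a SMILES, run the Uncharger, and return the net formal charge."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.GetFormalCharge(uncharger.uncharge(mol))


# Charge that comes from protonation: the N carries an H that can be removed.
protonated = net_charge_after_uncharge("O[NH+]1CCCCC1")

# Charge that does not come from protonation: no H on the N to remove.
dication = net_charge_after_uncharge("O[N+2]1CCCCC1")

print("protonated N-hydroxypiperidine:", protonated)
print("[N+2] species:", dication)
```

The first species can be neutralized by dropping a proton, while the second has nothing for the Uncharger to remove, so its charge is expected to stay as it is.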