timemachine
timemachine copied to clipboard
Make charge symmetrization stereo-aware
- Disables passing
symmetrize=True
to the AM1 routine (since this appears to average over stereo-oblivious atom classes, and occasionally introduces very small asymmetries in cases like benzene) - Symmetrizes by averaging over RDKit stereo-sensitive atom classes
- Tests on benzene (where 2 distinct charges are expected, but 3 were produced)
- Tests on a case where symmetry is broken by stereo configuration (where
n_atoms
distinct charges are expected, but ~n_atoms/2
were produced)
Initial PR description:
symmetrize=True
flag was already being passed to toolkit functions but there were still small asymmetries in some cases (see e.g. https://github.com/proteneer/timemachine/pull/737#issuecomment-1122880117 )
Issue magnitude: very very small. (In 82 of 642 FreeSolv molecules, the partial charges within a symmetry class varied up to ~0.0002 e. However, it's still nice for len(set(assigned_charges))
for benzene to be 2 rather than 3.)
To clarify on symmetry perception under stereochemistry, consider this molecule:
F\C=C\C=C\C=C/F
Sorry, not sure what the correct behavior is for that molecule. Should the symmetry classes (or charges) be asymmetric within this molecule?
Numbered by symmetry class
Enumerating all configurations around the double bonds
F/C=C/C=C/C=C/F
F/C=C/C=C/C=C\F
F/C=C/C=C\C=C/F
F/C=C/C=C\C=C\F
F/C=C\C=C/C=C/F
F/C=C\C=C/C=C\F
F/C=C\C=C\C=C/F
F/C=C\C=C\C=C\F
F\C=C/C=C/C=C/F
F\C=C/C=C/C=C\F
F\C=C/C=C\C=C/F
F\C=C/C=C\C=C\F
F\C=C\C=C/C=C/F
F\C=C\C=C/C=C\F
F\C=C\C=C\C=C/F
F\C=C\C=C\C=C\F
The symmetry classes assigned by this method always appear the same
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
The charges currently assigned (without applying this extra symmetrize step) vary by stereoisomer, but are already exactly or nearly constant within each symmetry class.
Charges assigned to each stereoisomer
[-2.081, 0.824, -2.241, -1.359, -1.359, -2.241, 0.824, -2.081, 1.696, 1.673, 1.487, 1.487, 1.673, 1.696]
[-2.082, 0.794, -2.22, -1.398, -1.398, -2.22, 0.794, -2.081, 1.686, 1.69, 1.53, 1.53, 1.69, 1.686]
[-2.093, 0.804, -2.245, -1.346, -1.346, -2.245, 0.804, -2.092, 1.697, 1.649, 1.533, 1.533, 1.649, 1.697]
[-2.083, 0.818, -2.25, -1.333, -1.333, -2.25, 0.818, -2.082, 1.696, 1.689, 1.462, 1.462, 1.689, 1.696]
[-2.087, 0.801, -2.228, -1.355, -1.355, -2.228, 0.801, -2.086, 1.694, 1.654, 1.521, 1.521, 1.654, 1.694]
[-2.088, 0.788, -2.229, -1.376, -1.376, -2.229, 0.788, -2.087, 1.681, 1.623, 1.601, 1.601, 1.623, 1.681]
[-2.104, 0.757, -2.194, -1.419, -1.419, -2.194, 0.757, -2.103, 1.694, 1.693, 1.572, 1.572, 1.693, 1.694]
[-2.091, 0.789, -2.221, -1.389, -1.389, -2.221, 0.789, -2.09, 1.691, 1.687, 1.533, 1.533, 1.687, 1.691]
[-2.091, 0.789, -2.221, -1.389, -1.389, -2.221, 0.789, -2.09, 1.691, 1.687, 1.533, 1.533, 1.687, 1.691]
[-2.104, 0.757, -2.194, -1.419, -1.419, -2.194, 0.757, -2.103, 1.694, 1.693, 1.572, 1.572, 1.693, 1.694]
[-2.088, 0.788, -2.229, -1.376, -1.376, -2.229, 0.788, -2.087, 1.681, 1.623, 1.601, 1.601, 1.623, 1.681]
[-2.087, 0.801, -2.228, -1.355, -1.355, -2.228, 0.801, -2.086, 1.694, 1.654, 1.521, 1.521, 1.654, 1.694]
[-2.083, 0.818, -2.25, -1.333, -1.333, -2.25, 0.818, -2.082, 1.696, 1.689, 1.462, 1.462, 1.689, 1.696]
[-2.093, 0.804, -2.245, -1.346, -1.346, -2.245, 0.804, -2.092, 1.697, 1.649, 1.533, 1.533, 1.649, 1.697]
[-2.082, 0.794, -2.22, -1.398, -1.398, -2.22, 0.794, -2.081, 1.686, 1.69, 1.53, 1.53, 1.69, 1.686]
[-2.081, 0.824, -2.241, -1.359, -1.359, -2.241, 0.824, -2.081, 1.696, 1.673, 1.487, 1.487, 1.673, 1.696]
Deviation within each symmetry class for each stereoisomer
(max_c q_c - min_c q_c)
for c in sym_classes
0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00047
0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00094
0 0 0 0 0 0 0.00094
0 0 0 0 0 0 0.00094
0 0 0 0 0 0 0.00094
0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00047
0 0 0 0 0 0 0.00071
The effect of the change proposed in this PR on this molecule would be to eliminate the ~ 0.0001 e
asymmetry in the charge assigned to the 2 fluorines.
The two fluorines should not be in the same symmetry class (for the stereoisomer depicted in the diagram). Considering all conformations that satisfy the depicted stereo, running AM1 calculations should've resulted in different charges and should not be symmetrized. Although the diff is very small [-2.082 vs -2.081] shown below, they should be different!
[-2.082, 0.794, -2.22, -1.398, -1.398, -2.22, 0.794, -2.081, 1.686, 1.69, 1.53, 1.53, 1.69, 1.686]
I suspect it may be because AM1 is not sensitive enough on the raw charges for the example I provided, maybe try this one, where the environmental differences around the Fs are more drastic:
O/C(=C(/O)\C(\Cl)=C(\F)Cl)/C(/Cl)=C(\F)Cl

Ahh, thanks for the more dramatic example!
Enumerating all configurations around the double-bonds in the molecule O/C(=C(/O)\C(\Cl)=C(\F)Cl)/C(/Cl)=C(\F)Cl
, again the picture is similar:
- Proposed symmetry classes are stereo invariant (same assignment of 8 classes to 16 atoms for all stereoisomers)
- Currently assigned charges are stereo sensitive (differences up to ~0.03 e across stereoisomers)
- Currently assigned charges within each symmetry class within each stereoisomer are exactly or nearly constant (exactly constant for 7 classes, exception: difference of ~0.00003 e for the class containing oxygen)
If we want the charges to be stereo asymmetric, I wonder if we should modify these lines https://github.com/proteneer/timemachine/blob/451803e01afe6231147a0e6a3ca019d4aa5069d8/timemachine/ff/handlers/nonbonded.py#L89-L91
Toggling symmetrize=True
to symmetrize=False
in AM1ELF10 https://github.com/proteneer/timemachine/blob/451803e01afe6231147a0e6a3ca019d4aa5069d8/timemachine/ff/handlers/nonbonded.py#L89-L91 causes large differences of up to 0.1 e within each graph symmetry class within each stereoisomer of O/C(=C(/O)\C(\Cl)=C(\F)Cl)/C(/Cl)=C(\F)Cl
(as should be expected).
Based on offline discussion with @proteneer -- looked into using the RDKit alternative https://www.rdkit.org/docs/source/rdkit.Chem.rdmolfiles.html#rdkit.Chem.rdmolfiles.CanonicalRankAtoms . This appears to have the desired behavior!
The PR now:
- Disables passing
symmetrize=True
to the AM1 routine (since this appears to average over stereo-oblivious atom classes) - Symmetrizes the result by averaging over RDKit stereo-sensitive atom classes
The number of distinct partial charges is reduced on benzene (from 3 on master to 2 on a220996), and increased on O/C(=C(/O)\C(\Cl)=C(\F)Cl)/C(/Cl)=C(\F)Cl
(from 9 on master to 16 on a220996)
would be good sanity check that this doesn't drastically affect FreeSolv results
Checking effect on computed charges for FreeSolv molecules (ref: master (451803e) -> proposed: most recent PR commit (9ce6cb9))
Histogram of signed charge changes per atom (note log y scale)

For ~99.2% of atoms, the effect on charge is < 0.001 e, and ~99.7% of atoms, the effect on charge is < 0.01 e.
Histogram of unsigned charge changes per molecule (max over atoms within each molecule, note log x scale)
Displaying the molecules where the difference exceeds arbitrary threshold of 0.01 e
Interesting that these all have one or more nitrate groups...
Haven't yet checked the effect of the proposed change on computed hydration free energies, and I'd be curious also to compare the assigned symmetry classes, esp. for the plotted molecules.
Can this be merged?
Can you try this (thanks @bp-kelley for the suggestion on ResonanceMolSupplier!):
from rdkit import Chem
import numpy as np
def resonance_aware_ranking(mol):
flags = Chem.ResonanceFlags() | Chem.ResonanceFlags.ALLOW_CHARGE_SEPARATION
permutations = []
for m in Chem.ResonanceMolSupplier(mol, flags):
res = Chem.CanonicalRankAtoms(m, breakTies=False, includeChirality=True)
permutations.append(list(res))
permutations = np.array(permutations).T
# identify which group each orbit belongs to
iota = 0
kv = dict()
classes = []
for a_idx, orbit in enumerate(permutations):
canon_orbit = tuple(sorted(orbit))
if canon_orbit not in kv:
kv[canon_orbit] = iota
iota += 1
classes.append(kv[canon_orbit])
return classes
print(resonance_aware_rank(Chem.AddHs(Chem.MolFromSmiles("[O-]C(=O)C"))))
print(resonance_aware_rank(Chem.AddHs(Chem.MolFromSmiles("CO[N+]([O-])=O"))))
Thanks @proteneer and @bp-kelley -- will try this out!
(Just want to strengthen the test to assert that the difference between old and new symmetrization behavior is indeed limited to stereo-awareness.)
Stale -- migrated to https://github.com/proteneer/timemachine/issues/817