timemachine icon indicating copy to clipboard operation
timemachine copied to clipboard

Make charge symmetrization stereo-aware

Open maxentile opened this issue 2 years ago • 10 comments

  • Disables passing symmetrize=True to the AM1 routine (since this appears to average over stereo-oblivious atom classes, and occasionally introduces very small asymmetries in cases like benzene)
  • Symmetrizes by averaging over RDKit stereo-sensitive atom classes
  • Tests on benzene (where 2 distinct charges are expected, but 3 were produced)
  • Tests on a case where symmetry is broken by stereo configuration (where n_atoms distinct charges are expected, but ~n_atoms/2 were produced)

Initial PR description:

symmetrize=True flag was already being passed to toolkit functions but there were still small asymmetries in some cases (see e.g. https://github.com/proteneer/timemachine/pull/737#issuecomment-1122880117 )

Issue magnitude: very very small. (In 82 of 642 FreeSolv molecules, the partial charges within a symmetry class varied up to ~0.0002 e. However, it's still nice for len(set(assigned_charges)) for benzene to be 2 rather than 3.)

maxentile avatar May 10 '22 22:05 maxentile

To clarify on symmetry perception under stereochemistry, consider this molecule:

F\C=C\C=C\C=C/F Screen Shot 2022-05-11 at 4 46 47 PM

proteneer avatar May 11 '22 20:05 proteneer

Sorry, not sure what the correct behavior is for that molecule. Should the symmetry classes (or charges) be asymmetric within this molecule?

Numbered by symmetry class image

Enumerating all configurations around the double bonds

F/C=C/C=C/C=C/F
F/C=C/C=C/C=C\F
F/C=C/C=C\C=C/F
F/C=C/C=C\C=C\F
F/C=C\C=C/C=C/F
F/C=C\C=C/C=C\F
F/C=C\C=C\C=C/F
F/C=C\C=C\C=C\F
F\C=C/C=C/C=C/F
F\C=C/C=C/C=C\F
F\C=C/C=C\C=C/F
F\C=C/C=C\C=C\F
F\C=C\C=C/C=C/F
F\C=C\C=C/C=C\F
F\C=C\C=C\C=C/F
F\C=C\C=C\C=C\F

The symmetry classes assigned by this method always appear the same

[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]
[6 5 4 3 3 4 5 6 2 1 0 0 1 2]

The charges currently assigned (without applying this extra symmetrize step) vary by stereoisomer, but are already exactly or nearly constant within each symmetry class.

Charges assigned to each stereoisomer

[-2.081, 0.824, -2.241, -1.359, -1.359, -2.241, 0.824, -2.081, 1.696, 1.673, 1.487, 1.487, 1.673, 1.696]
[-2.082, 0.794, -2.22, -1.398, -1.398, -2.22, 0.794, -2.081, 1.686, 1.69, 1.53, 1.53, 1.69, 1.686]
[-2.093, 0.804, -2.245, -1.346, -1.346, -2.245, 0.804, -2.092, 1.697, 1.649, 1.533, 1.533, 1.649, 1.697]
[-2.083, 0.818, -2.25, -1.333, -1.333, -2.25, 0.818, -2.082, 1.696, 1.689, 1.462, 1.462, 1.689, 1.696]
[-2.087, 0.801, -2.228, -1.355, -1.355, -2.228, 0.801, -2.086, 1.694, 1.654, 1.521, 1.521, 1.654, 1.694]
[-2.088, 0.788, -2.229, -1.376, -1.376, -2.229, 0.788, -2.087, 1.681, 1.623, 1.601, 1.601, 1.623, 1.681]
[-2.104, 0.757, -2.194, -1.419, -1.419, -2.194, 0.757, -2.103, 1.694, 1.693, 1.572, 1.572, 1.693, 1.694]
[-2.091, 0.789, -2.221, -1.389, -1.389, -2.221, 0.789, -2.09, 1.691, 1.687, 1.533, 1.533, 1.687, 1.691]
[-2.091, 0.789, -2.221, -1.389, -1.389, -2.221, 0.789, -2.09, 1.691, 1.687, 1.533, 1.533, 1.687, 1.691]
[-2.104, 0.757, -2.194, -1.419, -1.419, -2.194, 0.757, -2.103, 1.694, 1.693, 1.572, 1.572, 1.693, 1.694]
[-2.088, 0.788, -2.229, -1.376, -1.376, -2.229, 0.788, -2.087, 1.681, 1.623, 1.601, 1.601, 1.623, 1.681]
[-2.087, 0.801, -2.228, -1.355, -1.355, -2.228, 0.801, -2.086, 1.694, 1.654, 1.521, 1.521, 1.654, 1.694]
[-2.083, 0.818, -2.25, -1.333, -1.333, -2.25, 0.818, -2.082, 1.696, 1.689, 1.462, 1.462, 1.689, 1.696]
[-2.093, 0.804, -2.245, -1.346, -1.346, -2.245, 0.804, -2.092, 1.697, 1.649, 1.533, 1.533, 1.649, 1.697]
[-2.082, 0.794, -2.22, -1.398, -1.398, -2.22, 0.794, -2.081, 1.686, 1.69, 1.53, 1.53, 1.69, 1.686]
[-2.081, 0.824, -2.241, -1.359, -1.359, -2.241, 0.824, -2.081, 1.696, 1.673, 1.487, 1.487, 1.673, 1.696]

Deviation within each symmetry class for each stereoisomer

(max_c q_c - min_c q_c)
for c in sym_classes

0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00047
0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00094
0 0 0 0 0 0 0.00094
0 0 0 0 0 0 0.00094
0 0 0 0 0 0 0.00094
0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00071
0 0 0 0 0 0 0.00047
0 0 0 0 0 0 0.00071

The effect of the change proposed in this PR on this molecule would be to eliminate the ~ 0.0001 e asymmetry in the charge assigned to the 2 fluorines.

maxentile avatar May 13 '22 14:05 maxentile

The two fluorines should not be in the same symmetry class (for the stereoisomer depicted in the diagram). Considering all conformations that satisfy the depicted stereo, running AM1 calculations should've resulted in different charges and should not be symmetrized. Although the diff is very small [-2.082 vs -2.081] shown below, they should be different!

[-2.082, 0.794, -2.22, -1.398, -1.398, -2.22, 0.794, -2.081, 1.686, 1.69, 1.53, 1.53, 1.69, 1.686]

I suspect it may be because AM1 is not sensitive enough on the raw charges for the example I provided, maybe try this one, where the environmental differences around the Fs are more drastic:

O/C(=C(/O)\C(\Cl)=C(\F)Cl)/C(/Cl)=C(\F)Cl

Screen Shot 2022-05-13 at 11 27 42 AM

proteneer avatar May 13 '22 15:05 proteneer

Ahh, thanks for the more dramatic example!

Enumerating all configurations around the double-bonds in the molecule O/C(=C(/O)\C(\Cl)=C(\F)Cl)/C(/Cl)=C(\F)Cl, again the picture is similar: image

  • Proposed symmetry classes are stereo invariant (same assignment of 8 classes to 16 atoms for all stereoisomers)
  • Currently assigned charges are stereo sensitive (differences up to ~0.03 e across stereoisomers)
  • Currently assigned charges within each symmetry class within each stereoisomer are exactly or nearly constant (exactly constant for 7 classes, exception: difference of ~0.00003 e for the class containing oxygen)

If we want the charges to be stereo asymmetric, I wonder if we should modify these lines https://github.com/proteneer/timemachine/blob/451803e01afe6231147a0e6a3ca019d4aa5069d8/timemachine/ff/handlers/nonbonded.py#L89-L91

maxentile avatar May 13 '22 16:05 maxentile

Toggling symmetrize=True to symmetrize=False in AM1ELF10 https://github.com/proteneer/timemachine/blob/451803e01afe6231147a0e6a3ca019d4aa5069d8/timemachine/ff/handlers/nonbonded.py#L89-L91 causes large differences of up to 0.1 e within each graph symmetry class within each stereoisomer of O/C(=C(/O)\C(\Cl)=C(\F)Cl)/C(/Cl)=C(\F)Cl (as should be expected).

maxentile avatar May 13 '22 17:05 maxentile

Based on offline discussion with @proteneer -- looked into using the RDKit alternative https://www.rdkit.org/docs/source/rdkit.Chem.rdmolfiles.html#rdkit.Chem.rdmolfiles.CanonicalRankAtoms . This appears to have the desired behavior!

The PR now:

  • Disables passing symmetrize=True to the AM1 routine (since this appears to average over stereo-oblivious atom classes)
  • Symmetrizes the result by averaging over RDKit stereo-sensitive atom classes

The number of distinct partial charges is reduced on benzene (from 3 on master to 2 on a220996), and increased on O/C(=C(/O)\C(\Cl)=C(\F)Cl)/C(/Cl)=C(\F)Cl (from 9 on master to 16 on a220996)

maxentile avatar May 13 '22 21:05 maxentile

would be good sanity check that this doesn't drastically affect FreeSolv results

Checking effect on computed charges for FreeSolv molecules (ref: master (451803e) -> proposed: most recent PR commit (9ce6cb9))

Histogram of signed charge changes per atom (note log y scale)

image

For ~99.2% of atoms, the effect on charge is < 0.001 e, and ~99.7% of atoms, the effect on charge is < 0.01 e.

Histogram of unsigned charge changes per molecule (max over atoms within each molecule, note log x scale) image

Displaying the molecules where the difference exceeds arbitrary threshold of 0.01 e image

Interesting that these all have one or more nitrate groups...

Haven't yet checked the effect of the proposed change on computed hydration free energies, and I'd be curious also to compare the assigned symmetry classes, esp. for the plotted molecules.

maxentile avatar May 16 '22 21:05 maxentile

Can this be merged?

proteneer avatar May 18 '22 21:05 proteneer

Can you try this (thanks @bp-kelley for the suggestion on ResonanceMolSupplier!):

from rdkit import Chem
import numpy as np

def resonance_aware_ranking(mol):

    flags = Chem.ResonanceFlags() | Chem.ResonanceFlags.ALLOW_CHARGE_SEPARATION
    permutations = []

    for m in Chem.ResonanceMolSupplier(mol, flags):
        res = Chem.CanonicalRankAtoms(m, breakTies=False, includeChirality=True)
        permutations.append(list(res))

    permutations = np.array(permutations).T

    # identify which group each orbit belongs to
    iota = 0
    kv = dict()
    classes = []
    for a_idx, orbit in enumerate(permutations):
        canon_orbit = tuple(sorted(orbit))
        if canon_orbit not in kv:
            kv[canon_orbit] = iota
            iota += 1

        classes.append(kv[canon_orbit])

    return classes
        
print(resonance_aware_rank(Chem.AddHs(Chem.MolFromSmiles("[O-]C(=O)C"))))
print(resonance_aware_rank(Chem.AddHs(Chem.MolFromSmiles("CO[N+]([O-])=O"))))

proteneer avatar May 19 '22 23:05 proteneer

Thanks @proteneer and @bp-kelley -- will try this out!

(Just want to strengthen the test to assert that the difference between old and new symmetrization behavior is indeed limited to stereo-awareness.)

maxentile avatar May 20 '22 12:05 maxentile

Stale -- migrated to https://github.com/proteneer/timemachine/issues/817

maxentile avatar Sep 07 '22 17:09 maxentile