smarty
smarty copied to clipboard
Add fractional bond orders to SMIRFF
Eventually, we need to add capacity to SMIRFF for fractional bond orders, which we will obtain from a semi-empirical QM calculation. This will be used downstream eventually (after this summer) for parameter interpolation.
Suggested FFXML format:
<HarmonicBondForce length_unit="angstroms" k_unit="kilocalories_per_mole/angstrom**2" fractional_bond_order="interpolate-linear">
<Bond smirks="[#6:1]-[#6:2]" length1="1.526" k1="620.0" length2="1.426" k2="820.0" length3="1.326" k3="1020.0"/>
...
</HarmonicBondForce>
So, an interesting question is whether we would allow fractional bond orders only for SMIRKS matching ~
bond types or whether we would allow them for any. Presumably it would be for any, I suppose, as you have here. Otherwise the concern would be that the semiempirical calculation would give us a fractional bond order but the previous SMIRKS would prevent us from using it... Is that what you're thinking?
@cbayly13 , does this look suitable?
We want to exclude triple bonds so the form I would suggest is: [#6:1]!#[#6:2] which is asking for a carbon-carbon bond of any order except triple, thus allowing double, single, and aromatic. Most useful conjugation is across CC, CN, and CO bonds so I would want to start with [#6:1]!#[#6:2] [#6:1]!#[#7:2] [#6:1]~[#8:2] and see how far we can get with that. In the last case only a "~" is needed because singlet state C#O does not occur in pharmaceutically relevant molecules.
In principle triple bonds would not be ruled out, because they can conjugate too, but I am expecting a lot of finessing needed to map: single<-->conjugatedSingle<-->aromatic<-->conjugatedDouble<-->double correctly, which is where all atom-typed ligand force fields fall down.
@jchodera - for reasons I can explain in more detail when I see you in person later this week, we may now want to do this sooner rather than later (in brief, it has to do with trying to put parm@frosst in SMIRFF format, which will be MUCH MUCH easier for specific types of systems if we have this).
The format above looks reasonable to me for bonds but may require notation adjustments. Specifically, for torsions, we use k1
, k2
, k3
, (periodicity1
, periodicity2
,...) etc. to pick up multiple torsion terms which apply to the same SMIRKS. This probably means the choice of k1
, k2
, k3
for bonds is less than ideal since the numbering would mean something different there than it does for torsions.
Would you be happy with k_bondorder1
, k_bondorder2
, etc. instead?
This would also mean that for torsions, we could accommodate multiple torsion parameters that apply to different bond orders (whereas, as currently proposed, you'd have a name clash between k1
for the first torsional term and k1
for the term applying to bond order 1). For example, right now we have
<PeriodicTorsionForce phase_unit="degrees" k_unit="kilocalories_per_mole">
<Proper smirks="[#6X4:1]-[#6X4:2]-[#8X2:3]-[#1:4]" idivf1='1' periodicity1="3" phase1="0.0" k1="0.16" idivf2="1" periodicity2="1" phase2="0.0" k2="0.25" id="t0006" parent_id="t0002"/> <!-- HO-OH-CT-CT from frcmod.Frosst_AlkEthOH -->
...
and this could extend to
<PeriodicTorsionForce phase_unit="degrees" k_unit="kilocalories_per_mole" fractional_bondorder="interpolate_linear">
<Proper smirks="[#6X4:1]-[#6X4:2]-[#8X2:3]-[#1:4]" idivf1='1' periodicity1="3" phase1="0.0" k1_bondorder1="0.16" k1_bondorder2="0.25" idivf2="1" periodicity2="1" phase2="0.0" k2_bondorder1="0.25" "k2_bondorder2"="0.0" id="t0006" parent_id="t0002"/> <!-- HO-OH-CT-CT from frcmod.Frosst_AlkEthOH -->
(Note that I made up the k1_bondorder2
and k2_bondorder2
values out of thin air for illustration purposes.)
If this looks reasonable, I expect I can actually implement this myself without significant difficulty so we can try it out.
Other than changing "order" to "bondorder" for clarity, this proposal sounds fine!
How are we going to evaluate the bond orders, though? Are we planning to take them from the OEMol for now, or should we add bond order to the Topology at this point?
How are we going to evaluate the bond orders, though? Are we planning to take them from the OEMol for now, or should we add bond order to the Topology at this point?
Well, we'll get the bond orders from an OEMol. But, right now createForce
only works with the Topology
so I think this means that either:
- we need to update the
Topology
to include bond order, or -
createForce
will need major updates/re-architecting to be able to pull the bond orders from the stored reference molecules in theTopology
class
It seems like the first would be much better and doesn't look like it should be much work -- we just need self._bondorders
added to Topology
to track the order of each bond in self._bonds
, it seems. I'd update generateTopologyFromOEMol
to store this info. Then this would be used in createForce
to pick which parameters to use when actually assigning from the FFXML.
Sound reasonable?
Unrelatedly, I updated the above API proposal in my post to use your terminology of "bondorder" rather than "order" so I'd put this in in the right way.
I would just add to the _Topology
class we use inside of forcefield.py
for now. We already use this "augmented" topology instead of the OpenMM Topology
as the argument to createForce
, so this is the natural place to extend things for now until we can get them into OpenMM.
We do have an open OpenMM issue on bond orders that you should probably comment in too.
Also, let's standardize fractional_bond_order
to fractional_bondorder
too.
Agreed on both of those, @jchodera .
One other issue I forgot to bring up earlier, @jchodera - we think we need to extend the SMIRFF format slightly to add the option to kekulize molecules before assigning parameters (default False, as at present).
We're thinking of putting parm@frosst into SMIRFF format sooner rather than later, and two things will make it take a lot less human time: first, kekulizing molecules before assigning parameters, and second, having bond orders (this has to do with some issues relating to how to handle aromatic molecules that still have strong single/double character and ALSO handle aromatic molecules which do not). For Smarty/Smirky/etc., we're of course not kekulizing, but this will help Christopher in putting parm@frosst into this format.
What about adding this as a tag in the parameter file, i.e. something like
<SMIRFF kekulize="True">
...
</SMIRFF>
where if kekulize
is False or unspecified, the current behavior would be retained?
Setting global parameters like kekulize
in the <SMIRFF>
tag sounds fine. This could instruct ForceField
to copy and kekulize all of the molecules it is asked to parameterize.
I should probably also deal with #42 at the same time and put in a version attribute.
For the record, OpenEye support reports we can retrieve partial (Wiberg) bond orders from i.e. AM1-BCC calculations done with their tools, such as via this code snippet:
mol = OEGraphMol()
ifs = oemolistream(argv[1])
while OEReadMolecule(ifs,mol):
if OEAssignPartialCharges(mol, OECharges_AM1BCCSym):
for bond in mol.GetBonds():
print (bond.GetData("WibergBondOrder"))
This may or may not be supported in future releases (they're putting in a feature request to retain it and I'll update with the outcome of that) but it's there for now and we can use it for this.
Are you going to add the support for this to forcefield.py
, @davidlmobley?
Yes, @jchodera . Could be I'll get to work on this while traveling back on Friday, though more likely it'll be early next week. Since I can see how to do it easily enough, your time will be better spent on the property calculation things.
@jchodera - any thoughts on whether we should REQUIRE floating point bond orders, or fall back to something else (i.e. bond orders as integers if the user provides an OEMol and doesn't want to do a charging calculation) if the user insists on not doing an AM1 calculation?
I'm thinking the simplest and probably best behaved might be to populate the bond orders using bond.GetOrder()
for regular OEMols if there are no wiberg bond orders (no AM1 calculation has been done), otherwise populate with the Wiberg bond orders. Probably I should have an optional argument to the charging calculation which would override this so we can have it retain integer orders when we want it to.
Thanks.