openff-toolkit
openff-toolkit copied to clipboard
Feature Request: Make ToolkitAM1BCCHandler provide more consistent charges
Is your feature request related to a problem? Please describe.
We have a whole heap of unresolved problems in our AM1 methods, many of which I think that we can handle in one fell swoop:
- Protons can migrate and end up giving us partial charges for different chemical species than were input (#924)
- The generated charges are dependent on the input conformer, and the autogenerated input conformers are affected by atom order, toolkit backend, and other factors (#926)
- We should use ELF10 if OE is available (#983)
- Detailed behavior and exceptions aren't documented (#447)
While there are proposed fixes for many of these, behavior changes to our AM1 implementation are particularly sticky, since they touch process boundaries in the code, organizational/packaging boundaries in our ecosystem, and the AM1 methods in our API underlie the behavior of the ToolkitAM1BCC
tag in our FFs, meaning that implementation changes in the code directly cause changes in FF behavior.
Regardless, I think that resolving some of the deficiencies above would be worth some amount of messiness/changes.
Describe the solution you'd like
- The
ToolkitAM1BCCHandler
should try to assign ELF10 charges if OE is available, but just single-conformer AM1 charges if only AmberTools is available (for runtime reasons) - AmberTools
assign_partial_charges
with the AM1BCC method should default to using an "ELF1" conformer (This should make conformer generation a bit more deterministic; see preliminary results at bottom of meeting notes here) - The output geometry from the AmberTools wrapper (
antechamber
/sqm
) should be checked to see whether any bonds have broken/been formed. If so, thensqm
should be rerun on the input, but given themaxcyc=0
keyword, so that no minimization is performed. This will still give sub-optimal results, but they'll be better than if a proton migration happened.- Connectivity rearrangements can/should be checked using QCElemental's
guess_connectivity
method. Only the existence of bonds should be considered, not the order of those bonds.
- Connectivity rearrangements can/should be checked using QCElemental's
- Ensure that this doesn't break anything with
ChargeIncrementModelHandler
, which has an explicit attribute about how many conformers to use.
The ToolkitAM1BCCHandler should try to assign ELF10 charges if OE is available, but just single-conformer AM1 charges if only AmberTools is available (for runtime reasons) AmberTools assign_partial_charges with the AM1BCC method should default to using an "ELF1" conformer (This should make conformer generation a bit more deterministic; see preliminary results at bottom of meeting notes here)
I don't know where other people fall on the charge "quality" vs consistency balance. In theory the distance between charges from RandomConformer1 and RandomConformer2 should generally be higher, than the distance between charges from RandomConformer1 and AveragedConformerCharges. However, it feels weird to encode this toolkit-dependent inconsistency in the method.
I recall Conor doing some work on selecting consistent conformers using both RDKit and OpenEye ELF selection methods. Would defaulting to ELF1 for OpenEye as well be a compromise solution?
Would defaulting to ELF1 for OpenEye as well be a compromise solution?
One thing to note is that, in the above feature request, both ensembles would now include something near the "ELF1" conformer, since it's going to be a member of the ELF10 ensemble. So we should already expect better-than-random agreement between the methods.
Overall I don't think we should reduce the OE implementation to ELF1 for the sake of consistency. #1170 points out that the spec says that <ToolkitAM1BCC>
SHOULD use ELF10 if possible, I would just argue that the runtime for antechamber/sqm doesn't make it "possible" for most molecules.
The generated charges are dependent on the input conformer, and the autogenerated input conformers are affected by atom order, toolkit backend, and other factors
I don't know where other people fall on the charge "quality" vs consistency balance
Just for my understanding, as of now the difference is inevitable since OE uses a restrained optimization and AT doesn't, so consistency between toolkits should not be a goal at all(?). Chris Bayly talked about checking for electrostatic collapse and not using the restraints while optimizing with OE if there is no electrostatic collapse, that would result in running it twice though
- is Conor's approach of ELF1 without any optimization a solution to this,
- if so, is there a check on whether the charges after optimization are close enough to charges without optimization?
- also, is there even a flexibility in the OE call to quacpac to turn off restraints?
Edit: Never mind, I found this in openeye documentation
optimizationSetting = False
if oequacpac.OEAssignCharges(mol, oequacpac.OEAM1Charges(optimizationSetting)):
# ...
Also, in our toolkit am1bccnosymspt
Edit 2: Ahh, this turns off optimization entirely, I don't think we have an api point in quacpac where we can optimize without restraints using openeye
Chris Bayly talked about checking for electrostatic collapse and not using the restraints while optimizing with OE if there is no electrostatic collapse, that would result in running it twice though
My experience with quacpac is that it's a somewhat tightly-sealed black box, and I don't think it's a good use of our time to try and fiddle around with the internals. So on an effort-expenditure level I have a strong preference for "just run quacpac with its default ELF10, which will include restraints".
@pavankum I won't go point-by-point through your questions since I think they are all about the details of implementing "ELF1" in OpenEye, which I'm not advocating (instead I think we should use their pre-packaged ELF10).
I think they are all about the details of implementing "ELF1" in OpenEye
Apologies for conflating different things, I was thinking more about ELF1 addressing the electrostatic collapse for certain molecules so that we don't need restraints at all for any backend toolkit. Your proposed pathway of checking connectivity and turning off optimization resolves everything, and I realized that would need work on OE's end since even if we pass single chosen conformer we don't have access to turn off their restraints.
- The output geometry from the AmberTools wrapper (
antechamber
/sqm
) should be checked to see whether any bonds have broken/been formed
In one of my scripts, this is done with the rdkit pdb reader or qcelemental which guesses bonds based on covalent radii (the qcelemental solution comes from Simon B.). Both checks seem to agree really well with each other when a chemical graph changes.
- If so, then
sqm
should be rerun on the input, but given themaxcyc=0
keyword, so that no minimization is performed.
For this, I first create a list of arguments that will later be read into antechamber. These arguments are then put into antechamber on the command line and run.. Note the use of maxcyc=0
as one of these arguments.
Also, if it ever becomes useful to call sqm, am1bcc, and atomtype separately (versus just calling everything quickly with antechamber), I have code that calls sqm, and then performs bcc and file operations later in file_op()
.