openff-toolkit icon indicating copy to clipboard operation
openff-toolkit copied to clipboard

Feature Request: Make ToolkitAM1BCCHandler provide more consistent charges

Open j-wags opened this issue 3 years ago • 7 comments

Is your feature request related to a problem? Please describe.

We have a whole heap of unresolved problems in our AM1 methods, many of which I think that we can handle in one fell swoop:

  • Protons can migrate and end up giving us partial charges for different chemical species than were input (#924)
  • The generated charges are dependent on the input conformer, and the autogenerated input conformers are affected by atom order, toolkit backend, and other factors (#926)
  • We should use ELF10 if OE is available (#983)
  • Detailed behavior and exceptions aren't documented (#447)

While there are proposed fixes for many of these, behavior changes to our AM1 implementation are particularly sticky, since they touch process boundaries in the code, organizational/packaging boundaries in our ecosystem, and the AM1 methods in our API underlie the behavior of the ToolkitAM1BCC tag in our FFs, meaning that implementation changes in the code directly cause changes in FF behavior.

Regardless, I think that resolving some of the deficiencies above would be worth some amount of messiness/changes.

Describe the solution you'd like

  • The ToolkitAM1BCCHandler should try to assign ELF10 charges if OE is available, but just single-conformer AM1 charges if only AmberTools is available (for runtime reasons)
  • AmberTools assign_partial_charges with the AM1BCC method should default to using an "ELF1" conformer (This should make conformer generation a bit more deterministic; see preliminary results at bottom of meeting notes here)
  • The output geometry from the AmberTools wrapper (antechamber/sqm) should be checked to see whether any bonds have broken/been formed. If so, then sqm should be rerun on the input, but given the maxcyc=0 keyword, so that no minimization is performed. This will still give sub-optimal results, but they'll be better than if a proton migration happened.
    • Connectivity rearrangements can/should be checked using QCElemental's guess_connectivity method. Only the existence of bonds should be considered, not the order of those bonds.
  • Ensure that this doesn't break anything with ChargeIncrementModelHandler, which has an explicit attribute about how many conformers to use.

j-wags avatar Jan 12 '22 01:01 j-wags

The ToolkitAM1BCCHandler should try to assign ELF10 charges if OE is available, but just single-conformer AM1 charges if only AmberTools is available (for runtime reasons) AmberTools assign_partial_charges with the AM1BCC method should default to using an "ELF1" conformer (This should make conformer generation a bit more deterministic; see preliminary results at bottom of meeting notes here)

I don't know where other people fall on the charge "quality" vs consistency balance. In theory the distance between charges from RandomConformer1 and RandomConformer2 should generally be higher, than the distance between charges from RandomConformer1 and AveragedConformerCharges. However, it feels weird to encode this toolkit-dependent inconsistency in the method.

I recall Conor doing some work on selecting consistent conformers using both RDKit and OpenEye ELF selection methods. Would defaulting to ELF1 for OpenEye as well be a compromise solution?

lilyminium avatar Jan 12 '22 02:01 lilyminium

Would defaulting to ELF1 for OpenEye as well be a compromise solution?

One thing to note is that, in the above feature request, both ensembles would now include something near the "ELF1" conformer, since it's going to be a member of the ELF10 ensemble. So we should already expect better-than-random agreement between the methods.

Overall I don't think we should reduce the OE implementation to ELF1 for the sake of consistency. #1170 points out that the spec says that <ToolkitAM1BCC> SHOULD use ELF10 if possible, I would just argue that the runtime for antechamber/sqm doesn't make it "possible" for most molecules.

j-wags avatar Jan 12 '22 16:01 j-wags

The generated charges are dependent on the input conformer, and the autogenerated input conformers are affected by atom order, toolkit backend, and other factors

I don't know where other people fall on the charge "quality" vs consistency balance

Just for my understanding, as of now the difference is inevitable since OE uses a restrained optimization and AT doesn't, so consistency between toolkits should not be a goal at all(?). Chris Bayly talked about checking for electrostatic collapse and not using the restraints while optimizing with OE if there is no electrostatic collapse, that would result in running it twice though

  1. is Conor's approach of ELF1 without any optimization a solution to this,
  2. if so, is there a check on whether the charges after optimization are close enough to charges without optimization?

pavankum avatar Jan 12 '22 17:01 pavankum

  1. also, is there even a flexibility in the OE call to quacpac to turn off restraints?

Edit: Never mind, I found this in openeye documentation

optimizationSetting = False
if oequacpac.OEAssignCharges(mol, oequacpac.OEAM1Charges(optimizationSetting)):
    # ...

Also, in our toolkit am1bccnosymspt

Edit 2: Ahh, this turns off optimization entirely, I don't think we have an api point in quacpac where we can optimize without restraints using openeye

pavankum avatar Jan 12 '22 17:01 pavankum

Chris Bayly talked about checking for electrostatic collapse and not using the restraints while optimizing with OE if there is no electrostatic collapse, that would result in running it twice though

My experience with quacpac is that it's a somewhat tightly-sealed black box, and I don't think it's a good use of our time to try and fiddle around with the internals. So on an effort-expenditure level I have a strong preference for "just run quacpac with its default ELF10, which will include restraints".

@pavankum I won't go point-by-point through your questions since I think they are all about the details of implementing "ELF1" in OpenEye, which I'm not advocating (instead I think we should use their pre-packaged ELF10).

j-wags avatar Jan 12 '22 18:01 j-wags

I think they are all about the details of implementing "ELF1" in OpenEye

Apologies for conflating different things, I was thinking more about ELF1 addressing the electrostatic collapse for certain molecules so that we don't need restraints at all for any backend toolkit. Your proposed pathway of checking connectivity and turning off optimization resolves everything, and I realized that would need work on OE's end since even if we pass single chosen conformer we don't have access to turn off their restraints.

pavankum avatar Jan 12 '22 20:01 pavankum

  • The output geometry from the AmberTools wrapper (antechamber/sqm) should be checked to see whether any bonds have broken/been formed

In one of my scripts, this is done with the rdkit pdb reader or qcelemental which guesses bonds based on covalent radii (the qcelemental solution comes from Simon B.). Both checks seem to agree really well with each other when a chemical graph changes.

  • If so, then sqm should be rerun on the input, but given the maxcyc=0 keyword, so that no minimization is performed.

For this, I first create a list of arguments that will later be read into antechamber. These arguments are then put into antechamber on the command line and run.. Note the use of maxcyc=0 as one of these arguments.

Also, if it ever becomes useful to call sqm, am1bcc, and atomtype separately (versus just calling everything quickly with antechamber), I have code that calls sqm, and then performs bcc and file operations later in file_op().

connordavel avatar Jan 19 '22 03:01 connordavel