openff-interchange
Design: Using pre-defined data on input molecules
Some quantities relevant to force field parametrization (partial charges, partial bond orders, aromaticity, etc.) can be calculated on the fly from fully-defined openff.toolkit.topology.molecule.Molecule objects. Some or all of these can also be defined in advance, e.g. partial charges read from an SDF file. The toolkit follows a paradigm of allowing these to be passed to create_openmm_system via keyword arguments, e.g.
    omm_system = forcefield.create_openmm_system(
        topology,
        charge_from_molecules=molecules,
        partial_bond_orders_from_molecules=molecules,
        toolkit_registry=toolkit_registry,
    )
There is a grey area with respect to reproducibility when the definition of a force field conflicts with this usage, e.g. using partial charges read from a file on disk when the force field prescribes AM1-BCC.
Implementing these methods has proven to be really tricky, as evidenced by several headaches and many lines of code in the toolkit. The difficulty lies in determining how charge methods override each other and whether partial bond orders calculated with one method in one handler should affect interpolated parameters in another handler, all while considering whether or not to use the charges, bond orders, etc. specified in the topology.
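The override question can be made concrete with a small sketch in which the precedence of charge sources is an explicit, ordered list rather than logic spread across handlers. Everything here is hypothetical and for illustration only: `resolve_charges`, the source names, and the charge values are not part of the toolkit, and the ordering shown is an assumption rather than the toolkit's documented behaviour.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical type: a charge source returns per-atom charges for a molecule,
# or None if it has nothing to say about that molecule.
ChargeSource = Callable[[str], Optional[List[float]]]


def resolve_charges(
    molecule_id: str,
    sources: Sequence[Tuple[str, ChargeSource]],
) -> Tuple[str, List[float]]:
    """Return (source_name, charges) from the first source that applies."""
    for name, source in sources:
        charges = source(molecule_id)
        if charges is not None:
            return name, charges
    raise ValueError(f"no charge source applies to {molecule_id!r}")


# Made-up pre-defined charges, standing in for values read from an SDF file.
PRESET_CHARGES = {"ethanol": [-0.1, 0.2, -0.1]}


def from_molecules(molecule_id: str) -> Optional[List[float]]:
    return PRESET_CHARGES.get(molecule_id)


def force_field_method(molecule_id: str) -> Optional[List[float]]:
    # Placeholder for whatever the force field prescribes (e.g. AM1-BCC);
    # returns dummy zeros rather than running a real charge calculation.
    return [0.0, 0.0, 0.0]


# The order of this list *is* the override policy (an assumed one).
SOURCES = [
    ("charge_from_molecules", from_molecules),
    ("force_field_method", force_field_method),
]

ethanol_source, ethanol_charges = resolve_charges("ethanol", SOURCES)
water_source, water_charges = resolve_charges("water", SOURCES)
```

Returning the name of the winning source alongside the charges records provenance, which speaks to the reproducibility concern: it is always possible to tell whether a charge came from user input or from the force field.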
In a perfect world, a simpler approach would be to completely decouple these steps, so that, for example, partial bond orders are known before parameter assignment takes place and partial charges are resolved before the electrostatics handlers are created. The current implementation assumes that these data exist in the topology, not because it is best to read them from there, but because it is the simplest solution for now and leaves room for this two-step process to be implemented in the future.
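A minimal sketch of the decoupled, two-step idea for bond orders follows. It is not the Interchange implementation: the names `InterpolatedBondParameter`, `resolve_bond_orders` and `assign_force_constants` are hypothetical, the numbers are made up, and linear interpolation between values defined at bond orders 1 and 2 is assumed purely for illustration. The point is that step 2 consumes only the resolved bond orders and never needs to know where they came from.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Bond = Tuple[int, int]  # pair of atom indices


@dataclass(frozen=True)
class InterpolatedBondParameter:
    """Hypothetical parameter with force constants at bond orders 1 and 2."""

    k_bondorder1: float
    k_bondorder2: float


def resolve_bond_orders(topology_bond_orders: Dict[Bond, float]) -> Dict[Bond, float]:
    """Step 1: settle every fractional bond order before any assignment.

    Here the values are simply copied from the topology, mirroring the
    current assumption; a future version could compute them instead.
    """
    return dict(topology_bond_orders)


def assign_force_constants(
    bond_orders: Dict[Bond, float],
    parameter: InterpolatedBondParameter,
) -> Dict[Bond, float]:
    """Step 2: interpolate linearly using only the already-resolved orders."""
    slope = parameter.k_bondorder2 - parameter.k_bondorder1
    return {
        bond: parameter.k_bondorder1 + slope * (order - 1.0)
        for bond, order in bond_orders.items()
    }


# Made-up example data.
topology_bond_orders = {(0, 1): 1.0, (1, 2): 1.5}
parameter = InterpolatedBondParameter(k_bondorder1=100.0, k_bondorder2=200.0)

bond_orders = resolve_bond_orders(topology_bond_orders)
force_constants = assign_force_constants(bond_orders, parameter)
```

Because `assign_force_constants` takes a plain mapping, swapping the body of `resolve_bond_orders` for an on-the-fly calculation would not require touching the assignment step.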
Related:
https://github.com/openforcefield/openff-toolkit/issues/748
https://github.com/openforcefield/openff-toolkit/issues/619
https://github.com/openforcefield/openff-toolkit/pull/705