openff-toolkit icon indicating copy to clipboard operation
openff-toolkit copied to clipboard

Should the Molecule class support radicals?

Open jchodera opened this issue 4 years ago • 8 comments

Describe the bug Molecule.to_rdkit() can produce a Molecule with a different number of atoms than the source RDKit molecule.

This appears to be because the hydrogens_are_explicit flag is set to False by default in this line, which means the molecule can be modified when converted from RDKit to Molecule. This is incredibly dangerous, since the point of being able to move between these representations is to guarantee we are always working with a one-to-one correspondence between atoms in all representations. This makes it easy to completely break this assumption. Worse, there is no way to override this, making the toolkit essentially useless for tasks that involve preserving atom indices and ordering---e.g. force field parameterization.

To Reproduce We're working on a simple reproduction case here with @peastman

Output This is a related error with generate_conformers() for this Molecule.

Traceback (most recent call last):
  File "createDipeptides.py", line 175, in <module>
    createConformations(outputfile, forcefield, topology, positions, f'{name1}-{name2}', charges)
  File "createDipeptides.py", line 86, in createConformations
    mol.generate_conformers(n_conformers=10, rms_cutoff=0*unit.nanometers)
  File "/home/peastman/workspace/openff-toolkit/openff/toolkit/topology/molecule.py", line 3221, in generate_conformers
    return toolkit_registry.call(
  File "/home/peastman/workspace/openff-toolkit/openff/toolkit/utils/toolkit_registry.py", line 366, in call
    raise e
  File "/home/peastman/workspace/openff-toolkit/openff/toolkit/utils/toolkit_registry.py", line 362, in call
    return method(*args, **kwargs)
  File "/home/peastman/workspace/openff-toolkit/openff/toolkit/utils/rdkit_wrapper.py", line 899, in generate_conformers
    molecule._add_conformer(conformer)
  File "/home/peastman/workspace/openff-toolkit/openff/toolkit/topology/molecule.py", line 3997, in _add_conformer
    raise Exception(
Exception: molecule.add_conformer given input of the wrong shape: Given (42, 3), expected (40, 3)

Computing environment (please complete the following information): (will have to fill in later)

Additional context cc https://github.com/openmm/qmdataset/pull/5#issuecomment-917666335

jchodera avatar Sep 13 '21 00:09 jchodera

I was unable to reproduce this on my computer from the linked discussion. Is there a file or script I can run? Have we confirmed that this is a toolkit issue and not a cheminformatics/molecule representation issue?

j-wags avatar Sep 23 '21 00:09 j-wags

It happens in molecules that have radicals. Here's an example.

>>> from openff.toolkit.topology import Molecule
>>> from rdkit import Chem
>>> rdmol = Chem.MolFromSmiles('[CH3]')
>>> ffmol = Molecule.from_rdkit(rdmol)
>>> ffmol.to_smiles()
'[H][CH]([H])[H]'

The OpenFF molecule has four hydrogens, while the RDKit molecule has three.

peastman avatar Sep 27 '21 20:09 peastman

Thanks @peastman for the simple reproducing example - would it be possible to make one that does not involve radicals? Since we don't encounter radicals much if at all in fitting, cases like these haven't really been weighted in our tests or decisions.

I'll admit that the exact contract with the user around these wrappers isn't made super clear, and it's something we could improve. Complete interoperability across toolkits and file formats is not really tractable, but we should make more clear the expectations about what is and is not safely interoperable.

mattwthompson avatar Sep 27 '21 20:09 mattwthompson

would it be possible to make one that does not involve radicals?

I don't think so. If all electrons are paired, it isn't going to add any hydrogens.

Complete interoperability across toolkits and file formats is not really tractable, but we should make more clear the expectations about what is and is not safely interoperable.

This shouldn't be a difficult case to support. Radicals may not come up a lot in fitting, but they aren't especially exotic. And RDKit labels them explicitly. If OpenFF's object model can't support them, it would be best to throw an exception.

peastman avatar Sep 27 '21 21:09 peastman

If the problematic behavior of concern here is limited to radicals (and reverting #566 / implementing #144 is not tractable) then I'm inclined to agree with your suggestion.

I'll leave this open for a few days for feedback, and then submit a patch.

mattwthompson avatar Sep 27 '21 21:09 mattwthompson

Throwing an exception could be a good solution for molecules with radicals. I'm trying to think through whether there are any stable radicals that we should handle now or in the future. Nitro and cyano groups may be in this pool, or transition metals if we ever support them. Does anyone know about specific radical inputs that we need to watch out for?

There's also some question of WHEN we should validate the molecule and check for radicals. This issue originally arose when creating a molecule using the add_atom and add_bond API, and I've learned that we don't want to do continuous validity checking after those calls. So maybe we could add radical/implicit H-catching logic in a offmol.validate method that gets run in:

  • from_openeye
  • from_rdkit
  • to_openeye
  • to_rdkit

One thing I want to be careful about is that we don't make an offmol.sanitize method. RDKit and OpenEye sanitization processes are extremely complex and we will drown if we try to standardize/replicate them.

j-wags avatar Sep 29 '21 16:09 j-wags

I'm a fan of throwing an exception because

  • it explicitly limits the scope of the toolkit, bringing it more in like with behavior that it designed for and tested on
  • it implicitly denotes our contract with the user
  • forces us to be careful and intentional if/when we implement some or all of the above cases

I share your hesitation re: Molecule.validate and the associated scope creep/complexity explosion

mattwthompson avatar Sep 29 '21 17:09 mattwthompson

Also a fan of raising an exception for radicals; let me know if you need anything from me to move this forward.

davidlmobley avatar Oct 19 '21 17:10 davidlmobley

Just to leave some more breadcrumbs: #1236 (and associated release 0.11.1) adds an exception when something looks like a radical. This is because the toolkit has no understanding of radicals, no tests around such cases, no associated documentation or examples, etc. Radicals that did get processed were done somewhat ambiguously and, naturally, this led to some unhandled corner cases.

This does not close the door on supporting them in the future, but we need to treat that similarly to a feature addition - clearly-defined behavior, extensive tests, added documentation, etc. in order for maintain a healthy "contract" with users. That discussion can get started in another issue or by re-opening this one.

mattwthompson avatar Sep 23 '22 16:09 mattwthompson