openff-evaluator
openff-evaluator copied to clipboard
When a molecule is assigned ASH as a random name, it's incorrectly treated as ASP.
I think I ran into a hopefully one-off error generating a PDB using the Equilibration workflow where not all bonds were present, so just opening an issue to collect any potential future reports.
To reproduce using the same code-paths that trigger an error during PreequilibratedSimulation workflows:
from openff.toolkit import Molecule, ForceField, Topology
import openmm
from openmm import app
unique_molecules = [
Molecule.from_smiles("C1CCOCC1"),
Molecule.from_smiles("Cc1ccccc1C"),
]
pdb_file = app.PDBFile("output.pdb")
topology = Topology.from_openmm(pdb_file.topology, unique_molecules=unique_molecules)
Error triggered:
File ~/micromamba/envs/openff-toolkit-release/lib/python3.12/site-packages/openff/toolkit/topology/topology.py:1517, in Topology.from_openmm(cls, openmm_topology, unique_molecules, positions)
1506 if hill_formula in probably_missing_conect:
1507 msg += (
1508 "This would be a very unusual molecule to try and parameterize, "
1509 "and it is likely that the data source it was read from does not "
(...)
1515 'not specified in the standard residue connectivity table."'
1516 )
-> 1517 raise ValueError(msg)
1519 # The connected_component_subgraph function above may have scrambled the molecule order, so sort molecules
1520 # by their first atom's topology index
1521 topology_molecules_to_add.sort(key=lambda x: x[0])
ValueError: No match found for molecule C. This would be a very unusual molecule to try and parameterize, and it is likely that the data source it was read from does not contain connectivity information. If this molecule is coming from PDB, please ensure that the file contains CONECT records. The PDB format documentation (https://www.wwpdb.org/documentation/file-format-content/format33/sect10.html) states "CONECT records are mandatory for HET groups (excluding water) and for other bonds not specified in the standard residue connectivity table."
Looking at the bonds gives a clue -- ASP has none --
import collections
counts = collections.defaultdict(set)
res = list(pdb_file.topology.residues())
for r in res:
atoms = list(r.atoms())
bonds = list(r.bonds())
counts[r.name].add((len(atoms), len(bonds)))
counts
defaultdict(set, {'VLQ': {(16, 16)}, 'ASP': {(18, 0)}})
This was odd, sent me in a couple wrong directions for a few days. I think I've re-generated a fully-bonded PDB file now without changing anything in my environment, and am not super interested in trying to dig into what may have been a one-off error, but just noting it as a possible source of future issues.