QCElemental icon indicating copy to clipboard operation
QCElemental copied to clipboard

Mass validation & mtol

Open anabiman opened this issue 4 years ago • 4 comments

Describe the bug Cannot construct a valid molecule with specified masses.

To Reproduce

oxy = qcelemental.models.Molecule(geometry=[0,0,0], symbols=["O"], masses=[15.996])
oxy.mass_numbers -> array([-1], dtype=int16)

QCElemental couldn't guess the mass number to be 16 likely due to the relatively small mtol (1e-3) set in reconcile_nucleus() and from_schema(), so if we specify the mass number, we get a validation error:

oxy = qcelemental.models.Molecule(geometry=[0,0,0], symbols=["O"], masses=[15.996], mass_numbers=[16]) 
-> qcelemental.exceptions.ValidationError: Inconsistent or unspecified mass: A: 16, Z: None, E: O, mass: 15.996, real: None, label: None

# mass = 15.994 or 15.995 would work with mtol=1e-3.

Expected behavior To get the right mass number or avoid any errors, we can either turn off validation (which is not recommended), or add a keyword arg mtol to Molecule.__init__ and increase the value of this number, but neither approach is elegant or adequate for passing "oxy" to qcengine for e.g. (since mtol is not part of the Molecule schema or used by qcengine AFAIK).

What is the easiest way to solve this with minimal effort? Turn off mass validation by default? Make mtol adaptive?

Additional context Passing the keyword nonphysical to Molecule does not solve the problem. qcel v0.19.0.

anabiman avatar Apr 28 '21 18:04 anabiman

Or at least maybe print a warning msg, or raise an exception when mass_number = -1 i.e. qcel could not guess the "most probable" mass number?

anabiman avatar Apr 28 '21 19:04 anabiman

Don't turn off validate. That's giving a generic per atom

oxy = qcelemental.models.Molecule(geometry=[0,0,0], symbols=["O"], masses=[12.999], validate=False) #mtol=1.e-6)

print(oxy.mass_numbers)
# returns [16], wrong!

What is the use case here? Are you populating from a program with an old, fixed set of atomic masses? and then mass number is also needed? If what you want is modern masses while still preserving isotope info, perhaps initialize with mass_number=int(old_mass).

loriab avatar Apr 28 '21 19:04 loriab

The correct way to make this work is with something like the below. (That gets you through creation, but more tweaks needed for to_string, etc.) I'd still like to hear more about how this situation is coming about.

oxy = qcelemental.models.Molecule(geometry=[0,0,0], symbols=["O"], masses=[15.999])
print(oxy.mass_numbers)  # [-1]
oxy = qcelemental.models.Molecule(geometry=[0,0,0], symbols=["O"], masses=[15.999], mtol=1.e-2)
print(oxy.mass_numbers)  # [16]
diff --git a/qcelemental/models/molecule.py b/qcelemental/models/molecule.py
index 72e3e53..9e160c2 100644
--- a/qcelemental/models/molecule.py
+++ b/qcelemental/models/molecule.py
@@ -333,8 +333,9 @@ class Molecule(ProtoModel):
             # original_keys = set(kwargs.keys())  # revive when ready to revisit sparsity
 
             nonphysical = kwargs.pop("nonphysical", False)
+            mtol = kwargs.pop("mtol", 1.e-3)
             schema = to_schema(
-                from_schema(kwargs, nonphysical=nonphysical), dtype=kwargs["schema_version"], copy=False, np_out=True
+                from_schema(kwargs, nonphysical=nonphysical, mtol=mtol), dtype=kwargs["schema_version"], copy=False, np_out=True
             )
             schema = _filter_defaults(schema)
 
diff --git a/qcelemental/molparse/from_schema.py b/qcelemental/molparse/from_schema.py
index d1775b4..aec087a 100644
--- a/qcelemental/molparse/from_schema.py
+++ b/qcelemental/molparse/from_schema.py
@@ -7,7 +7,7 @@ from ..util import provenance_stamp
 from .from_arrays import from_arrays
 
 
-def from_schema(molschema: Dict, *, nonphysical: bool = False, verbose: int = 1) -> Dict:
+def from_schema(molschema: Dict, *, nonphysical: bool = False, mtol: float = 1.e-3, verbose: int = 1) -> Dict:
     r"""Construct molecule dictionary representation from non-Psi4 schema.
 
     Parameters
@@ -85,7 +85,7 @@ def from_schema(molschema: Dict, *, nonphysical: bool = False, verbose: int = 1)
         # tooclose=tooclose,
         # zero_ghost_fragments=zero_ghost_fragments,
         nonphysical=nonphysical,
-        # mtol=mtol,
+        mtol=mtol,
         verbose=verbose,
     )

loriab avatar Apr 28 '21 19:04 loriab

What is the use case here? Are you populating from a program with an old, fixed set of atomic masses? and then mass number is also needed? If what you want is modern masses while still preserving isotope info, perhaps initialize with mass_number=int(old_mass).

Yes, that is exactly my case. I am using a molecule created by a bunch of different codes, convert it into qcelemental.Molecule, which is then passed to QCEngine. I don't need the mass numbers for now so I ended up removing them from my pipeline after I realized I was getting negative values.

Doing mtol = kwargs.pop("mtol", 1.e-3) in Molecule solves my problem within qcel but might require more changes for it to work with qcengine since mtol is lost (and I'm gonna guess qcengine re-instantiates a new Molecule object or maybe reruns the validation with the default mtol=1e-3). In any case, the simpler solution IMO would be to throw an exception here, so I am wondering why mass_number = -1 is allowed? If the caller is passing mass_numbers and an atomic mass outside the "mtol" range, why not consider this a failure? Or allow mass_number = -1 only when nonphysical=True?

anabiman avatar Apr 28 '21 21:04 anabiman