BUG: Incorrect formula_units calculation in PhononBSDOSDoc class
Description
There is an issue with the calculation of formula_units in the PhononBSDOSDoc class within the file atomate2.common.schemas.phonons. The current implementation is as follows:
formula_units = (
structure.composition.num_atoms
/ structure.composition.reduced_composition.num_atoms
)
The problem arises because the reduced_composition in pymatgen does not always reduce formulas to the minimum number of elements. Instead, it reduces them to the "most stable molecule" as defined in pymatgen.core.composition -> Composition.special_formulas. This leads, for these elements, to an incorrect formula_units, which is half of the expected value. In case the structure is composed of only one atom, the formula unit is non-integer.
To Reproduce
Taking, for instance, this structure in xsf format:
CRYSTAL
PRIMVEC
0.00000000000000 1.56088551189537 1.56088551189537
1.56088551189537 0.00000000000000 1.56088551189537
1.56088551189537 1.56088551189537 0.00000000000000
PRIMCOORD
1 1
7 0.00000000000000 0.00000000000000 0.00000000000000
this code snippet
from pymatgen.core import Structure
from pymatgen.core.composition import Composition
structure = Structure.from_file('N-fcc.xsf')
print(f"Composition: {structure.composition} with {structure.composition.num_atoms} atoms\n\
Reduced composition: {structure.composition.reduced_composition} with {structure.composition.reduced_composition.num_atoms} atoms.\n\
formula_units: {structure.composition.num_atoms/structure.composition.reduced_composition.num_atoms}")
provides:
Composition: N1 with 1.0 atoms Reduced composition: N2 with 2.0 atoms. formula_units: 0.5
Expected behavior
The expected behavior would be to have a reduced composition N1 and so 1 atom. The formula_units would then be 1.
Proposed solution
Either to discuss the behavior of get_reduced_formula_and_factor at the pymatgen level (https://github.com/materialsproject/pymatgen/blob/4f536fc6c1798740a0f1402b0456675011706d75/src/pymatgen/core/composition.py#L428)
or change the definition of formula_units to
from pymatgen.core.ion import Ion
formula_units = (
structure.composition.num_atoms
/ Ion(structure.composition).reduced_composition.num_atoms
)
We are encountering the same problem with the outputdoc of the ABINIT phonon workflow, and we think we should find a common solution for all the atomate2 workflows.
Thanks for bringing this up. Regarding the reduced composition, I'll try to take care of that on the pymatgen side.
Normalization of extensive quantities with respect to formula units also breaks with our general convention of normalization wrt the number of atoms. The former should just be an integer multiple of the latter
Also FYI I'm currently working on revising the PhononBSDOSDoc schema so it's a good time to improve the document model if there are other issues you see with it!
@esoteric-ephemera @emarazzi if i can help with anything in this regard, let me know. I am overall happy with any improvements to fix these issues.
@JaGeo do you know why the normalization is done wrt formula units rather than number of atoms in PhononBSDOSDoc? I think I'd prefer normalization wrt number of atoms since it's less ambiguous and consistent with the other document models
@esoteric-ephemera I think i started from the pymatgen implementation for the atomate2 implementation where this is also done (https://github.com/materialsproject/pymatgen/blob/24cd6808a8ca7275b187a9dd164402864efb9190/src/pymatgen/phonon/dos.py#L245). In abpy the results can also be given per formula unit: https://abinit.github.io/abipy/gallery/plot_phthermo.html
@gpetretto any more insights here?
@esoteric-ephemera From my point of view, you can surely change this to make it more consistent but it also needs to be updated in the quasi-harmonic implementation
Apologies for the slow pace of updates: the new phonon schema is in place in emmet.core.phonons, and there should be a release candidate out soon. This schema should fix the issue with formula unit assignment but will need to be linked back to the workflows. That's next on my to do list