atomate2 icon indicating copy to clipboard operation
atomate2 copied to clipboard

BUG: Incorrect formula_units calculation in PhononBSDOSDoc class

Open emarazzi opened this issue 8 months ago • 6 comments

Description

There is an issue with the calculation of formula_units in the PhononBSDOSDoc class within the file atomate2.common.schemas.phonons. The current implementation is as follows:

formula_units = (
    structure.composition.num_atoms
    / structure.composition.reduced_composition.num_atoms
)

The problem arises because the reduced_composition in pymatgen does not always reduce formulas to the minimum number of elements. Instead, it reduces them to the "most stable molecule" as defined in pymatgen.core.composition -> Composition.special_formulas. This leads, for these elements, to an incorrect formula_units, which is half of the expected value. In case the structure is composed of only one atom, the formula unit is non-integer.

To Reproduce

Taking, for instance, this structure in xsf format:

CRYSTAL
PRIMVEC
 0.00000000000000 1.56088551189537 1.56088551189537
 1.56088551189537 0.00000000000000 1.56088551189537
 1.56088551189537 1.56088551189537 0.00000000000000
PRIMCOORD
 1 1
  7     0.00000000000000     0.00000000000000     0.00000000000000

this code snippet

from pymatgen.core import Structure
from pymatgen.core.composition import Composition

structure = Structure.from_file('N-fcc.xsf')
print(f"Composition: {structure.composition} with {structure.composition.num_atoms} atoms\n\
Reduced composition: {structure.composition.reduced_composition} with {structure.composition.reduced_composition.num_atoms} atoms.\n\
formula_units: {structure.composition.num_atoms/structure.composition.reduced_composition.num_atoms}")

provides:

Composition: N1 with 1.0 atoms Reduced composition: N2 with 2.0 atoms. formula_units: 0.5

Expected behavior

The expected behavior would be to have a reduced composition N1 and so 1 atom. The formula_units would then be 1.

Proposed solution

Either to discuss the behavior of get_reduced_formula_and_factor at the pymatgen level (https://github.com/materialsproject/pymatgen/blob/4f536fc6c1798740a0f1402b0456675011706d75/src/pymatgen/core/composition.py#L428)

or change the definition of formula_units to

from pymatgen.core.ion import Ion

formula_units = (
    structure.composition.num_atoms
    / Ion(structure.composition).reduced_composition.num_atoms
)

We are encountering the same problem with the outputdoc of the ABINIT phonon workflow, and we think we should find a common solution for all the atomate2 workflows.

emarazzi avatar May 19 '25 14:05 emarazzi

Thanks for bringing this up. Regarding the reduced composition, I'll try to take care of that on the pymatgen side.

Normalization of extensive quantities with respect to formula units also breaks with our general convention of normalization wrt the number of atoms. The former should just be an integer multiple of the latter

Also FYI I'm currently working on revising the PhononBSDOSDoc schema so it's a good time to improve the document model if there are other issues you see with it!

esoteric-ephemera avatar May 19 '25 17:05 esoteric-ephemera

@esoteric-ephemera @emarazzi if i can help with anything in this regard, let me know. I am overall happy with any improvements to fix these issues.

JaGeo avatar May 20 '25 06:05 JaGeo

@JaGeo do you know why the normalization is done wrt formula units rather than number of atoms in PhononBSDOSDoc? I think I'd prefer normalization wrt number of atoms since it's less ambiguous and consistent with the other document models

esoteric-ephemera avatar May 20 '25 15:05 esoteric-ephemera

@esoteric-ephemera I think i started from the pymatgen implementation for the atomate2 implementation where this is also done (https://github.com/materialsproject/pymatgen/blob/24cd6808a8ca7275b187a9dd164402864efb9190/src/pymatgen/phonon/dos.py#L245). In abpy the results can also be given per formula unit: https://abinit.github.io/abipy/gallery/plot_phthermo.html

@gpetretto any more insights here?

JaGeo avatar May 20 '25 19:05 JaGeo

@esoteric-ephemera From my point of view, you can surely change this to make it more consistent but it also needs to be updated in the quasi-harmonic implementation

JaGeo avatar May 20 '25 19:05 JaGeo

Apologies for the slow pace of updates: the new phonon schema is in place in emmet.core.phonons, and there should be a release candidate out soon. This schema should fix the issue with formula unit assignment but will need to be linked back to the workflows. That's next on my to do list

esoteric-ephemera avatar Jun 05 '25 16:06 esoteric-ephemera