pymatgen
CifWriter writes chemical formulae and cell formula units that do not comply with the standard
Describe the bug
Chemical formulae-related items written by CifWriter do not comply with the specification.
To Reproduce
Run the following script (POSCAR of CH₃NH₃PbI₃ from https://materialsproject.org/materials/mp-1194604/):
#!/usr/bin/env python3
import pymatgen.core
import pymatgen.io.cif
import pymatgen.io.vasp


def main():
    # Print the pymatgen version as a CIF comment line.
    print("#", pymatgen.core.__version__)
    s = pymatgen.io.vasp.Poscar.from_string(
        """H24 Pb4 C4 I12 N4
1.0
8.6500830000000004 0.0000000000000000 0.0000000000000005
0.0000000000000014 8.9913910000000001 0.0000000000000006
0.0000000000000000 0.0000000000000000 13.1226540000000007
H Pb C I N
24 4 4 12 4
direct
0.3037930000000000 0.4412980000000000 0.2500000000000000 H+
0.1962070000000000 0.9412980000000000 0.2500000000000000 H+
0.6962070000000000 0.5587020000000000 0.7500000000000000 H+
0.8037930000000000 0.0587020000000000 0.7500000000000000 H+
0.4531830000000000 0.3445690000000000 0.1812530000000001 H+
0.0468170000000000 0.8445690000000000 0.3187469999999999 H+
0.5468170000000000 0.6554310000000000 0.6812530000000001 H+
0.9531830000000000 0.1554310000000000 0.8187469999999999 H+
0.5468170000000000 0.6554310000000000 0.8187469999999999 H+
0.9531830000000000 0.1554310000000000 0.6812530000000001 H+
0.4531830000000000 0.3445690000000000 0.3187469999999999 H+
0.0468170000000000 0.8445690000000000 0.1812530000000001 H+
0.4961490000000000 0.6095140000000000 0.1858000000000000 H+
0.0038510000000000 0.1095139999999999 0.3141999999999999 H+
0.5038510000000000 0.3904860000000000 0.6858000000000001 H+
0.9961490000000000 0.8904860000000000 0.8141999999999999 H+
0.5038510000000000 0.3904860000000000 0.8141999999999999 H+
0.9961490000000000 0.8904860000000000 0.6858000000000001 H+
0.4961490000000000 0.6095140000000000 0.3141999999999999 H+
0.0038510000000000 0.1095139999999999 0.1858000000000000 H+
0.3604920000000000 0.4782410000000000 0.7500000000000000 H+
0.1395080000000000 0.9782410000000000 0.7500000000000000 H+
0.6395080000000000 0.5217590000000000 0.2500000000000000 H+
0.8604920000000000 0.0217590000000000 0.2500000000000000 H+
0.5000000000000000 0.0000000000000000 0.0000000000000000 Pb2+
0.0000000000000000 0.5000000000000000 0.5000000000000000 Pb2+
0.5000000000000000 0.0000000000000000 0.5000000000000000 Pb2+
0.0000000000000000 0.5000000000000000 0.0000000000000000 Pb2+
0.4259970000000000 0.4084900000000000 0.2500000000000000 C2-
0.0740030000000000 0.9084900000000000 0.2500000000000000 C2-
0.5740030000000000 0.5915100000000000 0.7500000000000000 C2-
0.9259970000000000 0.0915100000000000 0.7500000000000000 C2-
0.5692760000000000 0.9730720000000000 0.2500000000000000 I-
0.9307240000000000 0.4730720000000002 0.2500000000000000 I-
0.4307240000000000 0.0269280000000000 0.7500000000000000 I-
0.0692759999999999 0.5269280000000000 0.7500000000000000 I-
0.3263700000000000 0.6794470000000000 0.0175740000000000 I-
0.1736300000000000 0.1794470000000001 0.4824260000000000 I-
0.6736300000000000 0.3205530000000000 0.5175740000000000 I-
0.8263700000000000 0.8205530000000000 0.9824260000000000 I-
0.6736300000000000 0.3205530000000000 0.9824260000000000 I-
0.8263700000000000 0.8205530000000000 0.5175740000000000 I-
0.3263700000000000 0.6794470000000000 0.4824260000000000 I-
0.1736300000000000 0.1794470000000001 0.0175740000000000 I-
0.4788280000000000 0.4539980000000000 0.7500000000000000 N3-
0.0211720000000000 0.9539980000000000 0.7500000000000000 N3-
0.5211720000000000 0.5460020000000000 0.2500000000000000 N3-
0.9788280000000000 0.0460020000000000 0.2500000000000000 N3-
"""
    )
    # Printing the writer emits the CIF text.
    w = pymatgen.io.cif.CifWriter(s.structure)
    print(w)


if __name__ == "__main__":
    main()
Output is as follows:
# 2022.11.7
# generated using pymatgen
data_H6PbCI3N
_symmetry_space_group_name_H-M 'P 1'
_cell_length_a 8.65008300
_cell_length_b 8.99139100
_cell_length_c 13.12265400
_cell_angle_alpha 90.00000000
_cell_angle_beta 90.00000000
_cell_angle_gamma 90.00000000
_symmetry_Int_Tables_number 1
_chemical_formula_structural H6PbCI3N
_chemical_formula_sum 'H24 Pb4 C4 I12 N4'
_cell_volume 1020.63119132
_cell_formula_units_Z 4
loop_
_symmetry_equiv_pos_site_id
_symmetry_equiv_pos_as_xyz
1 'x, y, z'
loop_
_atom_site_type_symbol
_atom_site_label
_atom_site_symmetry_multiplicity
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
_atom_site_occupancy
H H0 1 0.30379300 0.44129800 0.25000000 1
H H1 1 0.19620700 0.94129800 0.25000000 1
H H2 1 0.69620700 0.55870200 0.75000000 1
H H3 1 0.80379300 0.05870200 0.75000000 1
H H4 1 0.45318300 0.34456900 0.18125300 1
H H5 1 0.04681700 0.84456900 0.31874700 1
H H6 1 0.54681700 0.65543100 0.68125300 1
H H7 1 0.95318300 0.15543100 0.81874700 1
H H8 1 0.54681700 0.65543100 0.81874700 1
H H9 1 0.95318300 0.15543100 0.68125300 1
H H10 1 0.45318300 0.34456900 0.31874700 1
H H11 1 0.04681700 0.84456900 0.18125300 1
H H12 1 0.49614900 0.60951400 0.18580000 1
H H13 1 0.00385100 0.10951400 0.31420000 1
H H14 1 0.50385100 0.39048600 0.68580000 1
H H15 1 0.99614900 0.89048600 0.81420000 1
H H16 1 0.50385100 0.39048600 0.81420000 1
H H17 1 0.99614900 0.89048600 0.68580000 1
H H18 1 0.49614900 0.60951400 0.31420000 1
H H19 1 0.00385100 0.10951400 0.18580000 1
H H20 1 0.36049200 0.47824100 0.75000000 1
H H21 1 0.13950800 0.97824100 0.75000000 1
H H22 1 0.63950800 0.52175900 0.25000000 1
H H23 1 0.86049200 0.02175900 0.25000000 1
Pb Pb24 1 0.50000000 0.00000000 0.00000000 1
Pb Pb25 1 0.00000000 0.50000000 0.50000000 1
Pb Pb26 1 0.50000000 0.00000000 0.50000000 1
Pb Pb27 1 0.00000000 0.50000000 0.00000000 1
C C28 1 0.42599700 0.40849000 0.25000000 1
C C29 1 0.07400300 0.90849000 0.25000000 1
C C30 1 0.57400300 0.59151000 0.75000000 1
C C31 1 0.92599700 0.09151000 0.75000000 1
I I32 1 0.56927600 0.97307200 0.25000000 1
I I33 1 0.93072400 0.47307200 0.25000000 1
I I34 1 0.43072400 0.02692800 0.75000000 1
I I35 1 0.06927600 0.52692800 0.75000000 1
I I36 1 0.32637000 0.67944700 0.01757400 1
I I37 1 0.17363000 0.17944700 0.48242600 1
I I38 1 0.67363000 0.32055300 0.51757400 1
I I39 1 0.82637000 0.82055300 0.98242600 1
I I40 1 0.67363000 0.32055300 0.98242600 1
I I41 1 0.82637000 0.82055300 0.51757400 1
I I42 1 0.32637000 0.67944700 0.48242600 1
I I43 1 0.17363000 0.17944700 0.01757400 1
N N44 1 0.47882800 0.45399800 0.75000000 1
N N45 1 0.02117200 0.95399800 0.75000000 1
N N46 1 0.52117200 0.54600200 0.25000000 1
N N47 1 0.97882800 0.04600200 0.25000000 1
Focus on the following lines:
_chemical_formula_structural H6PbCI3N
_chemical_formula_sum 'H24 Pb4 C4 I12 N4'
_cell_formula_units_Z 4
Expected behavior
The lines must be
_chemical_formula_structural '(C H3 N H3)4 Pb4 I12'
_chemical_formula_sum 'C4 H24 I12 N4 Pb4'
_cell_formula_units_Z 1
or
_chemical_formula_structural '(C H3 N H3)1 Pb I3'
_chemical_formula_sum 'C1 H6 I3 N1 Pb1'
_cell_formula_units_Z 4
- [ ] The elemental order of _chemical_formula_sum is wrong
- [ ] IIUC the numbers of elements in _chemical_formula_structural and _chemical_formula_sum must be the same; the _chemical_formula_structural values above are for illustration only, though, and it would be difficult to write them in that form
From _cell_formula_units_Z:
The number of the formula units in the unit cell as specified by _chemical_formula_structural, _chemical_formula_moiety or _chemical_formula_sum.
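As a rough illustration of how Z and the formula items have to agree, the sketch below divides the per-cell atom counts by their greatest common divisor. The function name `formula_unit_and_z` is hypothetical and not part of pymatgen, and the GCD is only an assumption about what the formula unit is: it gives the largest possible Z, which need not be the chemically meaningful one for every compound.

```python
from functools import reduce
from math import gcd


def formula_unit_and_z(cell_counts):
    """Split integer per-cell atom counts into (formula unit, Z).

    Assumption: the formula unit is the cell content divided by the
    GCD of all counts. This is a heuristic, not a CIF rule.
    """
    z = reduce(gcd, cell_counts.values())
    unit = {element: n // z for element, n in cell_counts.items()}
    return unit, z


cell = {"H": 24, "Pb": 4, "C": 4, "I": 12, "N": 4}
unit, z = formula_unit_and_z(cell)
print(unit, z)  # {'H': 6, 'Pb': 1, 'C': 1, 'I': 3, 'N': 1} 4
```

Whichever pair is written, it should be consistent: either the reduced counts together with Z = 4, or the full cell counts together with Z = 1.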
From _chemical_formula_structural:
See the _chemical_formula_[] category description for the rules for writing chemical formulae for inorganics, organometallics, metal complexes etc., in which bonded groups are preserved as discrete entities within parentheses, with post-multipliers as required. The order of the elements should give as much information as possible about the chemical structure. Parentheses may be used and nested as required. This formula should correspond to the structure as actually reported, i.e. trace elements not included in atom-type and atom-site lists should not be included in this formula (see also _chemical_formula_analytical).
From _chemical_formula_sum:
See the _chemical_formula_[] category description for the rules for writing chemical formulae in which all discrete bonded residues and ions are summed over the constituent elements, following the ordering given in general rule (5) in the _chemical_formula_[] category description. Parentheses are not normally used.
From https://www.iucr.org/__data/iucr/cifdic_html/1/cif_core.dic/Cchemical_formula.html
(5) Unless the elements are ordered in a manner that corresponds to their chemical structure, as in _chemical_formula_structural, the order of the elements within any group or moiety depends on whether carbon is present or not. If carbon is present, the order should be: C, then H, then the other elements in alphabetical order of their symbol. If carbon is not present, the elements are listed purely in alphabetical order of their symbol. This is the 'Hill' system used by Chemical Abstracts. This ordering is used in _chemical_formula_moiety and _chemical_formula_sum.
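The ordering in rule (5) can be sketched in a few lines of plain Python. The helper name `hill_formula` is made up for this illustration and is not CifWriter code; the sketch also always writes the count, including 1, to match the expected lines above, which is a formatting choice rather than something rule (5) prescribes.

```python
def hill_formula(counts):
    """Return a space-separated formula with elements in Hill order.

    counts maps element symbols to integer counts.
    """
    symbols = sorted(counts)  # purely alphabetical by symbol
    if "C" in counts:
        # Carbon present: C first, then H, then the rest alphabetically.
        head = [s for s in ("C", "H") if s in counts]
        symbols = head + [s for s in symbols if s not in ("C", "H")]
    return " ".join(f"{s}{counts[s]}" for s in symbols)


print(hill_formula({"H": 24, "Pb": 4, "C": 4, "I": 12, "N": 4}))
# C4 H24 I12 N4 Pb4
print(hill_formula({"H": 6, "Pb": 1, "C": 1, "I": 3, "N": 1}))
# C1 H6 I3 N1 Pb1
```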
(emphasis mine)
Environment (please supply relevant versions and platform info):
- OS: EndeavourOS
- Version: 2022.11.7
Thanks for reporting this. But I would like to understand what the actual implication of this is beyond the lack of standards compliance. Does it affect the use of the CIF in any software out there?
I am happy for someone to write a PR to fix this. But unless there is a pressing compatibility problem, I don't foresee being able to spend time working on this.
Thank you for your reply.
So far I have not run into any software problems caused by the elemental order of _chemical_formula_sum or by the discrepancy between _cell_formula_units_Z and _chemical_formula_sum/_chemical_formula_structural, but I believe the latter is incorrect.