pymatgen
Implicit hydrogen parsing should not be default for doped cifs.
Describe the bug
When reading a CIF file of a doped structure in which hydrogen and another element share the same lattice site, the hydrogen does not appear in the structure, except as 'implicit_hydrogens' in structure.site_properties. If the hydrogen's lattice site is not shared with another element, however, the hydrogen is shown and treated like every other element. I find this very confusing and think it should not be the default parsing behaviour (I think it was added in #692). I have not found any mention in the documentation of how to turn this behaviour off, nor did I get the warning that I would expect according to #1287.
To Reproduce
- Save the following CIF as a string variable:
cif = """# generated using pymatgen
data_Ce2Fe2As2H1.8O0.2
_symmetry_space_group_name_H-M P4/nmm
_cell_length_a 3.97371900
_cell_length_b 3.97371900
_cell_length_c 8.96776000
_cell_angle_alpha 90.00000000
_cell_angle_beta 90.00000000
_cell_angle_gamma 90.00000000
_symmetry_Int_Tables_number 129
_chemical_formula_structural Ce2Fe2As2H1.8O0.2
_chemical_formula_sum 'Ce2 Fe2 As2 H1.8 O0.2'
_cell_volume 141.60490035
_cell_formula_units_Z 1
loop_
_symmetry_equiv_pos_site_id
_symmetry_equiv_pos_as_xyz
1 'x, y, z'
2 '-y+1/2, x+1/2, z'
3 '-x, -y, z'
4 'y+1/2, -x+1/2, z'
5 'x+1/2, -y+1/2, -z'
6 '-y, -x, -z'
7 '-x+1/2, y+1/2, -z'
8 'y, x, -z'
9 '-x+1/2, -y+1/2, -z'
10 'y, -x, -z'
11 'x+1/2, y+1/2, -z'
12 '-y, x, -z'
13 '-x, y, z'
14 'y+1/2, x+1/2, z'
15 'x, -y, z'
16 '-y+1/2, -x+1/2, z'
loop_
_atom_site_type_symbol
_atom_site_label
_atom_site_symmetry_multiplicity
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
_atom_site_occupancy
Ce Ce0 2 0.00000000 0.50000000 0.86157000 1.0
Fe Fe1 2 0.00000000 0.00000000 0.50000000 1.0
As As2 2 0.00000000 0.50000000 0.31128600 1.0
O O3 2 0.00000000 0.00000000 0.00000000 0.1
H H4 2 0.00000000 0.00000000 0.00000000 0.9
"""
- Read in the cif file as structure:
from pymatgen.core.structure import Structure
structure = Structure.from_str(cif, fmt='cif')
- Apart from structure.site_properties, the structure shows no sign of the hydrogen:
structure.formula == 'Ce2 Fe2 As2 O0.2'
Expected behavior
The doped hydrogen should be treated like every other element and appear in both the formula and the structure. Implicit hydrogens make sense for molecules, but it makes no sense that the lattice site (H:0.99 O:0.01) is parsed completely differently from the lattice site (H:1.0). On the other hand, if this really is the intended default, I would like a simple option to turn this behaviour off and a clear warning whenever it happens.
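To make the expectation concrete, here is a minimal sketch in plain Python (no pymatgen involved) that derives the per-element amounts directly from the _atom_site rows of the CIF above, as multiplicity times occupancy, and lists the positions occupied by more than one element. The names ATOM_SITES, expected_amounts and shared_sites are hypothetical helpers introduced only for this illustration; they are not part of pymatgen.

```python
from collections import defaultdict

# Rows copied by hand from the _atom_site loop of the CIF above:
# (type symbol, multiplicity, fractional coordinates, occupancy)
ATOM_SITES = [
    ("Ce", 2, (0.0, 0.5, 0.86157), 1.0),
    ("Fe", 2, (0.0, 0.0, 0.5), 1.0),
    ("As", 2, (0.0, 0.5, 0.311286), 1.0),
    ("O", 2, (0.0, 0.0, 0.0), 0.1),
    ("H", 2, (0.0, 0.0, 0.0), 0.9),
]


def expected_amounts(rows):
    """Sum multiplicity * occupancy per element (hypothetical helper)."""
    totals = defaultdict(float)
    for symbol, multiplicity, _coords, occupancy in rows:
        totals[symbol] += multiplicity * occupancy
    # Round to suppress floating-point noise in the sums.
    return {element: round(amount, 6) for element, amount in totals.items()}


def shared_sites(rows):
    """Return positions occupied by more than one element (hypothetical helper)."""
    by_position = defaultdict(dict)
    for symbol, _multiplicity, coords, occupancy in rows:
        by_position[coords][symbol] = occupancy
    return {pos: occ for pos, occ in by_position.items() if len(occ) > 1}


print(expected_amounts(ATOM_SITES))
print(shared_sites(ATOM_SITES))
```

The amounts computed this way match the _chemical_formula_sum entry of the CIF ('Ce2 Fe2 As2 H1.8 O0.2'), whereas the parsed structure reports 'Ce2 Fe2 As2 O0.2'. The only shared position is the O/H site at the origin, which is exactly where the hydrogen goes missing.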
Desktop:
- OS: Linux Mint 19.1 Cinnamon
- pymatgen: pymatgen-2022.0.14