
Implicit hydrogen parsing should not be default for doped cifs.

Open TimoSommer opened this issue 3 years ago • 0 comments

Describe the bug

When reading a CIF file of a doped structure in which hydrogen and another element share the same lattice site, the hydrogen does not appear in the structure, except as 'implicit_hydrogens' in structure.site_properties. If, however, the hydrogen's lattice site is not shared with another element, the hydrogen is shown and treated like every other element. I find this very confusing and think it should not be the default way of parsing (I think this was added in #692). I have not found any mention in the documentation of how to turn off this behaviour, nor have I gotten the warning that I think I should get according to #1287.

To Reproduce

  1. Save the following CIF as a string variable:
cif = """# generated using pymatgen
data_Ce2Fe2As2H1.8O0.2
_symmetry_space_group_name_H-M   P4/nmm
_cell_length_a   3.97371900
_cell_length_b   3.97371900
_cell_length_c   8.96776000
_cell_angle_alpha   90.00000000
_cell_angle_beta   90.00000000
_cell_angle_gamma   90.00000000
_symmetry_Int_Tables_number   129
_chemical_formula_structural   Ce2Fe2As2H1.8O0.2
_chemical_formula_sum   'Ce2 Fe2 As2 H1.8 O0.2'
_cell_volume   141.60490035
_cell_formula_units_Z   1
loop_
 _symmetry_equiv_pos_site_id
 _symmetry_equiv_pos_as_xyz
  1  'x, y, z'
  2  '-y+1/2, x+1/2, z'
  3  '-x, -y, z'
  4  'y+1/2, -x+1/2, z'
  5  'x+1/2, -y+1/2, -z'
  6  '-y, -x, -z'
  7  '-x+1/2, y+1/2, -z'
  8  'y, x, -z'
  9  '-x+1/2, -y+1/2, -z'
  10  'y, -x, -z'
  11  'x+1/2, y+1/2, -z'
  12  '-y, x, -z'
  13  '-x, y, z'
  14  'y+1/2, x+1/2, z'
  15  'x, -y, z'
  16  '-y+1/2, -x+1/2, z'
loop_
 _atom_site_type_symbol
 _atom_site_label
 _atom_site_symmetry_multiplicity
 _atom_site_fract_x
 _atom_site_fract_y
 _atom_site_fract_z
 _atom_site_occupancy
  Ce  Ce0  2  0.00000000  0.50000000  0.86157000  1.0
  Fe  Fe1  2  0.00000000  0.00000000  0.50000000  1.0
  As  As2  2  0.00000000  0.50000000  0.31128600  1.0
  O  O3  2  0.00000000  0.00000000  0.00000000  0.1
  H  H4  2  0.00000000  0.00000000  0.00000000  0.9
"""
  2. Read the CIF string in as a structure:
from pymatgen.core.structure import Structure
structure = Structure.from_str(cif, fmt='cif')
  3. Apart from structure.site_properties, the structure shows no sign of the hydrogen: structure.formula == 'Ce2 Fe2 As2 O0.2'

Expected behavior

The doped hydrogen should be treated like every other element and shown in both the formula and the structure. Implicit hydrogens make sense for molecules, but it doesn't make sense that a lattice site such as (H:0.99 O:0.01) is parsed completely differently from a lattice site (H:1.0). On the other hand, if this is truly the intended default, I would wish for a simple option to turn off this behaviour and a clear warning when it happens.
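
To make the expected result concrete, the composition implied by the atom-site rows of the CIF above can be worked out by hand as multiplicity times occupancy per element. The following is only a minimal sketch in plain Python (no pymatgen involved); the helper name `expected_composition` and the `ATOM_SITES` list are hypothetical and exist purely for illustration.

```python
from collections import defaultdict

# Atom-site rows copied from the CIF above, reduced to
# (element, symmetry multiplicity, occupancy).
ATOM_SITES = [
    ("Ce", 2, 1.0),
    ("Fe", 2, 1.0),
    ("As", 2, 1.0),
    ("O", 2, 0.1),
    ("H", 2, 0.9),
]


def expected_composition(sites):
    """Sum multiplicity * occupancy per element (hypothetical helper)."""
    totals = defaultdict(float)
    for element, multiplicity, occupancy in sites:
        totals[element] += multiplicity * occupancy
    # Round to suppress floating-point noise in the summed amounts.
    return {element: round(amount, 6) for element, amount in totals.items()}


composition = expected_composition(ATOM_SITES)
print(composition)
```

Under this hand calculation the cell contains 1.8 H and 0.2 O, which matches the _chemical_formula_sum entry 'Ce2 Fe2 As2 H1.8 O0.2' in the CIF, whereas the parsed structure reports only 'Ce2 Fe2 As2 O0.2'.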

Desktop (please complete the following information):

  • OS: Linux Mint 19.1 Cinnamon
  • pymatgen: pymatgen-2022.0.14

TimoSommer, Oct 21 '21 10:10