Describe the bug

When reading a CIF file of a doped structure in which hydrogen and another element share the same lattice site, the hydrogen does not appear in the structure, except as 'implicit_hydrogens' in structure.site_properties. If, however, the hydrogen's lattice site is not shared with another element, the hydrogen is shown and treated like every other element. I find this very confusing and think it should not be the default parsing behaviour (I believe it was added in #692). I have not found any mention in the documentation of how to turn this behaviour off, nor did I get a warning, which I think I should have according to #1287.

To Reproduce

  1. Save the following CIF as a string variable:
cif = """# generated using pymatgen
data_Ce2Fe2As2H1.8O0.2
_symmetry_space_group_name_H-M   P4/nmm
_cell_length_a   3.97371900
_cell_length_b   3.97371900
_cell_length_c   8.96776000
_cell_angle_alpha   90.00000000
_cell_angle_beta   90.00000000
_cell_angle_gamma   90.00000000
_symmetry_Int_Tables_number   129
_chemical_formula_structural   Ce2Fe2As2H1.8O0.2
_chemical_formula_sum   'Ce2 Fe2 As2 H1.8 O0.2'
_cell_volume   141.60490035
_cell_formula_units_Z   1
loop_
 _symmetry_equiv_pos_site_id
 _symmetry_equiv_pos_as_xyz
  1  'x, y, z'
  2  '-y+1/2, x+1/2, z'
  3  '-x, -y, z'
  4  'y+1/2, -x+1/2, z'
  5  'x+1/2, -y+1/2, -z'
  6  '-y, -x, -z'
  7  '-x+1/2, y+1/2, -z'
  8  'y, x, -z'
  9  '-x+1/2, -y+1/2, -z'
  10  'y, -x, -z'
  11  'x+1/2, y+1/2, -z'
  12  '-y, x, -z'
  13  '-x, y, z'
  14  'y+1/2, x+1/2, z'
  15  'x, -y, z'
  16  '-y+1/2, -x+1/2, z'
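loop_
 _atom_site_type_symbol
 _atom_site_label
 _atom_site_symmetry_multiplicity
 _atom_site_fract_x
 _atom_site_fract_y
 _atom_site_fract_z
 _atom_site_occupancy
  Ce  Ce0  2  0.00000000  0.50000000  0.86157000  1.0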
  Fe  Fe1  2  0.00000000  0.00000000  0.50000000  1.0
  As  As2  2  0.00000000  0.50000000  0.31128600  1.0
  O  O3  2  0.00000000  0.00000000  0.00000000  0.1
  H  H4  2  0.00000000  0.00000000  0.00000000  0.9
"""
  2. Read the CIF string in as a structure:
from pymatgen.core.structure import Structure
structure = Structure.from_str(cif, fmt='cif')
  3. Apart from structure.site_properties, the structure shows no sign of the hydrogen: structure.formula == 'Ce2 Fe2 As2 O0.2'

Expected behavior

The doped hydrogen should be treated like every other element and appear in both the formula and the structure. Implicit hydrogens make sense for molecules, but it does not make sense that a lattice site such as (H:0.99 O:0.01) is parsed completely differently from the lattice site (H:1.0). On the other hand, if this really is the intended default, I would like a simple option to turn the behaviour off and a clear warning whenever it happens.
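To make the expectation concrete, here is a minimal sketch in plain Python. It is not pymatgen code and does not use pymatgen's parser; the names ATOM_SITES and element_totals are made up for illustration. It sums multiplicity times occupancy for each atom-site row of the CIF above, treating hydrogen exactly like every other species, including on the site it shares with oxygen.

```python
from collections import defaultdict

# Atom-site rows from the CIF above: (element, label, multiplicity, occupancy).
# Fractional coordinates are omitted because they do not affect the totals.
# ATOM_SITES and element_totals are hypothetical names, not pymatgen API.
ATOM_SITES = [
    ("Ce", "Ce0", 2, 1.0),
    ("Fe", "Fe1", 2, 1.0),
    ("As", "As2", 2, 1.0),
    ("O", "O3", 2, 0.1),
    ("H", "H4", 2, 0.9),
]


def element_totals(rows):
    """Sum multiplicity * occupancy per element, with no special case for H."""
    totals = defaultdict(float)
    for element, _label, multiplicity, occupancy in rows:
        totals[element] += multiplicity * occupancy
    # Round away floating-point noise before comparing or printing.
    return {element: round(amount, 6) for element, amount in totals.items()}


totals = element_totals(ATOM_SITES)
formula = " ".join(f"{element}{amount:g}" for element, amount in totals.items())
print(formula)  # prints "Ce2 Fe2 As2 O0.2 H1.8"
```

Up to element ordering, this matches the _chemical_formula_sum entry in the CIF ('Ce2 Fe2 As2 H1.8 O0.2'), which is the composition I would expect structure.formula to report.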

Desktop (please complete the following information):

  • OS: Linux Mint 19.1 Cinnamon
  • pymatgen: pymatgen-2022.0.14

