wrong number of atoms generated from cif file.
Python version
python 3.12.4
Pymatgen version
2024.7.18
Operating system version
Linux Mint 21.3
Current behavior
the wrong number of Oxygen and Silicon atoms is generated from the Cif file.
Expected Behavior
The correct number of Oxygen and Silicon atoms. 72 Oxygen atoms and 36 silicon atoms The file is read correctly by Ovito.
Minimal example
from pymatgen.core import Structure
structure = Structure.from_file("ATO.cif")
print(structure.formula)
Relevant files to reproduce this bug
https://europe.iza-structure.org/IZA-SC/download_cif.php?ID=42
Just to aid in diagnosing this, the relevant output is the following
/home/wladerer/.venv/base/lib/python3.10/site-packages/pymatgen/io/cif.py:1285: UserWarning: Issues encountered while parsing CIF: 3 fractional coordinates rounded to ideal values to avoid issues with finite precision.
Skipping relative stoichiometry check because CIF does not contain formula keys.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
Full Formula (Si36 O76)
Reduced Formula: Si9O19
And the file contents don't pass a first check of the following section of the cif file
loop_
_atom_site_label
_atom_site_type_symbol
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
O1 O 0.6667 0.1347 0.8333
O2 O 0.5760 0.0000 0.0000
O3 O 0.5544 0.1089 0.1382
O4 O 0.6667 0.1006 0.3333
T1 Si 0.6160 0.0861 0.0763
Said check:
https://github.com/materialsproject/pymatgen/blob/633e7bd7700fc96b407d882cce2b473a38a97c67/src/pymatgen/io/cif.py#L1423-L1428
Update:
It seems to be a precision issue. If you use the space group analyzer with the default precision, you will get the following error
Traceback (most recent call last):
File "/home/wladerer/Downloads/atotest/pmgtest.py", line 11, in <module>
orginal_sg = SpacegroupAnalyzer(original).get_space_group_number()
File "/home/wladerer/.venv/base/lib/python3.10/site-packages/pymatgen/symmetry/analyzer.py", line 142, in __init__
self._space_group_data = _get_symmetry_dataset(self._cell, symprec, angle_tolerance)
File "/home/wladerer/.venv/base/lib/python3.10/site-packages/pymatgen/symmetry/analyzer.py", line 70, in _get_symmetry_dataset
raise SymmetryUndetermined
pymatgen.symmetry.analyzer.SymmetryUndetermined
So if you go from
SpacegroupAnalyzer(original).get_space_group_number()
to
SpacegroupAnalyzer(original, 0.0001).get_space_group_number()
You get space group 1. If you use ASE to find the space group, it gets space group 146 (R 3). Neither of these are correct according to the header of the file, saying that it is R -3 m / 166. It is still interesting that ASE gets the correct stoichiometry but incorrect space group.
Hi, it seems to be a precision issue of site O3. If you parse the structure with default settings and have a look at the distance matrix, you can see that there are 4 pairs of very close oxygen sites:
dist_matrix = cif_struct.distance_matrix
dists = list(dist_matrix.flatten())
sorted_dists = sorted(dists)
c = Counter(dists)
print({k: c[k] for k in sorted_dists if k < 1.5})
close_ids = set()
for row_id, row in enumerate(dist_matrix):
for col_id, cell in enumerate(row):
if 0.0 < cell < 0.003:
print(row_id, col_id, cell, cif_struct.sites[row_id].frac_coords, cif_struct.sites[col_id].frac_coords)
gives the following output:
{0.0: 112, 0.002091399999999175: 2, 0.00209140000000036: 2, 0.002091400000002075: 2, 0.002091400000002283: 2}
77 81 0.002091400000002283 [0.22443333 0.11216667 0.80486667] [0.22443333 0.11226667 0.80486667]
80 82 0.002091399999999175 [0.88783333 0.11226667 0.80486667] [0.88773333 0.11216667 0.80486667]
81 77 0.002091400000002283 [0.22443333 0.11226667 0.80486667] [0.22443333 0.11216667 0.80486667]
82 80 0.002091399999999175 [0.88773333 0.11216667 0.80486667] [0.88783333 0.11226667 0.80486667]
84 92 0.002091400000002075 [0.11226667 0.22443333 0.19513333] [0.11216667 0.22443333 0.19513333]
90 93 0.00209140000000036 [0.11216667 0.88773333 0.19513333] [0.11226667 0.88783333 0.19513333]
92 84 0.002091400000002075 [0.11216667 0.22443333 0.19513333] [0.11226667 0.22443333 0.19513333]
93 90 0.00209140000000036 [0.11226667 0.88783333 0.19513333] [0.11216667 0.88773333 0.19513333]
This can be circumvented by changing the site_tolerance parameter of CifParser to count two close sites as one. E.g., the following structure
cif_struct = CifParser("ATO.cif", site_tolerance=1e-3).parse_structures()[0]
will have the correct formula Si36 O72.
The equivalent functionality in ase seems to have a higher tolerance parameter (symprec) to count two sites as one (https://gitlab.com/ase/ase/-/blob/master/ase/spacegroup/spacegroup.py?ref_type=heads#L378), this is why the correct stochiometry is found with the default settings @wladerer .
Thank you @kaueltzen !
Is this answer sufficient for you, @JPBergsma ? I would close the issue then.
Thank you for looking into this issue more deeply on such short notice.
The main thing I am still wondering about is whether such a structure as I encountered here could be handled more elegantly in the future.
Assuming that the site occupancy is 1, there is a minimum distance two sites could have in any realistic structure. Let's assume that this distance is 0.5 angstrom. Then any inter site distance of less than 0.25 Angstrom probably means that two sites should overlap and only one atom should be added to the structure. I guess the best atomic position would be the average of the two positions in that case.
Do you think that this is reasonable and that this could be added to how structures are handled?
@JPBergsma I am not sure. I personally wouldn't want any code to handle this automatically for me as I would fear unexpected side effects.
Are there Maintainer opinions? @shyuep @mkhorton @janosh ?
See https://github.com/spglib/spglib/discussions/514 for the related discussion.