pymatgen icon indicating copy to clipboard operation
pymatgen copied to clipboard

wrong number of atoms generated from cif file.

Open JPBergsma opened this issue 1 year ago • 6 comments

Python version

python 3.12.4

Pymatgen version

2024.7.18

Operating system version

Linux Mint 21.3

Current behavior

the wrong number of Oxygen and Silicon atoms is generated from the Cif file.

Expected Behavior

The correct number of Oxygen and Silicon atoms. 72 Oxygen atoms and 36 silicon atoms The file is read correctly by Ovito.

Minimal example

from pymatgen.core import Structure

structure = Structure.from_file("ATO.cif")
print(structure.formula)

Relevant files to reproduce this bug

https://europe.iza-structure.org/IZA-SC/download_cif.php?ID=42

JPBergsma avatar Aug 01 '24 17:08 JPBergsma

Just to aid in diagnosing this, the relevant output is the following

/home/wladerer/.venv/base/lib/python3.10/site-packages/pymatgen/io/cif.py:1285: UserWarning: Issues encountered while parsing CIF: 3 fractional coordinates rounded to ideal values to avoid issues with finite precision.
Skipping relative stoichiometry check because CIF does not contain formula keys.
  warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
Full Formula (Si36 O76)
Reduced Formula: Si9O19

And the file contents don't pass a first check of the following section of the cif file

loop_
_atom_site_label
_atom_site_type_symbol
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
    O1    O     0.6667    0.1347    0.8333
    O2    O     0.5760    0.0000    0.0000
    O3    O     0.5544    0.1089    0.1382
    O4    O     0.6667    0.1006    0.3333
    T1    Si    0.6160    0.0861    0.0763

Said check:

https://github.com/materialsproject/pymatgen/blob/633e7bd7700fc96b407d882cce2b473a38a97c67/src/pymatgen/io/cif.py#L1423-L1428

Update:

It seems to be a precision issue. If you use the space group analyzer with the default precision, you will get the following error

Traceback (most recent call last):
  File "/home/wladerer/Downloads/atotest/pmgtest.py", line 11, in <module>
    orginal_sg = SpacegroupAnalyzer(original).get_space_group_number()
  File "/home/wladerer/.venv/base/lib/python3.10/site-packages/pymatgen/symmetry/analyzer.py", line 142, in __init__
    self._space_group_data = _get_symmetry_dataset(self._cell, symprec, angle_tolerance)
  File "/home/wladerer/.venv/base/lib/python3.10/site-packages/pymatgen/symmetry/analyzer.py", line 70, in _get_symmetry_dataset
    raise SymmetryUndetermined
pymatgen.symmetry.analyzer.SymmetryUndetermined

So if you go from

SpacegroupAnalyzer(original).get_space_group_number()

to

SpacegroupAnalyzer(original, 0.0001).get_space_group_number()

You get space group 1. If you use ASE to find the space group, it gets space group 146 (R 3). Neither of these are correct according to the header of the file, saying that it is R -3 m / 166. It is still interesting that ASE gets the correct stoichiometry but incorrect space group.

wladerer avatar Aug 02 '24 02:08 wladerer

Hi, it seems to be a precision issue of site O3. If you parse the structure with default settings and have a look at the distance matrix, you can see that there are 4 pairs of very close oxygen sites:

dist_matrix = cif_struct.distance_matrix
dists = list(dist_matrix.flatten())
sorted_dists = sorted(dists)
c = Counter(dists)
print({k: c[k] for k in sorted_dists if k < 1.5})

close_ids = set()
for row_id, row in enumerate(dist_matrix):
    for col_id, cell in enumerate(row):
        if 0.0 < cell < 0.003:
            print(row_id, col_id, cell, cif_struct.sites[row_id].frac_coords, cif_struct.sites[col_id].frac_coords)

gives the following output:

{0.0: 112, 0.002091399999999175: 2, 0.00209140000000036: 2, 0.002091400000002075: 2, 0.002091400000002283: 2}

77 81 0.002091400000002283 [0.22443333 0.11216667 0.80486667] [0.22443333 0.11226667 0.80486667]
80 82 0.002091399999999175 [0.88783333 0.11226667 0.80486667] [0.88773333 0.11216667 0.80486667]
81 77 0.002091400000002283 [0.22443333 0.11226667 0.80486667] [0.22443333 0.11216667 0.80486667]
82 80 0.002091399999999175 [0.88773333 0.11216667 0.80486667] [0.88783333 0.11226667 0.80486667]
84 92 0.002091400000002075 [0.11226667 0.22443333 0.19513333] [0.11216667 0.22443333 0.19513333]
90 93 0.00209140000000036 [0.11216667 0.88773333 0.19513333] [0.11226667 0.88783333 0.19513333]
92 84 0.002091400000002075 [0.11216667 0.22443333 0.19513333] [0.11226667 0.22443333 0.19513333]
93 90 0.00209140000000036 [0.11226667 0.88783333 0.19513333] [0.11216667 0.88773333 0.19513333]

This can be circumvented by changing the site_tolerance parameter of CifParser to count two close sites as one. E.g., the following structure

cif_struct = CifParser("ATO.cif", site_tolerance=1e-3).parse_structures()[0]

will have the correct formula Si36 O72.

The equivalent functionality in ase seems to have a higher tolerance parameter (symprec) to count two sites as one (https://gitlab.com/ase/ase/-/blob/master/ase/spacegroup/spacegroup.py?ref_type=heads#L378), this is why the correct stochiometry is found with the default settings @wladerer .

kaueltzen avatar Aug 02 '24 12:08 kaueltzen

Thank you @kaueltzen !

Is this answer sufficient for you, @JPBergsma ? I would close the issue then.

JaGeo avatar Aug 02 '24 14:08 JaGeo

Thank you for looking into this issue more deeply on such short notice.

The main thing I am still wondering about is whether such a structure as I encountered here could be handled more elegantly in the future.

Assuming that the site occupancy is 1, there is a minimum distance two sites could have in any realistic structure. Let's assume that this distance is 0.5 angstrom. Then any inter site distance of less than 0.25 Angstrom probably means that two sites should overlap and only one atom should be added to the structure. I guess the best atomic position would be the average of the two positions in that case.

Do you think that this is reasonable and that this could be added to how structures are handled?

JPBergsma avatar Aug 02 '24 20:08 JPBergsma

@JPBergsma I am not sure. I personally wouldn't want any code to handle this automatically for me as I would fear unexpected side effects.

Are there Maintainer opinions? @shyuep @mkhorton @janosh ?

JaGeo avatar Aug 02 '24 20:08 JaGeo

See https://github.com/spglib/spglib/discussions/514 for the related discussion.

hongyi-zhao avatar Aug 08 '24 01:08 hongyi-zhao