Atom labels in CIF file are silently rewritten by CifWriter

Open fxcoudert opened this issue 1 year ago • 6 comments

Python version

Python 3.11.6

Pymatgen version

2024.4.13

Operating system version

macOS 14.4.1

Current behavior

This is related to https://github.com/materialsproject/pymatgen/issues/3761 but different. I have upgraded my pymatgen from 2023.10.4 to 2024.4.13 and I have workflows that fail as a result of the update. This is because CifWriter now silently replaces atom labels, even when they were unique! This seems very unnatural (and makes my current code fail, because I am writing bond specifications for the labels in the structure, and they don't match those in the CIF file).

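# Import path for MPRester is assumed; the original snippet omitted its imports.
from pymatgen.ext.matproj import MPRester
from pymatgen.io.cif import CifWriter

with MPRester("YOUR_API_KEY") as m:  # replace with your own Materials Project API key
    structure = m.get_structure_by_material_id("mp-1234")
    print(structure.labels)  # labels as returned by the Materials Project
    make_labels_unique(structure)  # helper defined under "Minimal example"
    print(structure.labels)  # labels after making them unique
    print(str(CifWriter(structure)))  # labels as written to the CIF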
  • structure.labels is originally ['Lu', 'Lu', 'Al', 'Al', 'Al', 'Al']
  • after the make_labels_unique call (my function, code below), they are ['Lu0', 'Lu1', 'Al0', 'Al1', 'Al2', 'Al3']
  • but in the CIF file, alas:
loop_
 _atom_site_type_symbol
 _atom_site_label
 _atom_site_symmetry_multiplicity
 _atom_site_fract_x
 _atom_site_fract_y
 _atom_site_fract_z
 _atom_site_occupancy
  Lu  Lu0  1  0.87500000  0.87500000  0.87500000  1
  Lu  Lu1  1  0.12500000  0.12500000  0.12500000  1
  Al  Al2  1  0.50000000  0.50000000  0.50000000  1
  Al  Al3  1  0.50000000  0.50000000  0.00000000  1
  Al  Al4  1  0.00000000  0.50000000  0.50000000  1
  Al  Al5  1  0.50000000  -0.00000000  0.50000000  1

The Al atoms have been renamed from Al0..Al3 to Al2..Al5.

Expected Behavior

When labels are conformant to the CIF format (which they are in this case) they should not be altered.
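A minimal sketch of how the mismatch can be detected without pymatgen, by reading the _atom_site_label column back from the CIF text and comparing it with the labels that were set. The helper name cif_atom_site_labels is hypothetical, and the parser assumes simple whitespace-separated, unquoted values as in the output above; it is not a general CIF reader.

```python
def cif_atom_site_labels(cif_text):
    """Return the _atom_site_label column of the first loop that has one.

    Hypothetical helper; assumes unquoted, whitespace-separated values.
    """
    lines = [ln.strip() for ln in cif_text.splitlines()]
    for start, line in enumerate(lines):
        if line != "loop_":
            continue
        # Collect the column tags that directly follow "loop_".
        tags = []
        row = start + 1
        while row < len(lines) and lines[row].startswith("_"):
            tags.append(lines[row])
            row += 1
        if "_atom_site_label" not in tags:
            continue
        col = tags.index("_atom_site_label")
        labels = []
        # Data rows run until a blank line, a new tag, or a new block.
        stop = ("_", "loop_", "data_", "#")
        while row < len(lines) and lines[row] and not lines[row].startswith(stop):
            labels.append(lines[row].split()[col])
            row += 1
        return labels
    return []


# The atom-site loop reported in this issue.
sample = """loop_
 _atom_site_type_symbol
 _atom_site_label
 _atom_site_symmetry_multiplicity
 _atom_site_fract_x
 _atom_site_fract_y
 _atom_site_fract_z
 _atom_site_occupancy
  Lu  Lu0  1  0.87500000  0.87500000  0.87500000  1
  Lu  Lu1  1  0.12500000  0.12500000  0.12500000  1
  Al  Al2  1  0.50000000  0.50000000  0.50000000  1
  Al  Al3  1  0.50000000  0.50000000  0.00000000  1
  Al  Al4  1  0.00000000  0.50000000  0.50000000  1
  Al  Al5  1  0.50000000  -0.00000000  0.50000000  1
"""

expected = ["Lu0", "Lu1", "Al0", "Al1", "Al2", "Al3"]  # labels set on the structure
written = cif_atom_site_labels(sample)
# Pairs of (label set on the structure, label found in the CIF) that differ.
mismatches = [(want, got) for want, got in zip(expected, written) if want != got]
print(mismatches)
```

In a workflow, the same comparison could be run on str(CifWriter(structure)) against structure.labels, so that a silent relabeling fails loudly instead of breaking the bond specifications later.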

Minimal example

See code above. The function make_labels_unique is:

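from collections import Counter


def make_labels_unique(struct):
    """Append a running index to every label that occurs more than once (in place)."""
    labels = [site.label for site in struct.sites]
    if len(labels) == len(set(labels)):
        # All labels are unique, nothing to do
        return

    counts = Counter(labels)  # how often each original label occurs
    next_index = {}  # next suffix to hand out for each duplicated label
    for site in struct.sites:
        label = site.label
        if counts[label] > 1:
            c = next_index.get(label, 0)
            # Use an underscore separator when the label already contains non-letters.
            site.label = f"{label}{c}" if label.isalpha() else f"{label}_{c}"
            next_index[label] = c + 1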

fxcoudert, Apr 19 '24 12:04

This happens because the check "magmom" in site.properties is true for this structure, which sends CifWriter down the branch at https://github.com/materialsproject/pymatgen/blame/2d008e0dd5c430692e8dcac2505340a6bdff1642/pymatgen/io/cif.py#L1516C21-L1516C51

But I am not writing magnetic moments, and I do not know why that should override labels.

fxcoudert, Apr 19 '24 12:04

@fxcoudert thanks for reporting. I assume a pull-request to fix this issue would be very welcome.

JaGeo, Apr 19 '24 12:04

I confirm that removing the magnetic moments with structure.remove_site_property("magmom") does fix the issue.
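As a sketch, that workaround can be wrapped in a small guard so it is safe to call on structures with or without magnetic moments. The helper name drop_magmom is hypothetical; it relies only on the site_properties attribute and the remove_site_property method of Structure, and it discards the moments, so it is only appropriate when they are not needed in the output.

```python
def drop_magmom(structure):
    """Remove the "magmom" site property in place, if present.

    Hypothetical helper: the structure is modified and returned so the call
    can be chained. The magnetic moments are lost.
    """
    if "magmom" in structure.site_properties:
        structure.remove_site_property("magmom")
    return structure


# Intended use with pymatgen (not executed here):
#   from pymatgen.io.cif import CifWriter
#   print(CifWriter(drop_magmom(structure)))
```

Working on a copy of the structure instead would keep the moments available for later steps; whether a copy preserves custom labels in a given pymatgen version is worth checking first.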

Regarding a PR, my own understanding of what the code tries to do for magnetic moments is insufficient to handle it well. I wouldn't want to break another use case…

fxcoudert, Apr 19 '24 12:04

> I confirm that removing the magnetic moments with structure.remove_site_property("magmom") does fix the issue.
>
> Regarding a PR, my own understanding of what the code tries to do for magnetic moments is insufficient to handle it well. I wouldn't want to break another use case…

I see! I also don't know that functionality. Should I leave it open to see if someone else can fix it?

JaGeo, Apr 19 '24 12:04

@mkhorton, maybe you know more about this.

JaGeo, Apr 19 '24 13:04

This also struck me as odd. I'm currently working on #3767, which touches this bit of the code, and I'm happy to fix this if someone tells me the expected behaviour for magnetic moments.

stefsmeets, Apr 22 '24 12:04