pymatgen
                                
                                 pymatgen copied to clipboard
                                
                                    pymatgen copied to clipboard
                            
                            
                            
                        Atom labels in CIF file are silently rewritten by CifWriter
Python version
Python 3.11.6
Pymatgen version
2024.4.13
Operating system version
macOS 14.4.1
Current behavior
This is related to https://github.com/materialsproject/pymatgen/issues/3761 but different. I have upgraded my pymatgen from 2023.10.4 and 2024.4.13 and I have workflows that fail as a result of the update. This is because CifWriter now silently replaces atom labels, even when they were unique! This seems very unnatural (and makes my current code fail, because I am writing bond specifications for the labels in the structure, and they don't match in the CIF file).
with MPRester("c3OruwLchURd4NLeENE40ziu8cNOGgyx") as m:
    structure = m.get_structure_by_material_id("mp-1234")
    print(structure.labels)
    make_labels_unique(structure)
    print(structure.labels)
    print(str(CifWriter(structure)))
- structure.labelsis originally- ['Lu', 'Lu', 'Al', 'Al', 'Al', 'Al']
- after the make_labels_uniquecall (my function, code below), they are['Lu0', 'Lu1', 'Al0', 'Al1', 'Al2', 'Al3']
- but in the CIF file, alas:
loop_
 _atom_site_type_symbol
 _atom_site_label
 _atom_site_symmetry_multiplicity
 _atom_site_fract_x
 _atom_site_fract_y
 _atom_site_fract_z
 _atom_site_occupancy
  Lu  Lu0  1  0.87500000  0.87500000  0.87500000  1
  Lu  Lu1  1  0.12500000  0.12500000  0.12500000  1
  Al  Al2  1  0.50000000  0.50000000  0.50000000  1
  Al  Al3  1  0.50000000  0.50000000  0.00000000  1
  Al  Al4  1  0.00000000  0.50000000  0.50000000  1
  Al  Al5  1  0.50000000  -0.00000000  0.50000000  1
Al atoms have been renamed from Al0..Al3 to Al2..Al5
Expected Behavior
When labels are conformant to the CIF format (which they are in this case) they should not be altered.
Minimal example
See code above. The function make_labels_unique is:
def make_labels_unique(struct):
    from collections import Counter
    
    labels = [site.label for site in struct.sites]
    if len(labels) == len(set(labels)):
        # All labels are unique, nothing to do
        return
    labels = Counter(labels)
    counter = {}
    for i, site in enumerate(struct.sites):
        label = site.label
        if labels[label] > 1:
            c = counter.get(label, 0)
            site.label = f"{label}{c}" if label.isalpha() else f"{label}_{c}"
            c = c + 1
            counter[label] = c
This is because if "magmom" in site.properties is true for this structure at https://github.com/materialsproject/pymatgen/blame/2d008e0dd5c430692e8dcac2505340a6bdff1642/pymatgen/io/cif.py#L1516C21-L1516C51
But I am not writing magnetic moments, and I do not know why that should override labels.
@fxcoudert thanks for reporting. I assume a pull-request to fix this issue would be very welcome.
I confirm that removing the magnetic moments with structure.remove_site_property("magmom") does fix the issue.
Regarding a PR, my own understanding of what the code tries to do for magnetic moments is insufficient to handle it well. I wouldn't want to break another use case…
I confirm that removing the magnetic moments with
structure.remove_site_property("magmom")does fix the issue.Regarding a PR, my own understanding of what the code tries to do for magnetic moments is insufficient to handle it well. I wouldn't want to break another use case…
I see! I also don't know that functionality. Should I leave it open to see if someone else can fix it?
@mkhorton , maybe you now more about this.
This also struck me as odd. I'm currently working on #3767 which touches this bit of the code and happy to fix this if someone tells me the expected behaviour for magnetic moments.