pymatgen
pymatgen copied to clipboard
Bug in processing entries of GGA/GGA+U/R2SCAN scheme
I found that the entry mp-1181334-GGA+U for Cu-Fe-Li-O-Te chemical system can disappear and reappear in the processed entries result occasionally, when the GGA/GGA+U/R2SCAN scheme correction was conducted multiple times. The inconsistency issue happened about every third run in my test -- although the codes below been tested in the latest pymatgen package.
from pymatgen.ext.matproj import MPRester
from pymatgen.entries.mixing_scheme import MaterialsProjectDFTMixingScheme
with MPRester() as mpr:
entryList = mpr.get_entries_in_chemsys('Cu-Fe-Li-O-Te', additional_criteria={'thermo_types':['GGA_GGA+U', 'R2SCAN']})
entryList = MaterialsProjectDFTMixingScheme().process_entries(entryList, verbose=False)
print(len(entryList))
print('mp-1181334-GGA+U' in [e.entry_id for e in entryList])
There may be either 637 or 638 entries in the processed entryList. The latter case would contain an entry mp-1181334-GGA+U, which actually contains no energy adjustment related to the GGA/GGA+U/R2SCAN scheme, as shown in the exported JSON file below.
mp-1181334-GGA+U ComputedStructureEntry - Fe4 O13 (Fe4O13)
Energy (Uncorrected) = -92.6898 eV (-5.4523 eV/atom)
Correction = -11.1170 eV (-0.6539 eV/atom)
Energy (Final) = -103.8068 eV (-6.1063 eV/atom)
Energy Adjustments:
MP2020 anion correction (superoxide): -2.0930 eV (-0.1231 eV/atom)
MP2020 GGA/GGA+U mixing correction (Fe): -9.0240 eV (-0.5308 eV/atom)
Parameters:
potcar_spec = [{'titel': 'PAW_PBE Fe_pv 06Sep2000', 'hash': '5963411539032ec3298fa675a32c5e64'}, {'titel': 'PAW_PBE O 08Apr2002', 'hash': '7a25bc5b9a5393f46600a4939d357982'}]
is_hubbard = True
hubbards = {'Fe': 5.3, 'O': 0.0}
run_type = GGA+U
Data:
oxide_type = superoxide
aspherical = True
last_updated = 2021-02-05 09:26:25.316000
task_id = mp-1829691
material_id = mp-1181334
oxidation_states = {}
run_type = GGA+U
The potential bug needs to be found, as the reliability of local processing algorithms has become critical for the generation of GGA/GGA+U/R2SCAN phase diagrams^note.
I can't repro this. Tried about 7 times. I get 638 entries every time.
which actually contains no energy adjustment related to the GGA/GGA+U/R2SCAN scheme
I'm not very familiar with our r2SCAN/GGA mixing scheme but I think what might be happening here is that we place r2SCAN energies onto the GGA(+U) hull? If so, you wouldn't expect an adjustment on a GGA+U entry? Pinging @rkingsbury @munrojm to correct me in case I'm wrong.
I'm not very familiar with our r2SCAN/GGA mixing scheme but I think what might be happening here is that we place r2SCAN energies onto the GGA(+U) hull? If so, you woudln't expect an adjustment on a GGA+U entry? Pinging @rkingsbury @munrojm to correct me in case I'm wrong here.
That's correct, some GGA/GGA+U entries will pass through the mixing scheme unmodified, if the hull is still being built with GGA/GGA+U entries. That would occur any time there are insufficient R2SCAN entries to build the hull. In such a high dimensional chemical system as this, I'm almost certain the hull would be built in GGA/GGA+U, so getting the entry back unmodified is exactly what should happen. Please refer to the mixing scheme publication for a more thorough explanation.
As for the non-reproducibility / I'm not sure. The mixing scheme does rely on structure matching, so perhaps due to some quirk in your computer system the matching occasionally fails to consider two structures "the same"? That's a stretch but it's the best thought I have.
Also, I note that you're using the old MPRester (pymatgen.ext.matproj). If you want to use r2SCAN data I think you would be better off using the rester from the mp-api package, but @munrojm would know better.
Also, I note that you're using the old MPRester (
pymatgen.ext.matproj). If you want to use r2SCAN data I think you would be better off using the rester from themp-apipackage, but @munrojm would know better.
@rkingsbury Oh, it is a typo here. Actually in my test codes, the MPRester method was imported by from mp_api.client import MPRester.
I tested it on a different computer and network, but the inconsistency issue still happened. I will try to run them with a fresh environment in a container next.
The same issue happened, even though I ran the codes in a fresh set-up container of python3.10-alpine. I think this is a rare case (the chemical system and entry) that induces this issue whereas would not happen in other chemical systems, but the potential bug may still result in entry loss or excess, thereby possibly changing the shape of the convex hull.
Can you pinpoint the bug?
@janosh Yes, it is the the structure_matcher.group_structures() methods in mixing_scheme.py 564L, that results in inconsistency when matching the structures of the entry mp-1181411-GGA+U Fe4O13 with the entry mp-1181334-GGA+U Fe4O13. I found that the codes sometimes match the two entries together in one group, whereas sometimes divide them into two individual groups. In the latter case, there would be 638 processed entries, otherwise, there would be 637 entries.
@janosh Yes, it is the the structure_matcher.group_structures() methods in mixing_scheme.py 564L, that results in inconsistency when matching the structures of the entry
mp-1181411-GGA+U Fe4O13with the entrymp-1181334-GGA+U Fe4O13. I found that the codes sometimes match the two entries together in one group, whereas sometimes divide them into two individual groups. In the latter case, there would be 638 processed entries, otherwise, there would be 637 entries.
@peikai it sounds to me like you are running into an edge case related to the numerical tolerances of StructureMatcher, where tiny numerical noise is causing those structures to sometimes match and sometimes not. I suggest 2 things:
- Can you reproduce the problem outside of the mixing scheme, by just running
StructureMatcheron your list of input structures? - If yes, then try adjusting the tolerance kwargs of
StructureMatcher. If you slightly change theltol,stol, orangle_tol, you can hopefully get reproducible results.