pymatgen
pymatgen copied to clipboard
Round trip potcar writing/reading
Describe the bug Round trip Potcar parsing results in a warning that's not present when writing input.
To Reproduce
import unittest
from pymatgen.io.vasp.sets import MPRelaxSet
from pymatgen.io.vasp.inputs import VaspInput
from pymatgen.core.lattice import Lattice
from pymatgen.core.structure import Structure
from pymatgen.core.periodic_table import Element
from monty.tempfile import ScratchDir
class PotcarCompatTest(unittest.TestCase):
def test_round_trip(self):
with ScratchDir('.'):
structure = Structure(Lattice.cubic(1.5), ['C'], [[0, 0, 0,]])
relax_set = MPRelaxSet(structure)
# This should complain if md5 hashes are wrong
print("Write set")
relax_set.write_input('.')
print("Read")
vasp_input = VaspInput.from_directory('.') # This results in a warning
if __name__ == "__main__":
unittest.main()
Expected behavior No warning if input set is written without a warning.
Desktop (please complete the following information):
- OS: Linux
Additional context Believe this is related to the regex parsing of PotcarSingle from a multiple potcar file, or perhaps on the write itself, which may leave out a trailing line break present in the original file.
On the write - the PotcarData contains a trailing line break, but not on the read.
test_potcar.py Write set
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB set_trace >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> /root/miniconda3/lib/python3.9/site-packages/pymatgen/io/vasp/inputs.py(1780)__init__()
-> self.hash = self.get_potcar_hash()
(Pdb) self.data[-20:]
'+01\n End of Dataset\n'
(Pdb) c
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB continue >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Read
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB set_trace >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> /root/miniconda3/lib/python3.9/site-packages/pymatgen/io/vasp/inputs.py(1780)__init__()
-> self.hash = self.get_potcar_hash()
(Pdb) self.data[-20:]
'E+01\n End of Dataset'
(Pdb)
I think the reason for this is simply that the hash database is incomplete.
There are two sets of hashes, a "file" hash (checks the POTCAR is byte-to-byte identical) and a "data" hash (which is a bit more permissive, e.g. if there are changes in comment strings etc.)
The data hash matches, but the file match does not, hence it is in error. The warning message should probably be less extreme, i.e. acknowledge the possibility that the pymatgen database is incomplete.
The warning does not appear on set construction since only the "data" hash is checked.
Suggested remedy here is:
- Only check the data hash in both places to avoid inconsistency.
- Supplement the hash file with additional hashes.
A separate issue here is that PotcarSingle.identify_potcar() returns (['PBE'], ['runelements', 'C']) for reasons that are not clear to me.
Hm, I think the runelements is simply because there was an older VASP POTCAR that identified itself as such (or perhaps a parsing bug).
Ok, I mis-spoke previously. I think this is a round-trip issue due to some logic in Potcar.from_file. The file itself is correct.
I have a fix which is to hash the str(PotcarSingle) rather than PotcarSingle.data. The failing file hash is because PotcarSingle.data removes a trailing newline.