Error in parsing espresso files (read_espresso_structure)
I'm using quantum espresso with shakenbreak and there are one or two issues.
- The default espresso file that
shakenbreakcreates is an espressorelaxcalculation. HOWEVER, theread_espresso_structuremethod presumes avc-relaxfile. That's because an ordinaryrelaxcalculation does not have the"CELL_PARAMETERS (angstrom)"string in its output thatread_espresso_structureis looking for.
Solution: Simply make the default files that shakenbreak generates to be vc-relax rather than relax.
- The following line:
atomic_positions = file_content.split("ATOMIC_POSITIONS (angstrom)")[1]should be followed up with:atomic_positions = atomic_positions.split("Writing output data")[0]
Otherwise the string doesn't parse correctly i.e. float conversion in the following line:
coordinates = [[float(entry) for entry in line[1:4]] for line in atomic_positions_processed]
fails due to non-numeric input.
I'm using:
shakenbreak version 3.4.2.
python 3.13
If the second error has not been removed then I can go ahead and make a pull request.
Hi @Walser52 , thanks for your interest in ShakeNBreak and for pointing out this issue!
The calculation should be kept to relax rather than vc-relax, since the cell shape and parameters should be kept fixed (e.g., we just optimise the ionic positions). Would you mind sharing your espresso output file, and I'll fix read_espresso_structure? (We don't typically use QE, just implemented its support for potential users)
Thanks!
Best,
Irea
If it's supposed to be relax then we need to change the code.
The relax output does not write "CELL_PARAMETERS (angstrom)" between "Begin final coordinates" and "End final coordinates":
Begin final coordinates
ATOMIC_POSITIONS (crystal)
C 0.3333333330 0.6666666660 0.5000000000 0 0 0
C 0.6666666660 0.3333333330 0.4999978831
End final coordinates
The vc-relax output does:
Begin final coordinates
new unit-cell volume = 532.19087 a.u.^3 ( 78.86255 Ang^3 )
density = 0.50580 g/cm^3
CELL_PARAMETERS (alat= 4.72431531)
0.985563149 -0.000000000 0.000000000
-0.492781575 0.853522724 0.000000000
0.000000000 0.000000000 6.000000000
ATOMIC_POSITIONS (crystal)
C 0.3333333330 0.6666666660 0.5000000000
C 0.6666666660 0.3333333330 0.5000000000
End final coordinates
So the cell_lines in the code below from io.py need to come from elsewhere:
if "Begin final coordinates" in file_content:
file_content = file_content.split("Begin final coordinates")[-1] # last geometry
if "End final coordinates" in file_content:
file_content = file_content.split("End final coordinates")[0] # last geometry
# Parse cell parameters and atomic positions
cell_lines = [
line
for line in file_content.split("CELL_PARAMETERS (angstrom)")[1]
.split("ATOMIC_POSITIONS (angstrom)")[0]
.split("\n")
if line != "" and line != " " and line != " "
]
Note that these outputs are with verbosity='high' for QE.
Solution
relax output file does contain the cell parameters at the top of the file. We can extract them from there:
crystal axes: (cart. coord. in units of alat)
a(1) = ( 1.000000 0.000000 0.000000 )
a(2) = ( -0.500000 0.866025 0.000000 )
a(3) = ( 0.000000 0.000000 4.061244 )
I'll fix it and get back to you.
Come to think of it there are other issues with this.
I am trying to develop both shakenbreak and doped for espresso. Is someone actively working on this that we can work with or is it on the back burner?
Thanks for raising this @Walser52! I see our espresso test file is indeed from a vc-relax calculation, but in most cases it should be a relax calculation – though ideally the code should be able to parse both without issue.
As @ireaml said, we are not regular QE users and so we aimed to implement basic support for this software but do not have much experience.
Adding better support for Quantum Espresso (and other DFT packages) is something that has been on the wishlist but we haven't had any experienced users to push this. It would be great if you were able to develop code for this, and we'd be happy to help in any way we can!
I see there is the pymatgen-io-espresso package which could be useful for streamlining this? Given that we mostly based off pymatgen/ase.
I think the QE parsing tools in ase have also improved somewhat since this code in ShakeNBreak was written, so would be worthwhile checking out what those methods can already achieve to avoid duplicating any efforts; https://gitlab.com/ase/ase/-/blob/master/ase/io/espresso.py
https://github.com/Griffin-Group/pymatgen-io-espresso
Yes, that is the io package that I use (and plan to use) but I don't know what shortcomings it might have. They have an xml file parser which gives some measure of robustness rather than having to parse the output file string line by line.
The shakenbreak structure is fine and can be modified. The DefectsParser in doped is slightly more involved since it has been written with vasp in mind. Nevertheless I'm working on it. Let's see how far it goes.
Ok great, keep us in the loop! 😃