ShakeNBreak icon indicating copy to clipboard operation
ShakeNBreak copied to clipboard

Error in parsing espresso files (read_espresso_structure)

Open Walser52 opened this issue 6 months ago • 7 comments

I'm using quantum espresso with shakenbreak and there are one or two issues.

  1. The default espresso file that shakenbreak creates is an espresso relax calculation. HOWEVER, the read_espresso_structure method presumes a vc-relax file. That's because an ordinary relax calculation does not have the "CELL_PARAMETERS (angstrom)" string in its output that read_espresso_structure is looking for.

Solution: Simply make the default files that shakenbreak generates to be vc-relax rather than relax.

  1. The following line: atomic_positions = file_content.split("ATOMIC_POSITIONS (angstrom)")[1] should be followed up with: atomic_positions = atomic_positions.split("Writing output data")[0]

Otherwise the string doesn't parse correctly i.e. float conversion in the following line:

        coordinates = [[float(entry) for entry in line[1:4]] for line in atomic_positions_processed]

fails due to non-numeric input.

I'm using:

shakenbreak version 3.4.2. python 3.13

If the second error has not been removed then I can go ahead and make a pull request.

Walser52 avatar Jun 20 '25 04:06 Walser52

Hi @Walser52 , thanks for your interest in ShakeNBreak and for pointing out this issue!

The calculation should be kept to relax rather than vc-relax, since the cell shape and parameters should be kept fixed (e.g., we just optimise the ionic positions). Would you mind sharing your espresso output file, and I'll fix read_espresso_structure? (We don't typically use QE, just implemented its support for potential users)

Thanks!

Best,

Irea

ireaml avatar Jun 20 '25 10:06 ireaml

If it's supposed to be relax then we need to change the code.

The relax output does not write "CELL_PARAMETERS (angstrom)" between "Begin final coordinates" and "End final coordinates":

Begin final coordinates

ATOMIC_POSITIONS (crystal)
C             0.3333333330        0.6666666660        0.5000000000    0   0   0
C             0.6666666660        0.3333333330        0.4999978831
End final coordinates

The vc-relax output does:

Begin final coordinates
     new unit-cell volume =    532.19087 a.u.^3 (    78.86255 Ang^3 )
     density =      0.50580 g/cm^3

CELL_PARAMETERS (alat=  4.72431531)
   0.985563149  -0.000000000   0.000000000
  -0.492781575   0.853522724   0.000000000
   0.000000000   0.000000000   6.000000000

ATOMIC_POSITIONS (crystal)
C             0.3333333330        0.6666666660        0.5000000000
C             0.6666666660        0.3333333330        0.5000000000
End final coordinates

So the cell_lines in the code below from io.py need to come from elsewhere:

        if "Begin final coordinates" in file_content:
            file_content = file_content.split("Begin final coordinates")[-1]  # last geometry

        if "End final coordinates" in file_content:
            file_content = file_content.split("End final coordinates")[0]  # last geometry
        # Parse cell parameters and atomic positions

        cell_lines = [
            line
            for line in file_content.split("CELL_PARAMETERS (angstrom)")[1]
            .split("ATOMIC_POSITIONS (angstrom)")[0]
            .split("\n")
            if line != "" and line != " " and line != "   "
        ]

Note that these outputs are with verbosity='high' for QE.

Solution

relax output file does contain the cell parameters at the top of the file. We can extract them from there:

     crystal axes: (cart. coord. in units of alat)
               a(1) = (   1.000000   0.000000   0.000000 )  
               a(2) = (  -0.500000   0.866025   0.000000 )  
               a(3) = (   0.000000   0.000000   4.061244 )  

I'll fix it and get back to you.

Walser52 avatar Jun 20 '25 11:06 Walser52

Come to think of it there are other issues with this.

I am trying to develop both shakenbreak and doped for espresso. Is someone actively working on this that we can work with or is it on the back burner?

Walser52 avatar Jun 20 '25 11:06 Walser52

Thanks for raising this @Walser52! I see our espresso test file is indeed from a vc-relax calculation, but in most cases it should be a relax calculation – though ideally the code should be able to parse both without issue. As @ireaml said, we are not regular QE users and so we aimed to implement basic support for this software but do not have much experience.

Adding better support for Quantum Espresso (and other DFT packages) is something that has been on the wishlist but we haven't had any experienced users to push this. It would be great if you were able to develop code for this, and we'd be happy to help in any way we can!

kavanase avatar Jun 20 '25 17:06 kavanase

I see there is the pymatgen-io-espresso package which could be useful for streamlining this? Given that we mostly based off pymatgen/ase.

I think the QE parsing tools in ase have also improved somewhat since this code in ShakeNBreak was written, so would be worthwhile checking out what those methods can already achieve to avoid duplicating any efforts; https://gitlab.com/ase/ase/-/blob/master/ase/io/espresso.py

https://github.com/Griffin-Group/pymatgen-io-espresso

kavanase avatar Jun 20 '25 17:06 kavanase

Yes, that is the io package that I use (and plan to use) but I don't know what shortcomings it might have. They have an xml file parser which gives some measure of robustness rather than having to parse the output file string line by line.

The shakenbreak structure is fine and can be modified. The DefectsParser in doped is slightly more involved since it has been written with vasp in mind. Nevertheless I'm working on it. Let's see how far it goes.

Walser52 avatar Jun 20 '25 22:06 Walser52

Ok great, keep us in the loop! 😃

kavanase avatar Jun 20 '25 22:06 kavanase