aiida-quantumespresso
aiida-quantumespresso copied to clipboard
Parsing only the XML output (of PW)
This issue is open-ended and meant for tracking our discussion on the topic.
Now we parse both XML and textual output of PW, and then we merge the two sources of information. Textual output, being less structured, is harder to parse and to validate, so we would like to only use the XML output. However, removing the ability to parse text creates a backwards-incompatibility problem, because:
- versions between 6.0 and 6.1 (included) only enable the new XML format on request (when configured with
--enable-xml
or compiled-D__XSD
), otherwise they use the old XML format; - versions before 6.0 only output in the old XML format;
- in general, the new XML format/schema has improved with time, so older versions may have missing or incorrect information (which we must parse from the text).
One approach could be to start ignoring the text output from a certain version of QE onwards, and keep the rest of the code for backwards compatibility. After some time, old versions of QE can be deprecated and the text-parsing code removed completely.
One concern to keep in mind is our desire to keep consistent the results of the parser between different versions of the plugin; this may further constrain our choices.
@mbercx I think it is time to bite the bullet on this one. If the comment above is correct and all versions of QE v6.2 and above will always have the XML with the schema file, and the latest version of aiida-quantumespresso
only supports QE v6.6 and up, we can ditch parsing from the stdout altoghether, correct?
Possibly, but we have to check if all of the information parsed from the stdout
is currently in the XML. One example that came to mind immediately was the site charge and magnetization, which was only in the stdout
a while back. It seems to have been added now though.
An exercise I still wanted to do is make a list of outputs that are still parsed from the stdout, and see which we can parse from the XML instead. Then we can also request that these outputs are also added to the XML from the QE folks.
EDIT: When making such a list, we should also add the QE version that the XML output was added, to make sure we maintain compatibility as promised. I think it only makes sense to add XML parsing in case:
- All QE versions we support have the XML output.
- The stdout parsing is broken in a new QE version, but XML output is provided.
Just FYI, I have been working on this already and almost have something ready. Will open a PR soon and then we can discuss online. But have some other pressing stuff that I need to deal with first now.