Enhancement: Quantum Espresso Extension
I think I have a working framework for QE, at least in so far as point defects are concerned. And can go ahead make a fork and pull. But the code needs cleaning so I wanted to know in advance what the preferred structure would be.
Currently we have a DefectsParser class that calls DefectParser. For developing QE, I made separate DefectsParserEsp and DefectParserEsp which inherit from the corresponding classes.
How should various codes be called under the hood so to speak? We could use abstractmethod for instance to define a base class and have every subsequent extension follow that blueprint.
Additionally, we need some nomenclature for the new classes.
Most of DefectsParser is now agnostic to the underlying DFT code used (beyond some references to vr for vasprun which can be modified), and the same logic for managing parsing, using multiprocessing and then running checks on the parsed defect entries should be applied for QE; with the main difference being the per-defect DefectParser that is run in the multiprocessing setup.
So I guess ideally we would keep DefectsParser as the main interface for parsing, which can then automatically detect what DFT code it is parsing based on filenames, and then have separate DefectParsers for VASP and QE? Using abstractmethod as you say to define a base DefectParser class which includes the methods which are code-agnostic (such as structure parsing and defect (entry) initialisation), and then e.g. VaspDefectParser and EspressoDefectParser as sub-classes?
Yes, I think that's good. As for vasprun.xml, I suggest we change it to match xml extension instead, because that would cover all codes.
The current Espresso implementation is contingent upon their already being a suitable Vasprun implementation. Once that is available, any code can be integrated.
Ok! I don't really know what you mean about the xml point -- the xml parsing is still specific to VASP as it expects the VASP xml format. But other than that, sounds good.
The filename vasprun.xml is hard coded at places. QE also generates xml files as do other codes. Is there a reason for matching looking for the entire vasprun.xml filename or can we be content with just matching any xml file? Just a procedural thing.
In those cases hard-matching vasprun.xml is mostly intentional as the subsequent code assumes VASP format. The only places where this should really differ I think will be in the code that detects calculation folders to parse (in DefectsParser and in CompetingPhasesAnalyzer). For that, we'll likely want to use a set of patterns which match the outputs of supported codes (i.e. vasprun.xml for VASP, and whatever the main QE output file is). If we add support for other codes in the future, they may not have xml as an extension (e.g. CP2k, CASTEP etc), so best to avoid just using it (and to avoid matching other xml files which aren't supported outputs) I think?
Oh, okay. To maintain logical consistency across classes, I was using espresso.xml as my default filename even though that's not the default that QE uses and users specify their own prefix anyways. Let's see how it goes.
Ok yes I see. In that case then, we may want to first detect folders with .xml files, and the quickly check which of these start with a recognised XML header/tag from QE's output, to determine which folders are considered QE calculation folders for parsing.
So I think the QE DefectsParser is ready. I need to cross-check the eFNV corrections because unlike VASP, QE doesn't write them by default.
We can merge the pull request over the next week or two.
Ok great! One you have a PR draft I can start to review :)