aiida-quantumespresso Add `OpengridCalculation` for `open

QE has an open_grid.x code to unfold an irreducible-Brillouin-zone calculation to a full kmesh, significantly reduces the computation if the following calculation requires a full kmesh. This PR adds a OpengridCalculation class to support the code.

To allow subsequent ProjwfcCalculation on top of the OpengridCalculation, I have to introduce a somewhat ugly input/output spec:

a structure input which isn't used by the OpengridCalculation, but because ProjwfcCalculation implicitly requires the parent calculation having an input structure (so it can parse the atomic orbitals).
I modified the ProjwfcParser to allow it to use output kpoints if no input kpoints in the parent calculation (projwfc should use the output kpoints which is the full kmesh).

Aug 05 '21 16:08 qiaojunfeng

Thanks @yunfeng. Before I review this, a quick couple of questions:

Where exactly is the computation time won? If you pass an IBZ kpoints to a calculation, will QE try to do the unfolding itself but in a less effective way compared to open_grid.x? Surely calculating a full mesh is more expensive compared to a reduced mesh?
The issue you describe of the ProjwfcParser already cropped up a long time before and is documented in #299 . Ideally, we should really just fix this, and then you also don't have to work around this with weird specs etc.

Aug 06 '21 07:08 sphuber

Thanks @sphuber,

Thanks @Yunfeng. Before I review this, a quick couple of questions:

Where exactly is the computation time won? If you pass an IBZ kpoints to a calculation, will QE try to do the unfolding itself but in a less effective way compared to open_grid.x? Surely calculating a full mesh is more expensive compared to a reduced mesh?

In essence, the open_grid.x just read the wavefunctions of IBZ kpoints, apply some symmetry operations to generate wavefunction at the remaining kpoints. For pw.x, we can do a nscf calculation on full kmesh with restarting from IBZ calculation, but that requires diagonalization of the Kohn Sham equations, and orthogonalization of KS orbitals, this is much slower than reading and rewriting wavefunctions. This difference is something like 30 mins v.s. 3 mins.

The issue you describe of the ProjwfcParser already cropped up a long time before and is documented in ProjwfcParser uses input nodes up to 3 links back, and prevents easy testing #299 . Ideally, we should really just fix this, and then you also don't have to work around this with weird specs etc.

Yes I agree we should fix the ProjwfcParser, the question is how to fix it: the ProjwfcParser requires a kpoints, a structure, and a output_parameters containing number_of_spin_components, https://github.com/aiidateam/aiida-quantumespresso/blob/c49def7a238617439feb75afd0d0cf9fbb681613/aiida_quantumespresso/parsers/projwfc.py#L404 In this opengrid case, the ProjwfcParser should use the opengrid output kpoints, since it is different from its input kpoints, so we can add an optional input kpoints to ProjwfcCalculation. Similarly, we can add an optional input structure for ProjwfcCalculation. But for number_of_spin_components, adding an optional input parameters containing number_of_spin_components to ProjwfcCalculation, is a bit weird. Also the projwfc.x raw output does not contain spin info. Or, in a totally different way, the XML file in the parent_folder contains the needed info to generate kpoints, structure, and number_of_spin_components, should we parse the XML in ProjwfcParser?

Aug 06 '21 08:08 qiaojunfeng

In essence, the open_grid.x just read the wavefunctions of IBZ kpoints, apply some symmetry operations to generate wavefunction at the remaining kpoints. For pw.x, we can do a nscf calculation on full kmesh with restarting from IBZ calculation, but that requires diagonalization of the Kohn Sham equations, and orthogonalization of KS orbitals, this is much slower than reading and rewriting wavefunctions. This difference is something like 30 mins v.s. 3 mins.

Thanks a lot. So just to see if I get the use case here: after an NSCF performed on a set of kpoints in the IBZ, instead of doing a new NSCF restart on the full mesh before passing it to projwfc.x, we first go through open_grid.x (instead of pw.x) and then directly go to projwfc.x?

Yes I agree we should fix the ProjwfcParser, the question is how to fix it: the ProjwfcParser requires a kpoints, a structure, and a output_parameters containing number_of_spin_components,

https://github.com/aiidateam/aiida-quantumespresso/blob/c49def7a238617439feb75afd0d0cf9fbb681613/aiida_quantumespresso/parsers/projwfc.py#L404

In this opengrid case, the ProjwfcParser should use the opengrid output kpoints, since it is different from its input kpoints, so we can add an optional input kpoints to ProjwfcCalculation. Similarly, we can add an optional input structure for ProjwfcCalculation.

I am not sure that adding inputs to ProjwfcCalculation is the way to go. The whole problem with the ProjwfcParser that we have now is that it is not self-contained and reaches up the provenance graph to get information from preceding calculations. Passing in an output of the parent calculation as an input, is just another version of this same workaround.

But for number_of_spin_components, adding an optional input parameters containing number_of_spin_components to ProjwfcCalculation, is a bit weird. Also the projwfc.x raw output does not contain spin info. Or, in a totally different way, the XML file in the parent_folder contains the needed info to generate kpoints, structure, and number_of_spin_components, should we parse the XML in ProjwfcParser?

If the output of projwfc.x itself does not contain the necessary information, than we should indeed get it from the XML of the parent folder because this is exactly what projwfc.x bases itself on. This would also solve the problem of using a parent_folder that may come from different calcjobs. In this way, it should not matter for the ProjwfcParser and simply always parses the same XML. That is to say that QE doesn't do something incredibly weird and annoying and have pw.x and open_grid.x write different output formats... :sweat_smile:

Aug 06 '21 09:08 sphuber

Thanks @mbercx for fixing #722, #741, and #747! Now I have removed those strange input spec from OpengridCalculation, and added some tests. Finally, the whole Wannier workchain runs smoothly 😄

Oct 05 '21 19:10 qiaojunfeng

Thanks @qiaojunfeng . I propose we wait for @mbercx 's PRs to be finished and merged and then we rebase your PR. Then I will give it a final pass and we can merge it.

Oct 06 '21 11:10 sphuber

All right, just did some checks, including

launch OpenGridCalculation on top of the RemoteData output folder of a scf PwCalculation
launch ProjwfcCalculation on top of the RemoteData output folder of the OpenGridCalculation

and all things work well.

One noticeable change: I use OpenGridCalculation instead of OpengridCalculation, and quantumespresso.open_grid instead of quantumespresso.opengrid for the entry point name, to be more consistent with the QE executable name open_grid.x, as @sphuber suggested one year ago. Although the capital case and the underscore look a bit unwieldy to me, I can tolerate this for the sake of consistency.

For the compatibility matrix, maybe we can add a footnote to the QE version, saying for open_grid.x the user needs to run QE>=7.0 instead of raising the lower bound of QE version?

Nov 23 '22 10:11 qiaojunfeng

Sure looks good to me! Thanks a lot to both. Just make sure to squash merge and add a nice commit message as per yuzh

Feb 08 '23 09:02 sphuber

Add `OpengridCalculation` for `open_grid.x` code