pw2qmcpack: wrong orbital norm due to use of QE bandgroups
I am running into an issue with pw2qmcpack. I attempted to write orbitals from a hybrid calculation. pw2qmcpack prints:
Parallelization info
--------------------
sticks:   dense  smooth     PW     G-vecs:    dense   smooth      PW
Min        2529    2529    668               185725   185725   25283
Max        2530    2530    669               185726   185726   25285
Sum       40477   40477  10699              2971607  2971607  404541
Generating pointlists ...
new r_m : 0.2884 (alat units) 2.3621 (a.u.) for type 1
new r_m : 0.2824 (alat units) 2.3127 (a.u.) for type 2
new r_m : 0.2824 (alat units) 2.3127 (a.u.) for type 3
negative rho (up, down): 0.000E+00 5.934E-06
The wrong norm of k-point 15 band 1 , after collection before writing, is 10.000000000000030
The orbitals went wrong before being written to h5 file. Please first add debug=.true. in the pw2qmcpack input file to check if the orbitals can be read from QE files correctly.
The wrong norm of k-point 5 band 1 , after collection before writing, is 10.000000000000050
The orbitals went wrong before being written to h5 file. Please first add debug=.true. in the pw2qmcpack input file to check if the orbitals can be read from QE files correctly.
The wrong norm of k-point 7 band 1 , after collection before writing, is 10.000000000000089
The orbitals went wrong before being written to h5 file. Please first add debug=.true. in the pw2qmcpack input file to check if the orbitals can be read from QE files correctly.
The wrong norm of k-point 11 band 1 , after collection before writing, is 10.000000000000091
The orbitals went wrong before being written to h5 file. Please first add debug=.true. in the pw2qmcpack input file to check if the orbitals can be read from QE files correctly.
The wrong norm of k-point 21 band 1 , after collection before writing, is 9.999999999999972
The orbitals went wrong before being written to h5 file. Please first add debug=.true. in the pw2qmcpack input file to check if the orbitals can be read from QE files correctly.
The wrong norm of k-point 9 band 1 , after collection before writing, is 10.000000000000002
The orbitals went wrong before being written to h5 file. Please first add debug=.true. in the pw2qmcpack input file to check if the orbitals can be read from QE files correctly.
The wrong norm of k-point 19 band 1 , after collection before writing, is 10.000000000000012
The orbitals went wrong before being written to h5 file. Please first add debug=.true. in the pw2qmcpack input file to check if the orbitals can be read from QE files correctly.
The wrong norm of k-point 13 band 1 , after collection before writing, is 9.999999999999947
The orbitals went wrong before being written to h5 file. Please first add debug=.true. in the pw2qmcpack input file to check if the orbitals can be read from QE files correctly.
The wrong norm of k-point 3 band 1 , after collection before writing, is 9.999999999999899
The orbitals went wrong before being written to h5 file. Please first add debug=.true. in the pw2qmcpack input file to check if the orbitals can be read from QE files correctly.
The wrong norm of k-point 17 band 1 , after collection before writing, is 9.999999999999957
The orbitals went wrong before being written to h5 file. Please first add debug=.true. in the pw2qmcpack input file to check if the orbitals can be read from QE files correctly.
esh5 create pwscf_output/pwscf.pwscf.h5_part9
esh5 create pwscf_output/pwscf.pwscf.h5_part1
esh5 create pwscf_output/pwscf.pwscf.h5_part5
esh5 create pwscf_output/pwscf.pwscf.h5_part7
esh5 create pwscf_output/pwscf.pwscf.h5_part2
esh5 create pwscf_output/pwscf.pwscf.h5_part6
esh5 create pwscf_output/pwscf.pwscf.h5_part10
esh5 create pwscf_output/pwscf.pwscf.h5_part8
esh5 create pwscf_output/pwscf.pwscf.h5_part4
esh5 create pwscf_output/pwscf.pwscf.h5_part3
The wrong norm of k-point 1 band 1 , after collection before writing, is 10.000000000000075
The orbitals went wrong before being written to h5 file. Please first add debug=.true. in the pw2qmcpack input file to check if the orbitals can be read from QE files correctly.
esh5 create pwscf_output/pwscf.pwscf.h5
I found it interesting that the norm is integer valued (about 10) rather than 1. This is related to a previous issue, #2538; however, in my case the pseudopotentials should be norm-conserving, and indeed DFT+U orbitals can be generated using these same pseudopotentials (though the DFT+U conversions were run on a different machine, with the same version of QE).
I have submitted a run with debug=.true. to the queue, but thought I would submit the issue in the interim.
Relevant files are here: p2q_run.zip
To reproduce: I am running QE 6.4.1 on NERSC Cori, patched and built by hand.
System:
- NERSC Cori
As a follow-up, I ran the scf with band parallelization (10 bandgroups) to speed up the EXX calculation, and I used an identical parallelization scheme for pw2qmcpack. I assume this is relevant to the issue here. If so, would it be worthwhile to add a catch when band parallelization is requested, or to just ignore it altogether (with a printout telling the user that the flag will be ignored)?
The bandgroups were probably the reason. Could you try running pw2qmcpack without the bandgroup settings, and on 1/10 of the nodes? Ye
Ye Luo, Ph.D. Computational Science Division & Leadership Computing Facility Argonne National Laboratory
Removing bandgroups and running on 1/10 of the nodes solves the problem and leads to a clean run.
I think we need to add a fix. With QE bandgroups, some datasets are replicated.
Option 1: support running with bandgroups and divide the orbital values by the number of bandgroups. It can be tricky to find all the places that need the division, the approach is susceptible to changes in QE's bandgroup handling, and running the converter this way wastes compute resources.
Option 2: error out and ask the user to remove bandgroups when running pw2qmcpack.
I prefer option 2 at the moment.
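To illustrate why replicated data could produce a norm of 10 with 10 bandgroups, here is a minimal Python sketch. It is not the pw2qmcpack code: it assumes, purely for illustration, that the norm is accumulated as a sum of |c_G|^2 over every rank, and that each bandgroup holds an identical copy of the plane-wave coefficients. The names `local_norm` and `global_norm`, the rank counts, and the random orbital are all hypothetical.

```python
import random


def local_norm(coeffs):
    """Sum of |c_G|^2 over the coefficients held by one rank."""
    return sum(abs(c) ** 2 for c in coeffs)


def global_norm(n_bandgroups, ranks_per_group, coeffs):
    """Mimic a sum-reduction of the local norms over all ranks.

    Assumption: within a bandgroup the G-vectors are split across ranks,
    while every bandgroup holds the same replicated copy of the orbital.
    """
    chunks = [coeffs[r::ranks_per_group] for r in range(ranks_per_group)]
    total = 0.0
    for _ in range(n_bandgroups):      # identical copy in each bandgroup
        for chunk in chunks:           # distributed slice on each rank
            total += local_norm(chunk)
    return total


# Build a random, normalized "orbital" (hypothetical data).
random.seed(0)
raw = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(1000)]
scale = local_norm(raw) ** 0.5
orbital = [c / scale for c in raw]

norm_1 = global_norm(1, 4, orbital)    # no band parallelization
norm_10 = global_norm(10, 4, orbital)  # 10 bandgroups, replicated data

print(f"1 bandgroup:   norm = {norm_1:.12f}")
print(f"10 bandgroups: norm = {norm_10:.12f}")
```

Under these assumptions the reduced norm scales with the number of bandgroups, which is consistent with the values near 10 in the log above. Whether the actual over-counting happens in the norm, in the coefficients, or in both would need to be checked in the converter itself.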
Following the principle of least surprise to the user, option 2 is clearly required. Provided QE itself can still be run with bandgroups, this doesn't seem like much of an inconvenience. Option 1 would be desirable in the longer term if the QE implementation is stable enough.
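Until such a check exists in the converter, the same idea can be approximated on the user side. The sketch below is a hypothetical pre-flight check for a job script, written in Python; it is not part of QMCPACK or QE. It assumes the bandgroup count is requested through the QE command-line flags `-nb`, `-nband`, or `-nbgrp` (check the flag spellings against your QE version), and the launcher, rank count, and input file name in the example are made up.

```python
# Flags assumed to request band parallelization on the QE command line.
BANDGROUP_FLAGS = {"-nb", "-nband", "-nbgrp"}


def check_no_bandgroups(argv):
    """Raise ValueError if argv asks for more than one bandgroup."""
    for i, arg in enumerate(argv):
        if arg in BANDGROUP_FLAGS:
            value = argv[i + 1] if i + 1 < len(argv) else ""
            if value != "1":
                raise ValueError(
                    f"pw2qmcpack should be run without bandgroups, "
                    f"but found '{arg} {value}'. Remove the flag and "
                    f"reduce the rank count accordingly."
                )
    return argv


# Hypothetical launch commands.
bad_cmd = ["srun", "-n", "640", "pw2qmcpack.x", "-nb", "10", "-i", "p2q.in"]
good_cmd = ["srun", "-n", "64", "pw2qmcpack.x", "-i", "p2q.in"]

check_no_bandgroups(good_cmd)          # passes silently
try:
    check_no_bandgroups(bad_cmd)
except ValueError as err:
    print(f"refusing to launch: {err}")
```

This mirrors option 2 (fail early with a clear message) rather than option 1; it only inspects the command line and does nothing to the data written by the scf run.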
Brief follow-up.
After converting the orbitals with pw2qmcpack as described above (removing bandgroups and running on 1/10 of the nodes), I obtained a high variance from a VMC run with a zeroed Jastrow (|v/e| ~ 0.19). A subsequent Jastrow optimization performed very poorly.
As a cross-check, I then converted the orbitals with convertpw4qmc. In this case, the zeroed-Jastrow run shows |v/e| ~ 0.07.
Therefore, when the scf was run with bandgroups, it seems pw2qmcpack is not getting the correct orbitals from QE, even if the conversion itself is run without bandgroups.