pw2qmcpack: wrong orbital norm due to use of QE bandgroups

Open mcbennet opened this issue 4 years ago • 6 comments

I am running into an issue with pw2qmcpack. I attempted to write orbitals from a hybrid calculation. pw2qmcpack prints:

     Parallelization info
     --------------------
     sticks:   dense  smooth     PW     G-vecs:    dense   smooth      PW
     Min        2529    2529    668               185725   185725   25283
     Max        2530    2530    669               185726   185726   25285
     Sum       40477   40477  10699              2971607  2971607  404541

     Generating pointlists ...
     new r_m :   0.2884 (alat units)  2.3621 (a.u.) for type    1
     new r_m :   0.2824 (alat units)  2.3127 (a.u.) for type    2
     new r_m :   0.2824 (alat units)  2.3127 (a.u.) for type    3

     negative rho (up, down):  0.000E+00 5.934E-06
The wrong norm of k-point  15 band   1 , after collection before writing, is   10.000000000000030
The orbitals went wrong before being written to h5 file. Please first add debug=.true. in the pw2qmcpack input file to check if the orbitals can be read from QE files correctly.
The wrong norm of k-point   5 band   1 , after collection before writing, is   10.000000000000050
The orbitals went wrong before being written to h5 file. Please first add debug=.true. in the pw2qmcpack input file to check if the orbitals can be read from QE files correctly.
The wrong norm of k-point   7 band   1 , after collection before writing, is   10.000000000000089
The orbitals went wrong before being written to h5 file. Please first add debug=.true. in the pw2qmcpack input file to check if the orbitals can be read from QE files correctly.
The wrong norm of k-point  11 band   1 , after collection before writing, is   10.000000000000091
The orbitals went wrong before being written to h5 file. Please first add debug=.true. in the pw2qmcpack input file to check if the orbitals can be read from QE files correctly.
The wrong norm of k-point  21 band   1 , after collection before writing, is    9.999999999999972
The orbitals went wrong before being written to h5 file. Please first add debug=.true. in the pw2qmcpack input file to check if the orbitals can be read from QE files correctly.
The wrong norm of k-point   9 band   1 , after collection before writing, is   10.000000000000002
The orbitals went wrong before being written to h5 file. Please first add debug=.true. in the pw2qmcpack input file to check if the orbitals can be read from QE files correctly.
The wrong norm of k-point  19 band   1 , after collection before writing, is   10.000000000000012
The orbitals went wrong before being written to h5 file. Please first add debug=.true. in the pw2qmcpack input file to check if the orbitals can be read from QE files correctly.
The wrong norm of k-point  13 band   1 , after collection before writing, is    9.999999999999947
The orbitals went wrong before being written to h5 file. Please first add debug=.true. in the pw2qmcpack input file to check if the orbitals can be read from QE files correctly.
The wrong norm of k-point   3 band   1 , after collection before writing, is    9.999999999999899
The orbitals went wrong before being written to h5 file. Please first add debug=.true. in the pw2qmcpack input file to check if the orbitals can be read from QE files correctly.
The wrong norm of k-point  17 band   1 , after collection before writing, is    9.999999999999957
The orbitals went wrong before being written to h5 file. Please first add debug=.true. in the pw2qmcpack input file to check if the orbitals can be read from QE files correctly.
esh5 create pwscf_output/pwscf.pwscf.h5_part9
esh5 create pwscf_output/pwscf.pwscf.h5_part1
esh5 create pwscf_output/pwscf.pwscf.h5_part5
esh5 create pwscf_output/pwscf.pwscf.h5_part7
esh5 create pwscf_output/pwscf.pwscf.h5_part2
esh5 create pwscf_output/pwscf.pwscf.h5_part6
esh5 create pwscf_output/pwscf.pwscf.h5_part10
esh5 create pwscf_output/pwscf.pwscf.h5_part8
esh5 create pwscf_output/pwscf.pwscf.h5_part4
esh5 create pwscf_output/pwscf.pwscf.h5_part3
The wrong norm of k-point   1 band   1 , after collection before writing, is   10.000000000000075
The orbitals went wrong before being written to h5 file. Please first add debug=.true. in the pw2qmcpack input file to check if the orbitals can be read from QE files correctly.
esh5 create pwscf_output/pwscf.pwscf.h5

I found it interesting that the norm is integer-valued but not 1. This is related to a previous issue, #2538; however, in my case the pseudopotentials should be norm-conserving, and indeed DFT+U orbitals can be generated using these same pseudopotentials (the DFT+U conversions were run on a different machine, but with the same version of QE).

I have submitted a run with debug=.true. to the queue, but thought I would submit the issue in the interim.

Relevant files are here: p2q_run.zip

To Reproduce: I am running QE 6.4.1 on NERSC Cori, patched and built by hand.

System:

  • NERSC Cori

mcbennet avatar Apr 13 '21 13:04 mcbennet

As a follow-up, I ran the scf with band parallelization (10 groups) to speed up the EXX calculation, and I used an identical parallelization scheme for pw2qmcpack. I assume this is relevant to the issue here. If so, would it be worthwhile to add a catch when band parallelization is requested, or to just ignore it altogether (with a printout telling the user that the flag will be ignored)?
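As a rough illustration of why 10 band groups and a norm of about 10 could be connected, here is a minimal Python sketch. It assumes (this is a guess, not something read from the pw2qmcpack source) that each band group holds an identical copy of the orbital coefficients and that the norm check sums |c|^2 over every copy. The function name `summed_norm` and the random coefficients are hypothetical.

```python
import random

def summed_norm(coeffs, n_bandgroups):
    """Sum |c|^2 over every band group's copy of the coefficients.

    Assumption for illustration only: each band group holds an identical
    copy of the data, and the reduction runs over all copies.
    """
    one_copy = sum(abs(c) ** 2 for c in coeffs)
    return sum(one_copy for _ in range(n_bandgroups))

# Build a made-up, normalized set of plane-wave coefficients.
random.seed(0)
raw = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(1000)]
scale = sum(abs(c) ** 2 for c in raw) ** 0.5
coeffs = [c / scale for c in raw]

norm_one_group = summed_norm(coeffs, n_bandgroups=1)    # close to 1
norm_ten_groups = summed_norm(coeffs, n_bandgroups=10)  # close to 10
```

Under that assumption the reported norm scales with the number of band groups, which would match the values near 10 in the log above.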

mcbennet avatar Apr 13 '21 13:04 mcbennet

The bandgroup setting was probably the reason. Could you try running pw2qmcpack without the band group settings, and on 1/10 of the nodes? Ye

Ye Luo, Ph.D. Computational Science Division & Leadership Computing Facility Argonne National Laboratory

ye-luo avatar Apr 13 '21 14:04 ye-luo

Removing bandgroups and running with 1/10 of the nodes solves the problem and leads to a clean run.

mcbennet avatar Apr 13 '21 14:04 mcbennet

I think we need to add a fix. With QE bandgroups, some datasets are replicated across the groups.

Option 1: support running with bandgroups and divide the orbital values by the number of bandgroups. It can be tricky to find all the places that need the division, it is susceptible to changes in QE's bandgroup handling, and running the converter this way wastes compute resources.

Option 2: error out and ask the user to remove bandgroups when running pw2qmcpack.

I prefer option 2 at the moment.
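Until a check like Option 2 exists in the converter itself, the same idea can be approximated on the user side. The Python sketch below scans a launch command for a band-group option and refuses to proceed if more than one group is requested. The flag spellings in `BANDGROUP_FLAGS`, the helper names, and the example commands are assumptions for illustration; check the flags against your QE version.

```python
import shlex

# Assumed spellings of QE's band-group command-line option;
# verify these against the QE version in use.
BANDGROUP_FLAGS = {"-nb", "-nband", "-nbgrp", "-bgrp", "-nband_group"}

def find_bandgroup_request(command):
    """Return the number of band groups requested in a command, or None."""
    tokens = shlex.split(command)
    for i, tok in enumerate(tokens[:-1]):
        if tok in BANDGROUP_FLAGS:
            return int(tokens[i + 1])
    return None

def check_pw2qmcpack_command(command):
    """Raise ValueError if the command asks for more than one band group."""
    nbgrp = find_bandgroup_request(command)
    if nbgrp is not None and nbgrp > 1:
        raise ValueError(
            f"pw2qmcpack was asked to run with {nbgrp} band groups; "
            "remove the band-group flag and rerun."
        )
    return command

# Hypothetical launch lines, not taken from the attached run files.
ok_cmd = check_pw2qmcpack_command("srun -n 32 pw2qmcpack.x -i p2q.in")
try:
    check_pw2qmcpack_command("srun -n 320 pw2qmcpack.x -nbgrp 10 -i p2q.in")
    rejected = False
except ValueError:
    rejected = True
```

A wrapper like this only guards the launch line; a check inside pw2qmcpack would be more robust because it would see the parallelization QE actually set up.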

ye-luo avatar Apr 13 '21 14:04 ye-luo

Following the principle of least surprise to the user, option 2 is clearly required. Provided QE itself can still be run with bandgroups, this doesn't seem like much of an inconvenience. Option 1 would be desirable longer term if the QE implementation is stable enough.

prckent avatar Apr 13 '21 15:04 prckent

Brief follow-up.

After converting the orbitals with pw2qmcpack as described above (removing bandgroups and running with 1/10 nodes), I obtained a high variance from zeroed Jastrow VMC (|v/e| ~ 0.19). A subsequent Jastrow optimization performed very poorly.

As a cross-check, I then converted the orbitals with convertpw4qmc. In this case, the zeroed Jastrow shows |v/e| ~ 0.07.

Therefore, in the case of bandgroups, it seems pw2qmcpack is not getting the correct orbitals from QE.

mcbennet avatar Apr 21 '21 13:04 mcbennet