
Nexus: support supercell twists in PySCF workflows

Open jtkrogel opened this issue 1 year ago • 5 comments

Proposed changes

This PR adds support for arbitrary supercell twists/twist grids in workflows involving PySCF and QMCPACK.

This PR is now ready for review.

What type(s) of changes does this code introduce?

  • New feature

Does this introduce a breaking change?

  • No

What systems has this change been tested on?

Laptop, Improv at ALCF

Checklist

  • Yes. This PR is up to date with the current state of 'develop'

Path out of WIP

  • [x] Supercell twist to primitive cell k-point mappings in generated PySCF inputs
  • [x] Loop-free twist averaging in subsequent QMCPACK runs
  • [x] Add twist-averaging VMC examples for diamond
  • [x] Fix pyscf tests
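The first item above, mapping a supercell twist onto the primitive-cell k-points that fold onto it, can be illustrated with a small NumPy sketch. It is not the Nexus implementation: the function name and the conventions (lattice vectors as rows, supercell axes A_s = T A_p, twist given in reduced supercell reciprocal coordinates) are assumptions. The primitive k-points are the supercell twist plus each supercell reciprocal lattice vector that is distinct modulo the primitive reciprocal lattice, which gives det(T) k-points per twist.

```python
import itertools

import numpy as np


def supercell_twist_to_prim_kpoints(prim_axes, tiling, sp_twist):
    """Hypothetical helper: primitive-cell k-points that fold onto one supercell twist.

    prim_axes : (3, 3) primitive lattice vectors as rows
    tiling    : 3 integers (diagonal tiling) or a (3, 3) integer matrix T, A_s = T @ A_p
    sp_twist  : twist in reduced coordinates of the supercell reciprocal lattice
    Returns Cartesian k-points (including the 2*pi factor) in inverse units of prim_axes.
    """
    A_p = np.asarray(prim_axes, dtype=float)
    T = np.asarray(tiling, dtype=int)
    if T.ndim == 1:
        T = np.diag(T)
    B_p = 2 * np.pi * np.linalg.inv(A_p).T      # primitive reciprocal vectors (rows)
    T_inv_T = np.linalg.inv(T).T                # supercell reduced -> primitive reduced
    ncell = int(round(abs(np.linalg.det(T))))   # primitive cells per supercell

    # Supercell twist expressed in primitive reduced coordinates.
    k_frac = np.asarray(sp_twist, dtype=float) @ T_inv_T

    # Supercell G-vectors n @ B_s, one representative per class modulo the
    # primitive reciprocal lattice; n in [0, ncell)^3 covers every class.
    shifts, seen = [], set()
    for n in itertools.product(range(ncell), repeat=3):
        g = np.array(n, dtype=float) @ T_inv_T
        key = tuple(np.round(np.mod(g, 1.0), 8) % 1.0)   # wrap 1.0 back to 0.0
        if key not in seen:
            seen.add(key)
            shifts.append(np.array(key))
    return np.array([(k_frac + g) @ B_p for g in shifts])
```

For the diamond primitive axes used in the unit test (entries of 1.785 Å, converted to bohr) with tiling [2,1,1] and a zero twist, this sketch yields k = 0 and k ≈ (0.4657, 0.4657, -0.4657) bohr⁻¹, in agreement with the kpts array printed in the ntest_nexus_pyscf_input output in this thread.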

jtkrogel, Jul 01 '24 20:07

Ready to proceed.

jtkrogel, Jul 02 '24 19:07

Could you suggest a reviewer?

ye-luo, Jul 02 '24 19:07

Yes, already flagged @anbenali for this.

jtkrogel, Jul 02 '24 20:07

Yes, already flagged @anbenali for this.

Oh thanks. I missed it.

ye-luo, Jul 02 '24 20:07

c4q (the convert4qmc step) does not complete successfully for me in either example case. Additionally, Nexus doesn't notice the error and keeps checking the status, this time overnight. (I have seen this on other occasions, so my guess is that some changed error handling, error messages, or signals are not being caught by the workstation/"wsNN" infrastructure, perhaps with OpenMPI runs.) These were run on nitrogen2 with the nightly test configuration for gcc "new"+OpenMPI, i.e., reasonably new versions of all software, including Python 3.11.9, installed via spack. Note the broken scf.h5 link. PySCF is 2.5.0 in this case. Happy to poke further -- this could well be completely unrelated to Nexus and instead something to do with the converter, a PySCF version dependency, etc.

nohup: ignoring input

_____________________________________________________

                     Nexus 2.1.0

        (c) Copyright 2012-  Nexus developers

                     Please cite:
  J. T. Krogel Comput. Phys. Commun. 198 154 (2016)
     https://doi.org/10.1016/j.cpc.2015.08.012
_____________________________________________________
          
            

Checking for Nexus dependencies on the current machine...

  Nexus dependencies available on current machine:
    python3      = 3.11.9      (required)
    numpy        = 1.26.4      (required)
    scipy        = 1.13.1      (optional)
    h5py         = 3.11.0      (optional)
    matplotlib   = (unknown)   (optional)
    pydot        = 1.4.2       (optional)
    spglib       = 2.0.2       (optional)
    seekpath     = 2.0.1       (optional)
    pycifrw      = (unknown)   (optional)
  
  Nexus dependencies recommended for full functionality:
    python3      = 3.6.0      (required)
    numpy        = 1.13.1     (required)
    scipy        = 0.19.1     (optional)
    h5py         = 2.7.1      (optional)
    matplotlib   = 2.0.2      (optional)
    pydot        = 1.2.3      (optional)
    spglib       = 1.9.9      (optional)
    seekpath     = 1.4.0      (optional)
    pycifrw      = 4.3.0      (optional)
    cif2cell     = 1.2.10     (optional)
  
  All required Nexus dependencies are met.
    Core workflow features should work.
    Some optional features may not.
    See below for more information.
  
  Some optional dependencies are missing or merit an update.
    These modules are not needed for core workflow operation.
    Optional features related to outdated modules may still work.
    Please install updated versions if problems are encountered.
  
  Optional dependencies that are missing:
    cif2cell   is missing.  Install 1.2.10 or greater.
  
  Optional dependencies benefitting from user check or update:
    matplotlib version is unknown.  Check for 2.0.2 or greater.
    pycifrw    version is unknown.  Check for 4.3.0 or greater.
  
 
Applying user settings 

  Pseudopotentials
    reading pp:  ../../pseudopotentials/C.BFD.upf 
    reading pp:  ../../pseudopotentials/C.BFD.xml 
    reading pp:  ../../pseudopotentials/H.BFD.upf 
    reading pp:  ../../pseudopotentials/H.BFD.xml 
    reading pp:  ../../pseudopotentials/O.BFD.upf 
    reading pp:  ../../pseudopotentials/O.BFD.xml 
 

Project starting 
  checking for file collisions 
  loading cascade images 
    cascade 0 checking in 
  checking cascade dependencies 
    all simulation dependencies satisfied 
  
  starting runs:
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
  elapsed time 0.0 s  memory 117.18 MB 
    Entering ./runs/diamond_ta/scf 0 
      writing input files  0 scf
   Entering ./runs/diamond_ta/scf 0 
      sending required files  0 scf 
      submitting job  0 scf 
    Entering ./runs/diamond_ta/scf 0 
      Executing:  
        export OMP_NUM_THREADS=16
        python3 scf.py 

  elapsed time 3.0 s  memory 319.44 MB 
  elapsed time 6.1 s  memory 1451.97 MB 
  elapsed time 9.1 s  memory 1606.60 MB 
  elapsed time 12.1 s  memory 2024.05 MB 
  elapsed time 15.1 s  memory 1619.60 MB 
  elapsed time 18.1 s  memory 1745.74 MB 
  elapsed time 21.1 s  memory 1402.56 MB 
  elapsed time 24.2 s  memory 1141.36 MB 
  elapsed time 27.2 s  memory 2076.82 MB 
  elapsed time 30.2 s  memory 1839.52 MB 
  elapsed time 33.2 s  memory 1645.86 MB 
  elapsed time 36.2 s  memory 1993.04 MB 
(many lines deleted)
  elapsed time 1223.2 s  memory 117.18 MB 
    Entering ./runs/diamond_ta/scf 0 
      copying results  0 scf 
    Entering ./runs/diamond_ta/scf 0 
      analyzing  0 scf 

  elapsed time 1226.3 s  memory 117.18 MB 
    Entering ./runs/diamond_ta/scf 1 
      writing input files  1 c4q 
    Entering ./runs/diamond_ta/scf 1 
      sending required files  1 c4q 
      submitting job  1 c4q 
    Entering ./runs/diamond_ta/scf 1 
      Executing:  
        export OMP_NUM_THREADS=1
        mpirun -np 1 convert4qmc -prefix c4q -orbitals scf.h5 

  elapsed time 1229.3 s  memory 117.18 MB 
    Entering ./runs/diamond_ta/scf 1 
      copying results  1 c4q 
    Entering ./runs/diamond_ta/scf 1 
      analyzing  1 c4q 

  elapsed time 1232.3 s  memory 117.18 MB 
  elapsed time 1235.4 s  memory 117.18 MB 
  elapsed time 1238.4 s  memory 117.18 MB 
  elapsed time 1241.4 s  memory 117.18 MB 
  elapsed time 1244.4 s  memory 117.18 MB 
  elapsed time 1247.4 s  memory 117.18 MB 
  elapsed time 1250.4 s  memory 117.18 MB 
 (many lines deleted)
  elapsed time 60547.2 s  memory 117.18 MB 
  elapsed time 60550.2 s  memory 117.18 MB 
  elapsed time 60553.2 s  memory 117.18 MB 
  elapsed time 60556.2 s  memory 117.18 MB 
  elapsed time 60559.3 s  memory 117.18 MB 
$ pwd; ls -l
.. /qmcpack/nexus/examples/qmcpack/rsqmc_pyscf/02_diamond_hf_qmc/runs/diamond_ta/scf
total 240
-rw-r--r-- 1 pk7 users   1021 Jul  2 17:26 c4q.err
-rw-r--r-- 1 pk7 users     40 Jul  2 17:26 c4q.in
lrwxrwxrwx 1 pk7 users      6 Jul  2 17:26 c4q.orbs.h5 -> scf.h5
-rw-r--r-- 1 pk7 users     79 Jul  2 17:26 c4q.out
-rw-r--r-- 1 pk7 users   1252 Jul  2 17:25 scf.err
-rw-r--r-- 1 pk7 users 139630 Jul  2 17:25 scf.out
-rw-r--r-- 1 pk7 users   1885 Jul  2 17:05 scf.py
-rw-r--r-- 1 pk7 users    360 Jul  2 17:05 scf.struct.xsf
-rw-r--r-- 1 pk7 users    175 Jul  2 17:05 scf.struct.xyz
-rw-r--r-- 1 pk7 users  69792 Jul  2 17:25 scf.twistnum_000.h5
drwxr-xr-x 2 pk7 users     52 Jul  2 17:26 sim_c4q
drwxr-xr-x 2 pk7 users     52 Jul  2 17:26 sim_scf

c4q.err

Could not open H5 file
[nitrogen2:3440447] *** Process received signal ***
[nitrogen2:3440447] Signal: Aborted (6)
[nitrogen2:3440447] Signal code:  (-6)
[nitrogen2:3440447] [ 0] /lib64/libc.so.6(+0x3e6f0)[0x7f162b23e6f0]
[nitrogen2:3440447] [ 1] /lib64/libc.so.6(+0x8b94c)[0x7f162b28b94c]
[nitrogen2:3440447] [ 2] /lib64/libc.so.6(raise+0x16)[0x7f162b23e646]
[nitrogen2:3440447] [ 3] /lib64/libc.so.6(abort+0xd3)[0x7f162b2287f3]
[nitrogen2:3440447] [ 4] convert4qmc[0x47a2f3]
[nitrogen2:3440447] [ 5] convert4qmc[0x41ab59]
[nitrogen2:3440447] [ 6] /lib64/libc.so.6(+0x29590)[0x7f162b229590]
[nitrogen2:3440447] [ 7] /lib64/libc.so.6(__libc_start_main+0x80)[0x7f162b229640]
[nitrogen2:3440447] [ 8] convert4qmc[0x422db5]
[nitrogen2:3440447] *** End of error message ***
--------------------------------------------------------------------------
prterun noticed that process rank 0 with PID 3440447 on node nitrogen2 exited on
signal 6 (Aborted).
--------------------------------------------------------------------------

prckent, Jul 03 '24 14:07

For the Nexus test failure, please add two print statements just before the assertion on this line to investigate:

1452:   File "/home/pk7/projects/qmc/git_QMCPACK_prckent/qmcpack/nexus/tests/unit/test_pyscf_input.py", line 577, in test_write
1452:     assert(text_eq(text,ref_text))

# add these
print(ref_text)
print(text)
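The print-then-assert pattern above can be wrapped in a small reusable helper, in the spirit of the custom assertion function requested later in this thread. This is a minimal standard-library sketch; the name assert_text_match is hypothetical, and the comparison function is passed in so that Nexus's own text_eq (or any other check) can be used unchanged.

```python
import difflib


def assert_text_match(text, ref_text, compare):
    """Hypothetical helper: assert compare(text, ref_text), printing a diff on failure.

    `compare` is any callable that returns True when the two texts match.
    """
    if compare(text, ref_text):
        return
    # Show only the differing lines instead of dumping both texts in full.
    diff = difflib.unified_diff(
        ref_text.splitlines(), text.splitlines(),
        fromfile="reference", tofile="generated", lineterm="",
    )
    message = "\n".join(diff)
    print(message)
    raise AssertionError("generated text differs from reference:\n" + message)
```

In a test this would replace a bare `assert(text_eq(text,ref_text))` with something like `assert_text_match(text, ref_text, text_eq)`, so a CI log shows the offending lines directly.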

For the examples added with this PR, diamond_pp_hf_twistavg_prim.py (primitive cell twist averaging) should run cleanly (at least it did for me -- please post the converter output), while diamond_pp_hf_twistavg.py (supercell twist averaging) currently fails at the converter level due to changes still needed in Anouar's savetoqmcpack.

These features need bug fixes at the QMCPACK and QMCPACK-converter levels (I've already made Anouar aware). This PR implements the Nexus-side features needed to drive these workflows, but does not guarantee that QMCPACK and its converters function properly.

jtkrogel, Jul 05 '24 14:07

Thanks Jaron -- the situation is clear to me now. @anbenali How far off are the updates to savetoqmcpack? I put PySCF 2.6.2 in spack, so I can easily test the latest version. It would be nice to have a patch so that the Nexus support can be tested, but perhaps there are puzzles to solve?

prckent, Jul 06 '24 13:07

At least one difference is that one entry in the newly generated sp_kpoints array is not quite zero:

sp_kpoints = array( [ 
2597:      [ -2.9375236054174356e-17, 0.0, 0.0 ]  ] )
2597:

Selected output from ctest -R ntest_nexus_pyscf_input

597: Test timeout computed to be: 3600
2597: 
2597:         #! /usr/bin/env python3
2597:         
2597:         from pyscf import scf
2597:         
2597:         
2597:         ### generated system text ###
2597:         from numpy import array
2597:         from pyscf.pbc import gto as gto_loc
2597:         cell = gto_loc.Cell()
2597:         cell.a             = '''
2597:                              1.78500000   1.78500000   0.00000000
2597:                              0.00000000   1.78500000   1.78500000
2597:                              1.78500000   0.00000000   1.78500000
2597:                              '''
2597:         cell.basis         = 'bfd-vdz'
2597:         cell.dimension     = 3
2597:         cell.ecp           = 'bfd'
2597:         cell.unit          = 'A'
2597:         cell.atom          = '''
2597:                              C    0.00000000   0.00000000   0.00000000
2597:                              C    0.89250000   0.89250000   0.89250000
2597:                              '''
2597:         cell.drop_exponent = 0.1
2597:         cell.verbose       = 5
2597:         cell.charge        = 0
2597:         cell.spin          = 0
2597:         cell.build()
2597:         kpts = array( [ 
2597:              [ 0.0, 0.0, 0.0 ]  ,
2597:              [ 0.4656748546088228, 0.4656748546088228, -0.4656748546088228 ]  ] )
2597:         ### end generated system text ###
2597:         
2597:         
2597:         
2597:         mf = scf.RHF(mol)
2597:         mf.kernel()
2597:         
2597:         ### generated conversion text ###
2597:         from PyscfToQmcpack import savetoqmcpack
2597:         tiling =  [ 2,1,1 ] 
2597:         sp_kpoints = array( [ 
2597:              [ 0.0, 0.0, 0.0 ]  ] )
2597:         sp_kmap    = array( [ 
2597:              [ 0, 1 ]  ] )
2597:         for n,kp in enumerate(sp_kpoints):
2597:             savetoqmcpack(cell,mf,'scf.twistnum_{}'.format(str(n).zfill(3)),kmesh=tiling,kpts=kpts [ sp_kmap [ n ]  ] ,sp_twist=kp)
2597:         #end for
2597:         ### end generated conversion text ###
2597:         
2597: #! /usr/bin/env python3
2597: 
2597: from pyscf import scf
2597: 
2597: 
2597: ### generated system text ###
2597: from numpy import array
2597: from pyscf.pbc import gto as gto_loc
2597: cell = gto_loc.Cell()
2597: cell.a             = '''
2597:                      1.78500000   1.78500000   0.00000000
2597:                      0.00000000   1.78500000   1.78500000
2597:                      1.78500000   0.00000000   1.78500000
2597:                      '''
2597: cell.basis         = 'bfd-vdz'
2597: cell.dimension     = 3
2597: cell.ecp           = 'bfd'
2597: cell.unit          = 'A'
2597: cell.atom          = '''
2597:                      C    0.00000000   0.00000000   0.00000000
2597:                      C    0.89250000   0.89250000   0.89250000
2597:                      '''
2597: cell.drop_exponent = 0.1
2597: cell.verbose       = 5
2597: cell.charge        = 0
2597: cell.spin          = 0
2597: cell.build()
2597: kpts = array( [ 
2597:      [ 0.0, 0.0, 0.0 ] ,
2597:      [ 0.4656748546088228, 0.4656748546088228, -0.4656748546088228 ]  ] )
2597: ### end generated system text ###
2597: 
2597: 
2597: 
2597: mf = scf.RHF(mol)
2597: mf.kernel()
2597: 
2597: ### generated conversion text ###
2597: from PyscfToQmcpack import savetoqmcpack
2597: tiling =  [ 2,1,1 ] 
2597: sp_kpoints = array( [ 
2597:      [ -2.9375236054174356e-17, 0.0, 0.0 ]  ] )
2597: sp_kmap    = array( [ 
2597:      [ 0, 1 ]  ] )
2597: for n,kp in enumerate(sp_kpoints):
2597:     savetoqmcpack(cell,mf,'scf.twistnum_{}'.format(str(n).zfill(3)),kmesh=tiling,kpts=kpts [ sp_kmap [ n ]  ] ,sp_twist=kp)
2597: #end for
2597: ### end generated conversion text ###
2597: 
2597: 
2597: Test name     : pyscf_input
2597: Test sublabel : test_write
2597: Test exception: "AssertionError: "
2597: Test backtrace:
2597:   File "/home/pk7/projects/qmc/git_QMCPACK_prckent/qmcpack/nexus/bin/nxs-test", line 478, in run
2597:     self.operation()
2597:   File "/home/pk7/projects/qmc/git_QMCPACK_prckent/qmcpack/nexus/bin/nxs-test", line 1210, in pyscf_input
2597:     nunit('write')
2597:   File "/home/pk7/projects/qmc/git_QMCPACK_prckent/qmcpack/nexus/bin/nxs-test", line 349, in nunit
2597:     run_external_unit_test(test_name,unit_test)
2597:   File "/home/pk7/projects/qmc/git_QMCPACK_prckent/qmcpack/nexus/bin/nxs-test", line 388, in run_external_unit_test
2597:     unit_test()
2597:   File "/home/pk7/projects/qmc/git_QMCPACK_prckent/qmcpack/nexus/tests/unit/test_pyscf_input.py", line 580, in test_write
2597:     assert(text_eq(text,ref_text))
2597:            ^^^^^^^^^^^^^^^^^^^^^^
2597: 
2597: Test status: fail
1/1 Test #2597: ntest_nexus_pyscf_input ..........***Failed    0.63 sec

0% tests passed, 1 tests failed out of 1

Obtained via this change:

diff --git a/nexus/tests/unit/test_pyscf_input.py b/nexus/tests/unit/test_pyscf_input.py
index a966746cc..951435e01 100644
--- a/nexus/tests/unit/test_pyscf_input.py
+++ b/nexus/tests/unit/test_pyscf_input.py
@@ -574,6 +574,9 @@ def test_write():
     text = text.replace('[',' [ ').replace(']',' ] ')
     ref_text = ref_text.replace('[',' [ ').replace(']',' ] ')
 
+# add these
+    print(ref_text)
+    print(text)
     assert(text_eq(text,ref_text))
 
 #end def test_write

Note that this is failing in GitHub Actions CI as well -- the CentOS builds show the same issue. Presumably either something is not zeroed out or the rounding is slightly different, and a numerical tolerance is needed.
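For the "not zeroed out" possibility, one option is to snap numerically negligible components to exactly zero before the arrays are formatted into the generated script. This is a minimal sketch, assuming the k-point data are NumPy arrays at that point; the helper name snap_small and the 1e-12 threshold are illustrative choices, not Nexus settings.

```python
import numpy as np


def snap_small(values, tol=1e-12):
    """Hypothetical cleanup step: set entries with |x| < tol to exactly +0.0.

    Floating-point arithmetic can leave residues such as -2.9e-17 where the
    exact result is zero; snapping them keeps the written text identical
    across platforms and math-library builds.
    """
    arr = np.asarray(values, dtype=float)
    return np.where(np.abs(arr) < tol, 0.0, arr)
```

Applied to the offending entry, snap_small([[-2.9375236054174356e-17, 0.0, 0.0]]) gives [[0.0, 0.0, 0.0]], while legitimate values such as 0.4656748546088228 pass through untouched. The tolerance-based comparison is the complementary fix on the test side.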

prckent, Jul 09 '24 14:07

Everything is working on my end. I fixed the files that did not work. Please check if I forgot something.

anbenali, Jul 11 '24 07:07

Test this please

prckent, Jul 11 '24 11:07

^^^ To get more variety in the test configurations that are run. Hopefully only the floating-point rounding/tolerance issue needs to be tackled.

Thanks Anouar. Great to have this path supported.

prckent, Jul 11 '24 11:07

While driving, I realized that the QMC input explicitly requires the twist coordinate to compute the phase applied to the orbitals. In the current QMC input from Nexus, we get a twist number. I am not sure whether that is accessed in the H5 file, or if it is even stored. I can look at it this afternoon.
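The point about needing the explicit twist coordinate can be made concrete: the Bloch phase depends on the Cartesian twist vector k, not on an index into a list of twists. This is a minimal sketch, assuming the twist is given in reduced coordinates of the supercell reciprocal lattice; it is not QMCPACK's implementation, and the function name is hypothetical.

```python
import numpy as np


def twist_phase(sc_axes, twist, r):
    """Bloch phase exp(i k.r) for a twist in reduced supercell reciprocal coordinates.

    sc_axes : (3, 3) supercell lattice vectors as rows
    twist   : twist in reduced coordinates of the supercell reciprocal lattice
    r       : Cartesian position(s), shape (3,) or (N, 3), same length unit as sc_axes
    """
    A = np.asarray(sc_axes, dtype=float)
    # Cartesian twist: k = 2*pi * t @ inv(A).T (reciprocal vectors as rows)
    k = 2 * np.pi * np.asarray(twist, dtype=float) @ np.linalg.inv(A).T
    return np.exp(1j * (np.asarray(r, dtype=float) @ k))
```

Across a supercell translation L = n·A_s the phase reduces to exp(2πi t·n), so for t = (1/2, 0, 0) the orbital picks up a factor of -1 along the first supercell axis. A bare twist number only determines this if every code agrees on the same twist ordering.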

On Thu, Jul 11, 2024, 8:21 AM Paul R. C. Kent @.***> wrote:

@.**** requested changes on this pull request.

(Noting for ease of seeing the status on the PR summary)

The k-points tolerance issue should be addressed - besides CentOS failures, presumably this is what is causing the failures on sulfur (RHEL9), since the same test is failing.

Definitely not needed for this PR, but from the point of view of reading CI failures, it would be very helpful to replace all instances of assert(text_eq(a,b)) with a custom function that gave actionable info by printing the failure and the a,b before aborting.

View it on GitHub: https://github.com/QMCPACK/qmcpack/pull/5073#pullrequestreview-2171887281

anbenali, Jul 11 '24 14:07

@anbenali There is an indentation bug (or more) in PyscfToQmcpack.py. I got the following when running the diamond prim example:

Traceback (most recent call last):
  File "/home/pk7/projects/qmc/git_QMCPACK_prckent/qmcpack/nexus/examples/qmcpack/rsqmc_pyscf/02_diamond_hf_qmc/runs/diamond_ta_prim/scf/scf.py", line 45, in <module>
    from PyscfToQmcpack import savetoqmcpack
  File "/home/pk7/projects/qmc/git_QMCPACK_prckent/qmcpack/src/QMCTools/PyscfToQmcpack.py", line 553
    E_g, C_gamma,E_g_unsorted,C_unsorted = mo_k2gamma(cell, e_k, mo_k, kpts,kmesh)
                                                                                  ^
IndentationError: unindent does not match any outer indentation level

prckent, Jul 11 '24 17:07

Interestingly, what happens in that example on my system is that the PySCF output step fails as above, and then the converter step runs and also fails, since the HDF5 file is not present. Somehow, though, Nexus does not catch the failures, and the converter step runs forever. Maybe something in the workstation class (?)
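Whatever the cause inside Nexus's workstation class, the general idea of catching such failures can be sketched with the standard library alone: poll the launched process and treat any nonzero exit status as a failure instead of continuing to wait. This is a hypothetical sketch, not Nexus code; when a process such as convert4qmc aborts under mpirun, the launcher typically reports a nonzero exit status, which this kind of check would pick up.

```python
import subprocess
import sys
import time


def run_and_watch(cmd, poll_interval=0.5):
    """Hypothetical job monitor: poll a subprocess until it exits.

    Returns (finished_ok, returncode). On POSIX, a negative return code -N
    means the process was killed by signal N (e.g. -6 for SIGABRT).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    while True:
        code = proc.poll()          # None while running, exit status once finished
        if code is not None:
            return code == 0, code
        time.sleep(poll_interval)


if __name__ == "__main__":
    ok, code = run_and_watch([sys.executable, "-c", "import sys; sys.exit(3)"])
    print("finished ok" if ok else "failed with exit status {}".format(code))
```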

prckent, Jul 11 '24 17:07

@jtkrogel: there are still errors to be fixed for the kpts case. At this point, you can try setting drop_exponent = 0.3 so the tests run faster.

Then, if two twists are generated, we pass title='scf.twistnum_{}'.format(str(n).zfill(3)), but we then create a static link from c4q.orbs.h5 to scf.h5, and scf.h5 does not exist in this case.

It then also runs mpirun -np 1 convert4qmc -prefix c4q -orbitals scf.h5

Again, scf.h5 does not exist.
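The file-name mismatch can be spelled out in a short sketch. The title pattern comes from the generated conversion text, and the .h5 suffix matches the scf.twistnum_000.h5 file in the run-directory listing earlier in this thread. The -prefix and -orbitals options are the ones in the logged convert4qmc command; the per-twist prefix (c4q.twistnum_NNN) is a hypothetical naming choice, not necessarily what the eventual fix uses.

```python
def twist_orbital_files(n_twists, prefix="scf"):
    """Orbital files written by the generated loop, one per supercell twist.

    Mirrors the title '{prefix}.twistnum_{NNN}' used in the generated PySCF
    script, plus the '.h5' extension observed in the run directory.
    """
    return ["{}.twistnum_{}.h5".format(prefix, str(n).zfill(3)) for n in range(n_twists)]


def converter_commands(orb_files, prefix="c4q"):
    """Hypothetical per-twist convert4qmc command lines (prefix naming is assumed)."""
    cmds = []
    for n, h5 in enumerate(orb_files):
        twist_prefix = "{}.twistnum_{}".format(prefix, str(n).zfill(3))
        cmds.append(["convert4qmc", "-prefix", twist_prefix, "-orbitals", h5])
    return cmds
```

With two twists the generated files are scf.twistnum_000.h5 and scf.twistnum_001.h5, so any step that still refers to a single scf.h5 (the c4q.orbs.h5 link or the -orbitals argument) points at a file that is never written.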

After fixing these by hand to push the simulation forward, it unfortunately still fails at the VMC step with:

    for n,(h5file,kp) in enumerate(zip(result.orb_files,result.kpoints)):
                                       ^^^^^^^^^^^^^^^^
AttributeError: 'obj' object has no attribute 'orb_files'

I think it would be faster for you to track down the error.

anbenali, Jul 12 '24 04:07

After discussion with Anouar, I've confirmed that Nexus is now fully functional in this PR. Anouar plans to push another fix to PyscfToQmcpack.py.

I will fix the unit test failure issue (which doesn't reproduce on my laptop) in a week or so after this PR goes in and I am back from vacation. This will require some updates to the Nexus testing system itself, as requested by Paul.

jtkrogel, Jul 12 '24 17:07

If anyone wants to look at the test failure in the interim, it's likely that this will fix the issue (adds a 1e-8 absolute tolerance in addition to the default 1e-6 relative tolerance):

assert(text_eq(text,ref_text,atol=1e-8))
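Why the absolute tolerance matters here: with a purely relative test, |a - b| <= rtol * max(|a|, |b|), a reference value of exactly 0.0 only matches another exact zero, so -2.9e-17 fails for any rtol < 1. The sketch below is a hypothetical tolerance-aware token comparison; it assumes text_eq behaves roughly this way but is not Nexus's actual code.

```python
import re


def _tokens(text):
    # Pad brackets and commas so numbers become standalone tokens, then split on whitespace.
    return re.sub(r"([\[\],])", r" \1 ", text).split()


def texts_close(text, ref_text, rtol=1e-6, atol=0.0):
    """Hypothetical tolerance-aware text comparison (not Nexus's text_eq).

    Numeric tokens match if |a - b| <= atol + rtol * max(|a|, |b|);
    every other token must match exactly.
    """
    a_toks, b_toks = _tokens(text), _tokens(ref_text)
    if len(a_toks) != len(b_toks):
        return False
    for a, b in zip(a_toks, b_toks):
        try:
            x, y = float(a), float(b)
        except ValueError:
            if a != b:              # non-numeric tokens must be identical
                return False
            continue
        if abs(x - y) > atol + rtol * max(abs(x), abs(y)):
            return False
    return True
```

With the default atol=0.0 the generated line containing -2.9375236054174356e-17 does not match the reference 0.0; with atol=1e-8 it does, while a genuine change such as a different tiling entry is still caught.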

jtkrogel, Jul 12 '24 17:07

Didn't know about that tolerance option. Will try it. If updating the tolerance that way works, we can probably merge this PR. The default relative tolerance is likely unreliable with zero as one of the values.

We can't merge something that is known to break a deterministic test (the recommended installation test 'ctest -L deterministic') and the CI without putting in a bypass (expected failure). That in turn would forgive other errors, so while PRs don't have to contain a complete feature (and probably shouldn't, if the feature is large), they do need to pass CI somehow.

prckent, Jul 12 '24 17:07

I remember some discussion a long time ago that twistnum is error-prone and explicit twist coordinates are preferred. Can Nexus move away from twistnum?
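One way to carry explicit coordinates instead of a twistnum is to generate and store the twist vectors themselves. This is a minimal sketch, assuming a uniform grid in reduced supercell reciprocal coordinates; the function name and the shift convention are illustrative, not an existing Nexus API.

```python
import itertools


def twist_grid(mesh, shift=(0.0, 0.0, 0.0)):
    """Explicit twists on a uniform n1 x n2 x n3 grid, in reduced supercell coordinates.

    Returning coordinates (rather than only an index) keeps each twist's identity
    independent of how a downstream code orders its list of twists.
    """
    n1, n2, n3 = mesh
    twists = []
    for i, j, k in itertools.product(range(n1), range(n2), range(n3)):
        twists.append(((i + shift[0]) / n1, (j + shift[1]) / n2, (k + shift[2]) / n3))
    return twists
```

For a 2x1x1 grid this gives (0, 0, 0) and (0.5, 0, 0); each coordinate could then be written into the corresponding QMC input directly, so no code has to reconstruct it from an index.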

ye-luo, Jul 12 '24 17:07

Test this please

prckent, Jul 12 '24 17:07