Nexus: support supercell twists in PySCF workflows
Proposed changes
This PR adds support for arbitrary supercell twists/twist grids in workflows involving PySCF and QMCPACK.
This PR is now ready for review.
What type(s) of changes does this code introduce?
- New feature
Does this introduce a breaking change?
- No
What systems has this change been tested on?
Laptop, Improv at ALCF
Checklist
- Yes. This PR is up to date with the current state of 'develop'
Path out of WIP
- [x] Supercell twist to primitive cell k-point mappings in generated PySCF inputs
- [x] Loop-free twist averaging in subsequent QMCPACK runs
- [x] Add twist-averaging VMC examples for diamond
- [x] Fix pyscf tests
Ready to proceed.
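For reviewers less familiar with the first checklist item, the following is a minimal sketch of how one supercell twist unfolds onto primitive-cell k-points. It assumes a diagonal tiling and a twist given in supercell reduced coordinates. The function name `twist_to_prim_kpoints` and the unit-conversion constant are illustrative assumptions; this is not the Nexus implementation.

```python
import itertools
import numpy as np

ANGSTROM_TO_BOHR = 1.8897261246  # assumed conversion factor, for illustration

def twist_to_prim_kpoints(axes, tiling, twist):
    """Return primitive-cell k-points (reduced, Cartesian) for one supercell twist.

    axes   : 3x3 primitive lattice vectors, one per row
    tiling : diagonal tiling (n1, n2, n3)
    twist  : supercell twist in supercell reduced coordinates
    """
    axes = np.asarray(axes, dtype=float)
    tiling = np.asarray(tiling, dtype=int)
    twist = np.asarray(twist, dtype=float)
    # Rows of kaxes are the primitive reciprocal vectors (a_i . b_j = 2*pi*delta_ij)
    kaxes = 2.0 * np.pi * np.linalg.inv(axes).T
    # Each integer shift of the twist gives one primitive k-point of the supercell
    shifts = itertools.product(*(range(n) for n in tiling))
    reduced = np.array([(twist + np.array(s)) / tiling for s in shifts])
    return reduced, reduced @ kaxes

# Diamond primitive cell from the example inputs, 2x1x1 tiling, Gamma twist
a = 1.785 * ANGSTROM_TO_BOHR
axes = a * np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1]])
reduced, cart = twist_to_prim_kpoints(axes, (2, 1, 1), (0.0, 0.0, 0.0))
print(reduced)
print(cart)
```

For a 2x1x1 tiling this yields two primitive k-points per twist, which is why the generated PySCF input carries both a `kpts` array and a per-twist index map.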
could you suggest a reviewer?
Yes, already flagged @anbenali for this.
Oh thanks. I missed it.
c4q does not complete successfully for me in either example case. Additionally, Nexus doesn't notice the error and keeps checking status, this time overnight. (I have seen this on other occasions, so my guess is that some changed handling, error messages, or signals are not being caught by the workstation/"wsNN" infrastructure, perhaps with openmpi runs.) These were run on nitrogen2 with the nightly test configuration for gcc "new"+openmpi, i.e. reasonably new versions of all software, including python (3.11.9), installed via spack. Note the broken scf.h5 link. PySCF is 2.5.0 in this case. Happy to poke further -- this could well be completely unrelated to Nexus and instead be something to do with the converter, a PySCF version dependency, etc.
nohup: ignoring input
_____________________________________________________
Nexus 2.1.0
(c) Copyright 2012- Nexus developers
Please cite:
J. T. Krogel Comput. Phys. Commun. 198 154 (2016)
https://doi.org/10.1016/j.cpc.2015.08.012
_____________________________________________________
Checking for Nexus dependencies on the current machine...
Nexus dependencies available on current machine:
python3 = 3.11.9 (required)
numpy = 1.26.4 (required)
scipy = 1.13.1 (optional)
h5py = 3.11.0 (optional)
matplotlib = (unknown) (optional)
pydot = 1.4.2 (optional)
spglib = 2.0.2 (optional)
seekpath = 2.0.1 (optional)
pycifrw = (unknown) (optional)
Nexus dependencies recommended for full functionality:
python3 = 3.6.0 (required)
numpy = 1.13.1 (required)
scipy = 0.19.1 (optional)
h5py = 2.7.1 (optional)
matplotlib = 2.0.2 (optional)
pydot = 1.2.3 (optional)
spglib = 1.9.9 (optional)
seekpath = 1.4.0 (optional)
pycifrw = 4.3.0 (optional)
cif2cell = 1.2.10 (optional)
All required Nexus dependencies are met.
Core workflow features should work.
Some optional features may not.
See below for more information.
Some optional dependencies are missing or merit an update.
These modules are not needed for core workflow operation.
Optional features related to outdated modules may still work.
Please install updated versions if problems are encountered.
Optional dependencies that are missing:
cif2cell is missing. Install 1.2.10 or greater.
Optional dependencies benefitting from user check or update:
matplotlib version is unknown. Check for 2.0.2 or greater.
pycifrw version is unknown. Check for 4.3.0 or greater.
Applying user settings
Pseudopotentials
reading pp: ../../pseudopotentials/C.BFD.upf
reading pp: ../../pseudopotentials/C.BFD.xml
reading pp: ../../pseudopotentials/H.BFD.upf
reading pp: ../../pseudopotentials/H.BFD.xml
reading pp: ../../pseudopotentials/O.BFD.upf
reading pp: ../../pseudopotentials/O.BFD.xml
Project starting
checking for file collisions
loading cascade images
cascade 0 checking in
checking cascade dependencies
all simulation dependencies satisfied
starting runs:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
elapsed time 0.0 s memory 117.18 MB
Entering ./runs/diamond_ta/scf 0
writing input files 0 scf
Entering ./runs/diamond_ta/scf 0
sending required files 0 scf
submitting job 0 scf
Entering ./runs/diamond_ta/scf 0
Executing:
export OMP_NUM_THREADS=16
python3 scf.py
elapsed time 3.0 s memory 319.44 MB
elapsed time 6.1 s memory 1451.97 MB
elapsed time 9.1 s memory 1606.60 MB
elapsed time 12.1 s memory 2024.05 MB
elapsed time 15.1 s memory 1619.60 MB
elapsed time 18.1 s memory 1745.74 MB
elapsed time 21.1 s memory 1402.56 MB
elapsed time 24.2 s memory 1141.36 MB
elapsed time 27.2 s memory 2076.82 MB
elapsed time 30.2 s memory 1839.52 MB
elapsed time 33.2 s memory 1645.86 MB
elapsed time 36.2 s memory 1993.04 MB
(many lines deleted)
elapsed time 1223.2 s memory 117.18 MB
Entering ./runs/diamond_ta/scf 0
copying results 0 scf
Entering ./runs/diamond_ta/scf 0
analyzing 0 scf
elapsed time 1226.3 s memory 117.18 MB
Entering ./runs/diamond_ta/scf 1
writing input files 1 c4q
Entering ./runs/diamond_ta/scf 1
sending required files 1 c4q
submitting job 1 c4q
Entering ./runs/diamond_ta/scf 1
Executing:
export OMP_NUM_THREADS=1
mpirun -np 1 convert4qmc -prefix c4q -orbitals scf.h5
elapsed time 1229.3 s memory 117.18 MB
Entering ./runs/diamond_ta/scf 1
copying results 1 c4q
Entering ./runs/diamond_ta/scf 1
analyzing 1 c4q
elapsed time 1232.3 s memory 117.18 MB
elapsed time 1235.4 s memory 117.18 MB
elapsed time 1238.4 s memory 117.18 MB
elapsed time 1241.4 s memory 117.18 MB
elapsed time 1244.4 s memory 117.18 MB
elapsed time 1247.4 s memory 117.18 MB
elapsed time 1250.4 s memory 117.18 MB
(many lines deleted)
elapsed time 60547.2 s memory 117.18 MB
elapsed time 60550.2 s memory 117.18 MB
elapsed time 60553.2 s memory 117.18 MB
elapsed time 60556.2 s memory 117.18 MB
elapsed time 60559.3 s memory 117.18 MB
$ pwd; ls -l
.. /qmcpack/nexus/examples/qmcpack/rsqmc_pyscf/02_diamond_hf_qmc/runs/diamond_ta/scf
total 240
-rw-r--r-- 1 pk7 users 1021 Jul 2 17:26 c4q.err
-rw-r--r-- 1 pk7 users 40 Jul 2 17:26 c4q.in
lrwxrwxrwx 1 pk7 users 6 Jul 2 17:26 c4q.orbs.h5 -> scf.h5
-rw-r--r-- 1 pk7 users 79 Jul 2 17:26 c4q.out
-rw-r--r-- 1 pk7 users 1252 Jul 2 17:25 scf.err
-rw-r--r-- 1 pk7 users 139630 Jul 2 17:25 scf.out
-rw-r--r-- 1 pk7 users 1885 Jul 2 17:05 scf.py
-rw-r--r-- 1 pk7 users 360 Jul 2 17:05 scf.struct.xsf
-rw-r--r-- 1 pk7 users 175 Jul 2 17:05 scf.struct.xyz
-rw-r--r-- 1 pk7 users 69792 Jul 2 17:25 scf.twistnum_000.h5
drwxr-xr-x 2 pk7 users 52 Jul 2 17:26 sim_c4q
drwxr-xr-x 2 pk7 users 52 Jul 2 17:26 sim_scf
c4q.err
Could not open H5 file
[nitrogen2:3440447] *** Process received signal ***
[nitrogen2:3440447] Signal: Aborted (6)
[nitrogen2:3440447] Signal code: (-6)
[nitrogen2:3440447] [ 0] /lib64/libc.so.6(+0x3e6f0)[0x7f162b23e6f0]
[nitrogen2:3440447] [ 1] /lib64/libc.so.6(+0x8b94c)[0x7f162b28b94c]
[nitrogen2:3440447] [ 2] /lib64/libc.so.6(raise+0x16)[0x7f162b23e646]
[nitrogen2:3440447] [ 3] /lib64/libc.so.6(abort+0xd3)[0x7f162b2287f3]
[nitrogen2:3440447] [ 4] convert4qmc[0x47a2f3]
[nitrogen2:3440447] [ 5] convert4qmc[0x41ab59]
[nitrogen2:3440447] [ 6] /lib64/libc.so.6(+0x29590)[0x7f162b229590]
[nitrogen2:3440447] [ 7] /lib64/libc.so.6(__libc_start_main+0x80)[0x7f162b229640]
[nitrogen2:3440447] [ 8] convert4qmc[0x422db5]
[nitrogen2:3440447] *** End of error message ***
--------------------------------------------------------------------------
prterun noticed that process rank 0 with PID 3440447 on node nitrogen2 exited on
signal 6 (Aborted).
--------------------------------------------------------------------------
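The listing above shows c4q.orbs.h5 linked to scf.h5, which is absent from the run directory, consistent with the "Could not open H5 file" message. A minimal sketch of a check that would flag this before launching the converter is given here. The helper name `find_dangling_symlinks` is hypothetical and is not part of Nexus.

```python
import os

def find_dangling_symlinks(directory):
    """Return sorted names of symlinks in `directory` whose targets do not exist."""
    dangling = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # islink() inspects the link itself; exists() follows it to the target
        if os.path.islink(path) and not os.path.exists(path):
            dangling.append(name)
    return sorted(dangling)
```

Calling it on the scf run directory shown above would be expected to report c4q.orbs.h5, assuming the directory is in the state listed.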
For the nexus test failure, please add two print statements after this line to investigate:
1452: File "/home/pk7/projects/qmc/git_QMCPACK_prckent/qmcpack/nexus/tests/unit/test_pyscf_input.py", line 577, in test_write
1452: assert(text_eq(text,ref_text))
# add these
print(ref_text)
print(text)
For the examples added with this PR, diamond_pp_hf_twistavg_prim.py (primitive cell twist averaging) should run cleanly (at least it did for me -- please post the converter output), while diamond_pp_hf_twistavg.py (supercell twist averaging) currently fails at the converter level due to changes needed to Anouar's savetoqmcpack.
These features still need bug fixes at the QMCPACK and QMCPACK-converter levels (I've made Anouar aware already). This PR implements the Nexus-side features needed to drive these workflows, but does not guarantee that QMCPACK and its converters function properly.
Thanks Jaron -- the situation is clear to me now. @anbenali How far off are the updates to savetoqmcpack? I put PySCF 2.6.2 in spack so can easily test the latest version. It would be nice to have a patch so that Nexus support can be tested, but perhaps there are puzzles to solve?
At least one difference is that one entry in the newly generated sp_kpoints array is not quite zero:
sp_kpoints = array( [
2597: [ -2.9375236054174356e-17, 0.0, 0.0 ] ] )
2597:
Selected output from ctest -R ntest_nexus_pyscf_input
597: Test timeout computed to be: 3600
2597:
2597: #! /usr/bin/env python3
2597:
2597: from pyscf import scf
2597:
2597:
2597: ### generated system text ###
2597: from numpy import array
2597: from pyscf.pbc import gto as gto_loc
2597: cell = gto_loc.Cell()
2597: cell.a = '''
2597: 1.78500000 1.78500000 0.00000000
2597: 0.00000000 1.78500000 1.78500000
2597: 1.78500000 0.00000000 1.78500000
2597: '''
2597: cell.basis = 'bfd-vdz'
2597: cell.dimension = 3
2597: cell.ecp = 'bfd'
2597: cell.unit = 'A'
2597: cell.atom = '''
2597: C 0.00000000 0.00000000 0.00000000
2597: C 0.89250000 0.89250000 0.89250000
2597: '''
2597: cell.drop_exponent = 0.1
2597: cell.verbose = 5
2597: cell.charge = 0
2597: cell.spin = 0
2597: cell.build()
2597: kpts = array( [
2597: [ 0.0, 0.0, 0.0 ] ,
2597: [ 0.4656748546088228, 0.4656748546088228, -0.4656748546088228 ] ] )
2597: ### end generated system text ###
2597:
2597:
2597:
2597: mf = scf.RHF(mol)
2597: mf.kernel()
2597:
2597: ### generated conversion text ###
2597: from PyscfToQmcpack import savetoqmcpack
2597: tiling = [ 2,1,1 ]
2597: sp_kpoints = array( [
2597: [ 0.0, 0.0, 0.0 ] ] )
2597: sp_kmap = array( [
2597: [ 0, 1 ] ] )
2597: for n,kp in enumerate(sp_kpoints):
2597: savetoqmcpack(cell,mf,'scf.twistnum_{}'.format(str(n).zfill(3)),kmesh=tiling,kpts=kpts [ sp_kmap [ n ] ] ,sp_twist=kp)
2597: #end for
2597: ### end generated conversion text ###
2597:
2597: #! /usr/bin/env python3
2597:
2597: from pyscf import scf
2597:
2597:
2597: ### generated system text ###
2597: from numpy import array
2597: from pyscf.pbc import gto as gto_loc
2597: cell = gto_loc.Cell()
2597: cell.a = '''
2597: 1.78500000 1.78500000 0.00000000
2597: 0.00000000 1.78500000 1.78500000
2597: 1.78500000 0.00000000 1.78500000
2597: '''
2597: cell.basis = 'bfd-vdz'
2597: cell.dimension = 3
2597: cell.ecp = 'bfd'
2597: cell.unit = 'A'
2597: cell.atom = '''
2597: C 0.00000000 0.00000000 0.00000000
2597: C 0.89250000 0.89250000 0.89250000
2597: '''
2597: cell.drop_exponent = 0.1
2597: cell.verbose = 5
2597: cell.charge = 0
2597: cell.spin = 0
2597: cell.build()
2597: kpts = array( [
2597: [ 0.0, 0.0, 0.0 ] ,
2597: [ 0.4656748546088228, 0.4656748546088228, -0.4656748546088228 ] ] )
2597: ### end generated system text ###
2597:
2597:
2597:
2597: mf = scf.RHF(mol)
2597: mf.kernel()
2597:
2597: ### generated conversion text ###
2597: from PyscfToQmcpack import savetoqmcpack
2597: tiling = [ 2,1,1 ]
2597: sp_kpoints = array( [
2597: [ -2.9375236054174356e-17, 0.0, 0.0 ] ] )
2597: sp_kmap = array( [
2597: [ 0, 1 ] ] )
2597: for n,kp in enumerate(sp_kpoints):
2597: savetoqmcpack(cell,mf,'scf.twistnum_{}'.format(str(n).zfill(3)),kmesh=tiling,kpts=kpts [ sp_kmap [ n ] ] ,sp_twist=kp)
2597: #end for
2597: ### end generated conversion text ###
2597:
2597:
2597: Test name : pyscf_input
2597: Test sublabel : test_write
2597: Test exception: "AssertionError: "
2597: Test backtrace:
2597: File "/home/pk7/projects/qmc/git_QMCPACK_prckent/qmcpack/nexus/bin/nxs-test", line 478, in run
2597: self.operation()
2597: File "/home/pk7/projects/qmc/git_QMCPACK_prckent/qmcpack/nexus/bin/nxs-test", line 1210, in pyscf_input
2597: nunit('write')
2597: File "/home/pk7/projects/qmc/git_QMCPACK_prckent/qmcpack/nexus/bin/nxs-test", line 349, in nunit
2597: run_external_unit_test(test_name,unit_test)
2597: File "/home/pk7/projects/qmc/git_QMCPACK_prckent/qmcpack/nexus/bin/nxs-test", line 388, in run_external_unit_test
2597: unit_test()
2597: File "/home/pk7/projects/qmc/git_QMCPACK_prckent/qmcpack/nexus/tests/unit/test_pyscf_input.py", line 580, in test_write
2597: assert(text_eq(text,ref_text))
2597: ^^^^^^^^^^^^^^^^^^^^^^
2597:
2597: Test status: fail
1/1 Test #2597: ntest_nexus_pyscf_input ..........***Failed 0.63 sec
0% tests passed, 1 tests failed out of 1
via
diff --git a/nexus/tests/unit/test_pyscf_input.py b/nexus/tests/unit/test_pyscf_input.py
index a966746cc..951435e01 100644
--- a/nexus/tests/unit/test_pyscf_input.py
+++ b/nexus/tests/unit/test_pyscf_input.py
@@ -574,6 +574,9 @@ def test_write():
text = text.replace('[',' [ ').replace(']',' ] ')
ref_text = ref_text.replace('[',' [ ').replace(']',' ] ')
+# add these
+ print(ref_text)
+ print(text)
assert(text_eq(text,ref_text))
#end def test_write
Note that this is failing in GitHub actions CI as well -- the centos builds show the same issue. Presumably either something is not zeroed out or rounding is slightly different and a numerical tolerance is needed.
Everything is working on my end. I fixed the files that did not work. Please check if I forgot something.
Test this please
^^^ To get some more variety in the test configurations run. Hopefully only the floating point rounding/tolerance issue needs to be tackled.
Thanks Anouar. Great to have this path supported.
While driving, I realized that the QMC input explicitly requires the twist coordinate to compute the phase to apply to the orbital. In the current QMC input from Nexus we get a twist number. I am not sure if that is accessed in the H5 file or if it is even stored. I can look at it in the afternoon.
On Thu, Jul 11, 2024, 8:21 AM Paul R. C. Kent @.***> wrote:
@.**** requested changes on this pull request.
(Noting for ease of seeing the status on the PR summary)
The k-points tolerance issue should be addressed - besides CentOS failures, presumably this is what is causing the failures on sulfur (RHEL9), since the same test is failing.
Definitely not needed for this PR, but from the point of view of reading CI failures, it would be very helpful to replace all instances of assert(text_eq(a,b)) with a custom function that gave actionable info by printing the failure and the a,b before aborting.
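A minimal sketch of such a helper follows. The name `assert_text_eq` and its `eq` parameter are hypothetical; `eq` is a hook where a tolerance-aware comparison such as Nexus's text_eq could be passed, and exact string equality is used here only as a stand-in default.

```python
import difflib

def assert_text_eq(text, ref_text, eq=None):
    """Assert two texts match; on failure raise AssertionError carrying a unified diff.

    eq : optional comparison callable taking (text, ref_text);
         defaults to exact string equality.
    """
    same = eq(text, ref_text) if eq is not None else text == ref_text
    if same:
        return
    # Build a line-by-line diff so the CI log shows exactly what changed
    diff = '\n'.join(difflib.unified_diff(
        ref_text.splitlines(), text.splitlines(),
        fromfile='reference', tofile='generated', lineterm=''))
    raise AssertionError('texts differ:\n' + diff)
```

With a helper along these lines, a failing test would report the differing lines directly in the CI log instead of a bare AssertionError.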
@anbenali There is an indentation bug (or more) in PyscfToQmcpack.py. I got the following running the diamond prim example:
Traceback (most recent call last):
File "/home/pk7/projects/qmc/git_QMCPACK_prckent/qmcpack/nexus/examples/qmcpack/rsqmc_pyscf/02_diamond_hf_qmc/runs/diamond_ta_prim/scf/scf.py", line 45, in <module>
from PyscfToQmcpack import savetoqmcpack
File "/home/pk7/projects/qmc/git_QMCPACK_prckent/qmcpack/src/QMCTools/PyscfToQmcpack.py", line 553
E_g, C_gamma,E_g_unsorted,C_unsorted = mo_k2gamma(cell, e_k, mo_k, kpts,kmesh)
^
IndentationError: unindent does not match any outer indentation level
Interestingly, what happens in that example is that on my system the pyscf output step fails as above, then the converter step runs and also fails, since the HDF5 file is not present. Somehow, though, Nexus does not catch the failures and the converter step runs forever. Maybe something in the workstation class (?)
@jtkrogel: there are still errors to be fixed for the kpts case. At this point you can try changing "drop_exponent = 0.3" so it would run faster for the tests.
Then, if two twists are generated, we pass the following: title='scf.twistnum_{}'.format(str(n).zfill(3)). But we then create a static link from c4q.orbs.h5 to scf.h5, and scf.h5 does not exist in this case.
Nexus then also sends: mpirun -np 1 convert4qmc -prefix c4q -orbitals scf.h5
Obviously scf.h5 does not exist there either.
After fixing these by hand to push the simulation along, it unfortunately still fails at the VMC step with:
for n,(h5file,kp) in enumerate(zip(result.orb_files,result.kpoints)):
^^^^^^^^^^^^^^^^
AttributeError: 'obj' object has no attribute 'orb_files'
I think it would be faster for you to track down the error.
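To make the naming mismatch concrete, here is a small sketch of the per-twist orbital file names implied by the generated conversion loop. It assumes savetoqmcpack appends `.h5` to the title it is given, as the earlier directory listing (scf.twistnum_000.h5) suggests; the helper name `twist_orbital_files` is hypothetical.

```python
def twist_orbital_files(prefix, ntwists):
    """List the per-twist orbital file names implied by the generated script."""
    # Assumption: the converter appends '.h5' to the title it is given
    return ['{}.twistnum_{}.h5'.format(prefix, str(n).zfill(3))
            for n in range(ntwists)]

print(twist_orbital_files('scf', 2))
```

None of these names is scf.h5, so a link or command line that refers to scf.h5 cannot resolve when twists are written this way.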
After discussion with Anouar, I've confirmed that Nexus is now fully functional in this PR. Anouar plans to push another fix to PyscfToQmcpack.py.
I will fix the unit test failure issue (which doesn't reproduce on my laptop) in a week or so after this PR goes in and I am back from vacation. This will require some updates to the Nexus testing system itself, as requested by Paul.
If anyone wants to look at the test failure in the interim, it's likely that this will fix the issue (adds a 1e-8 absolute tolerance in addition to the default 1e-6 relative tolerance):
assert(text_eq(text,ref_text,atol=1e-8))
Didn't know about that tolerance option. Will try it. If updating the tolerance that way works, we can probably merge this PR. The default relative tolerance is likely unreliable with zero as one of the values.
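The point about zero can be illustrated with Python's standard `math.isclose`, used here only as a stand-in; the internals of Nexus's text_eq are not shown in this thread and may differ. The near-zero value is the sp_kpoints entry from the failing test output.

```python
import math

near_zero = -2.9375236054174356e-17  # entry from the failing sp_kpoints array
reference = 0.0

# A relative tolerance scales with the larger magnitude, so near zero it
# shrinks to almost nothing and the comparison fails
rel_only = math.isclose(near_zero, reference, rel_tol=1e-6)

# An absolute floor accepts rounding noise around zero
with_abs = math.isclose(near_zero, reference, rel_tol=1e-6, abs_tol=1e-8)

print(rel_only, with_abs)
```

The first comparison is rejected and the second accepted, which matches the expectation that an absolute tolerance is needed whenever the reference value is exactly zero.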
We can't merge something that is known to break a deterministic test (recommended installation test 'ctest -L deterministic') and the CI without putting in a bypass (expected failure). This in turn would forgive other errors, so while PRs don't have to have a complete feature (and probably shouldn't if a large feature), they do need to pass in CI somehow.
I remember some discussion a long time ago that twistnum is error-prone and explicit twist coordinates are preferred. Can Nexus move away from twistnum?
Test this please