[WIP] rmg converter fixes

Open kgasperich opened this issue 3 years ago • 2 comments

Proposed changes

Fixes failing RMG converter tests.

What type(s) of changes does this code introduce?

Delete the items that do not apply

  • Bugfix

Does this introduce a breaking change?

  • No

What systems has this change been tested on?

cooley

Checklist

Update the following with a yes where the items apply. If you're unsure about any of them, don't hesitate to ask. This is simply a reminder of what we are going to look for before merging your code.

  • Yes. This PR is up to date with the current state of 'develop'
  • No. Code added or changed in the PR has been clang-formatted
  • No. This PR adds tests to cover any new code, or to catch a bug that is being fixed
  • No. Documentation has been added (if appropriate)

kgasperich avatar Aug 10 '22 17:08 kgasperich

Running locally, I could get these tests to pass. However, I noticed that the dependencies between the rmg tests are likely not correct: ctest -j 1 worked for me, but ctest -j 128 failed.
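For reference, tests that pass serially but fail under a high -j value are often missing an explicit ordering between a producer test and its consumer. Below is a minimal sketch of how such an ordering could be declared in CMake. It is not taken from the QMCPACK build files: the converter test name and both echo commands are hypothetical stand-ins, and only the SCF test name comes from the log in this thread.

```cmake
cmake_minimum_required(VERSION 3.10)
project(rmg_dependency_sketch NONE)
enable_testing()

# Hypothetical stand-in for the RMG SCF run (name taken from the ctest log).
add_test(NAME rmg-Diamond2-1x1x1-gamma-ccECP-np-1-scf
         COMMAND ${CMAKE_COMMAND} -E echo "placeholder for the RMG SCF run")

# Hypothetical stand-in for a converter test that reads the SCF output.
add_test(NAME rmg-Diamond2-converter-example
         COMMAND ${CMAKE_COMMAND} -E echo "placeholder for the convert4qmc check")

# DEPENDS makes ctest schedule the converter test only after the SCF test
# has finished, even when many tests run in parallel.
set_tests_properties(rmg-Diamond2-converter-example
                     PROPERTIES DEPENDS rmg-Diamond2-1x1x1-gamma-ccECP-np-1-scf)
```

Note that DEPENDS only controls ordering; the dependent test still runs if the first one fails. If the converter test should be skipped in that case, the FIXTURES_SETUP and FIXTURES_REQUIRED test properties are an alternative.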

I also noticed that RMG grabbed an unexpected number of threads:

      Start  338: rmg-Diamond2-1x1x1-gamma-ccECP-np-1-scf

338: Test command: /home/pk7/apps/spack/opt/spack/linux-rhel8-zen2/gcc-12.1.0/rmgdft-4.3.1-uzqyrhjzdxyb63mrurrmpcgv4ib7gzr6/bin/rmg-cpu "input"
338: Environment variables: 
338:  OMP_NUM_THREADS=12
338:  RMG_NUM_THREADS=12
338: Test timeout computed to be: 3600
338: RMG running with 1 MPI procs per host.
338: C1: Numa aware allocation with 1 MPI proc, 128 cores and 2 numa nodes per host.
338: Running with 12 Open MP threads.
338: Running with 12 RMG threads.
338: 
338:                * * * * * * * * * *
338:                *    R   M   G    *
338:                * * * * * * * * * *
338: 
338:  -- A Real Space Multigrid Electronic structure code --
338:  --      More information at www.rmgdft.org          --
338: 
338: 
338: 
338: NOTICE: RMG internal pseudopotentials have switched to
338: ONCVP from Ultrasoft. You can revert to Ultrasoft by
338: adding the input tag internal_pseudo_type="ultrasoft" to
338: your input files.
338: 
338: 
338:  quench: [md:   0/100  scf:   0/100  step time:   0.05  scf time:     0.13 secs  RMS[dV]: 5.07e-02 ]
338:  quench: [md:   0/100  scf:   1/100  step time:   0.04  scf time:     0.17 secs  RMS[dV]: 6.92e-02 ]
338:  quench: [md:   0/100  scf:   2/100  step time:   0.04  scf time:     0.22 secs  RMS[dV]: 3.96e-02 ]
338:  quench: [md:   0/100  scf:   3/100  step time:   0.04  scf time:     0.25 secs  RMS[dV]: 2.69e-02 ]
338:  quench: [md:   0/100  scf:   4/100  step time:   0.02  scf time:     0.27 secs  RMS[dV]: 8.69e-03 ]
338:  quench: [md:   0/100  scf:   5/100  step time:   0.02  scf time:     0.29 secs  RMS[dV]: 1.30e-02 ]
338:  quench: [md:   0/100  scf:   6/100  step time:   0.02  scf time:     0.32 secs  RMS[dV]: 6.06e-03 ]
338:  quench: [md:   0/100  scf:   7/100  step time:   0.02  scf time:     0.34 secs  RMS[dV]: 9.89e-04 ]
338:  quench: [md:   0/100  scf:   8/100  step time:   0.02  scf time:     0.36 secs  RMS[dV]: 1.93e-03 ]
338:  quench: [md:   0/100  scf:   9/100  step time:   0.02  scf time:     0.39 secs  RMS[dV]: 2.55e-03 ]
338:  quench: [md:   0/100  scf:  10/100  step time:   0.03  scf time:     0.43 secs  RMS[dV]: 2.83e-04 ]
338:  Convergence criterion reached: Energy variation (1.98e-10) is lower than threshold (1.00e-09)
 3/15 Test  #338: rmg-Diamond2-1x1x1-gamma-ccECP-np-1-scf ................   Passed    7.26 sec
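The log above shows a test labelled np-1 running with 12 OpenMP threads and 12 RMG threads. If that thread count is intended, ctest can be told how many slots the test occupies so that a large -j value does not oversubscribe the machine. The following is a sketch under that assumption, not the project's actual configuration: the echo command is a hypothetical placeholder, and the value 12 is copied from the log.

```cmake
cmake_minimum_required(VERSION 3.10)
project(rmg_threads_sketch NONE)
enable_testing()

# Hypothetical stand-in for the RMG SCF run (name taken from the ctest log).
add_test(NAME rmg-Diamond2-1x1x1-gamma-ccECP-np-1-scf
         COMMAND ${CMAKE_COMMAND} -E echo "placeholder for the RMG SCF run")

set_tests_properties(rmg-Diamond2-1x1x1-gamma-ccECP-np-1-scf PROPERTIES
  # Thread settings seen in the log; adjust if a different count is intended.
  ENVIRONMENT "OMP_NUM_THREADS=12;RMG_NUM_THREADS=12"
  # Tell ctest this test consumes 12 of the slots granted by -j.
  PROCESSORS 12)
```

With PROCESSORS set, ctest -j 128 would schedule at most ten such tests concurrently instead of up to 128.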

prckent avatar Aug 11 '22 02:08 prckent

I can't reproduce the issues with the other tests (the ones using the python script to call convert4qmc and compare output xml), so I'd consider this ready for review and we can address any remaining issues in a later PR.

kgasperich avatar Aug 12 '22 15:08 kgasperich

Test this please

ye-luo avatar Aug 24 '22 02:08 ye-luo

@ye-luo Were you able to run this locally in parallel and have everything pass?

prckent avatar Aug 24 '22 10:08 prckent

Tested locally -j 128.

prckent avatar Aug 24 '22 10:08 prckent

Tested locally -j 128.

Thanks. I don't have RMG installed but I was able to fake an rmg and run all the tests with -j32 to see how concurrent tests behave.

ye-luo avatar Aug 24 '22 13:08 ye-luo