[WIP] rmg converter fixes
Proposed changes
Fixes failing RMG converter tests.
What type(s) of changes does this code introduce?
Delete the items that do not apply
- Bugfix
Does this introduce a breaking change?
- No
What systems has this change been tested on?
cooley
Checklist
Update the following with a yes where the items apply. If you're unsure about any of them, don't hesitate to ask. This is simply a reminder of what we are going to look for before merging your code.
- Yes. This PR is up to date with the current state of 'develop'
- No. Code added or changed in the PR has been clang-formatted
- No. This PR adds tests to cover any new code, or to catch a bug that is being fixed
- No. Documentation has been added (if appropriate)
Running locally, I could get these tests to pass. However, I noticed that the dependencies between the RMG tests are likely not correct: ctest -j 1 worked for me, but ctest -j 128 failed.
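For reference, a minimal sketch of how an ordering between two tests could be declared in CMake so that it also holds under ctest -j N. The SCF test name is the one ctest reports in this thread; the converter test name is hypothetical, and the real names are whatever the RMG test CMake files register.

```cmake
# Sketch only: place after the add_test() calls that register both tests.
# SCF_TEST is the name reported by ctest in this thread.
set(SCF_TEST rmg-Diamond2-1x1x1-gamma-ccECP-np-1-scf)
# CONVERT_TEST is a hypothetical name used for illustration.
set(CONVERT_TEST rmg-Diamond2-1x1x1-gamma-ccECP-np-1-convert)

# Start the converter test only after the SCF test has finished,
# regardless of how many tests ctest runs in parallel.
set_tests_properties(${CONVERT_TEST} PROPERTIES DEPENDS ${SCF_TEST})
```

Note that DEPENDS only orders the tests; it does not require the first test to pass. If the converter test should not run when the SCF run fails, the FIXTURES_SETUP and FIXTURES_REQUIRED test properties express that instead.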
I also noticed that RMG grabbed an unexpected number of threads:
Start 338: rmg-Diamond2-1x1x1-gamma-ccECP-np-1-scf
338: Test command: /home/pk7/apps/spack/opt/spack/linux-rhel8-zen2/gcc-12.1.0/rmgdft-4.3.1-uzqyrhjzdxyb63mrurrmpcgv4ib7gzr6/bin/rmg-cpu "input"
338: Environment variables:
338: OMP_NUM_THREADS=12
338: RMG_NUM_THREADS=12
338: Test timeout computed to be: 3600
338: RMG running with 1 MPI procs per host.
338: C1: Numa aware allocation with 1 MPI proc, 128 cores and 2 numa nodes per host.
338: Running with 12 Open MP threads.
338: Running with 12 RMG threads.
338:
338: * * * * * * * * * *
338: * R M G *
338: * * * * * * * * * *
338:
338: -- A Real Space Multigrid Electronic structure code --
338: -- More information at www.rmgdft.org --
338:
338:
338:
338: NOTICE: RMG internal pseudopotentials have switched to
338: ONCVP from Ultrasoft. You can revert to Ultrasoft by
338: adding the input tag internal_pseudo_type="ultrasoft" to
338: your input files.
338:
338:
338: quench: [md: 0/100 scf: 0/100 step time: 0.05 scf time: 0.13 secs RMS[dV]: 5.07e-02 ]
338: quench: [md: 0/100 scf: 1/100 step time: 0.04 scf time: 0.17 secs RMS[dV]: 6.92e-02 ]
338: quench: [md: 0/100 scf: 2/100 step time: 0.04 scf time: 0.22 secs RMS[dV]: 3.96e-02 ]
338: quench: [md: 0/100 scf: 3/100 step time: 0.04 scf time: 0.25 secs RMS[dV]: 2.69e-02 ]
338: quench: [md: 0/100 scf: 4/100 step time: 0.02 scf time: 0.27 secs RMS[dV]: 8.69e-03 ]
338: quench: [md: 0/100 scf: 5/100 step time: 0.02 scf time: 0.29 secs RMS[dV]: 1.30e-02 ]
338: quench: [md: 0/100 scf: 6/100 step time: 0.02 scf time: 0.32 secs RMS[dV]: 6.06e-03 ]
338: quench: [md: 0/100 scf: 7/100 step time: 0.02 scf time: 0.34 secs RMS[dV]: 9.89e-04 ]
338: quench: [md: 0/100 scf: 8/100 step time: 0.02 scf time: 0.36 secs RMS[dV]: 1.93e-03 ]
338: quench: [md: 0/100 scf: 9/100 step time: 0.02 scf time: 0.39 secs RMS[dV]: 2.55e-03 ]
338: quench: [md: 0/100 scf: 10/100 step time: 0.03 scf time: 0.43 secs RMS[dV]: 2.83e-04 ]
338: Convergence criterion reached: Energy variation (1.98e-10) is lower than threshold (1.00e-09)
3/15 Test #338: rmg-Diamond2-1x1x1-gamma-ccECP-np-1-scf ................ Passed 7.26 sec
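If 12 threads per RMG run is intended, one way to keep ctest -j from oversubscribing the node is to tell ctest how many process slots each test occupies. This is a sketch under that assumption, and it assumes the property is not already set elsewhere in the test setup.

```cmake
# Sketch only: reserve 12 of the ctest -j slots for this test, matching
# the OMP_NUM_THREADS=12 / RMG_NUM_THREADS=12 values shown in the log.
set_tests_properties(rmg-Diamond2-1x1x1-gamma-ccECP-np-1-scf
                     PROPERTIES PROCESSORS 12)
```

With this property, ctest counts the test as 12 slots when scheduling, so fewer such tests start at once for a given -j value.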
I can't reproduce the issues with the other tests (the ones using the python script to call convert4qmc and compare output xml), so I'd consider this ready for review and we can address any remaining issues in a later PR.
Test this please
@ye-luo Were you able to run this locally in parallel and have everything pass?
Tested locally -j 128.
Thanks. I don't have RMG installed but I was able to fake an rmg and run all the tests with -j32 to see how concurrent tests behave.