
Use ctest RESOURCE_GROUPS and not RESOURCE_LOCK to control GPU access

Open prckent opened this issue 3 years ago • 2 comments

Is your feature request related to a problem? Please describe.

To run efficiently on multi-GPU nodes we need to control access at a per-GPU level. RESOURCE_GROUPS would allow multiple tests, each using one GPU, to run simultaneously. Currently we have a single lock, so only 1 GPU is used at a time. RESOURCE_GROUPS would also accommodate future multi-GPU tests. https://cmake.org/cmake/help/latest/prop_test/RESOURCE_GROUPS.html

Scripts would have to read an environment or input variable to handle anything other than the default of 1 GPU per test.
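As a rough sketch of what such a script could do: when ctest is run with a resource spec file, it exports `CTEST_RESOURCE_GROUP_COUNT` and, per group, variables such as `CTEST_RESOURCE_GROUP_0_GPUS` holding entries like `id:1,slots:1`. The helper name `allocated_gpu_ids` and the resource type name `gpus` are assumptions for illustration; the type name has to match whatever the spec file and the RESOURCE_GROUPS property use.

```python
import os
from typing import List, Mapping


def allocated_gpu_ids(environ: Mapping[str, str] = os.environ,
                      resource: str = "gpus") -> List[str]:
    """Return the GPU ids ctest allocated to this test, in group order.

    Hypothetical helper. Returns an empty list when ctest resource
    allocation is not active (CTEST_RESOURCE_GROUP_COUNT is undefined).
    """
    count = environ.get("CTEST_RESOURCE_GROUP_COUNT")
    if count is None:
        return []

    ids: List[str] = []
    for group in range(int(count)):
        # Resource type names are upper-cased in the variable name.
        key = f"CTEST_RESOURCE_GROUP_{group}_{resource.upper()}"
        value = environ.get(key, "")
        # Each value is a ';'-separated list of "id:<id>,slots:<n>" entries.
        for entry in filter(None, value.split(";")):
            fields = dict(item.split(":", 1) for item in entry.split(","))
            ids.append(fields["id"])
    return ids


if __name__ == "__main__":
    gpus = allocated_gpu_ids()
    if gpus:
        print("GPUs allocated by ctest:", ",".join(gpus))
    else:
        print("No ctest resource allocation active")
```

A test wrapper could then pass these ids on to the application, for example through `CUDA_VISIBLE_DEVICES` on NVIDIA systems. An empty result is also one place where a GPU build could decide to abort or to run serially.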

Describe the solution you'd like

Switch from LOCKing to resource groups.

Describe alternatives you've considered

None.

Additional context

Essential to make good use of multi-GPU nodes and allow e.g. efficient running of the performance tests on them.

Threads, MPI ranks, and CPU cores could also be handled similarly, but GPUs are the most constrained.

prckent avatar Feb 25 '22 21:02 prckent

I added RESOURCE_GROUPS in QE https://gitlab.com/QEF/q-e/-/blob/develop/test-suite/CMakeLists.txt#L176 and multiple GPUs can be configured via https://gitlab.com/QEF/q-e/-/blob/develop/test-suite/gpu-resource-example.json However, I had one unresolved issue: when no resource file is provided, ctest just runs all the tests without any resource constraints. We would prefer that it fall back to running one test at a time.

ye-luo avatar Feb 25 '22 21:02 ye-luo

Hmm. If there is not a better solution, we could simply abort for GPU builds when the environment variable is not set and give the user the instructions to set it.

E.g., if no --resource-spec-file is given, abort but give a link to a basic one we include in our repo.
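For reference, a basic spec file of that kind could be generated rather than written by hand. This is a minimal sketch that follows the ctest resource specification layout (version 1.0, one `local` entry); the function name `make_resource_spec`, the resource type name `gpus`, and the output file name `gpus.json` are assumptions, not anything in the repo.

```python
import json


def make_resource_spec(num_gpus: int, slots_per_gpu: int = 1) -> dict:
    """Build a ctest resource spec describing `num_gpus` identical GPUs.

    Hypothetical helper; "gpus" must match the resource type named in
    the tests' RESOURCE_GROUPS property.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return {
        "version": {"major": 1, "minor": 0},
        "local": [
            {
                "gpus": [
                    {"id": str(i), "slots": slots_per_gpu}
                    for i in range(num_gpus)
                ]
            }
        ],
    }


if __name__ == "__main__":
    # Write a spec for a 4-GPU node (the count is only an example).
    with open("gpus.json", "w") as handle:
        json.dump(make_resource_spec(4), handle, indent=2)
```

The resulting file would be passed as `ctest --resource-spec-file gpus.json -j 4`, so that up to four single-GPU tests can run at once.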

prckent avatar Feb 25 '22 21:02 prckent