easybuild-easyconfigs

{toolchain} nvompic v2021a

Open SebastianAchilles opened this issue 3 years ago • 13 comments

(created using eb --new-pr)

Depends on

  • [x] https://github.com/easybuilders/easybuild-framework/pull/3735
  • ~[ ] CUDA for 2021a (no PR yet)~

SebastianAchilles avatar Jun 10 '21 15:06 SebastianAchilles

@boegelbot please test @ generoso

SebastianAchilles avatar Jun 18 '21 15:06 SebastianAchilles

@SebastianAchilles: Request for testing this PR well received on generoso

PR test command 'EB_PR=13107 EB_ARGS= /apps/slurm/default/bin/sbatch --job-name test_PR_13107 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 17561

Test results coming soon (I hope)...

- notification for comment with ID 864102480 processed

Message to humans: this is just bookkeeping information for me, it is of no use to you (unless you think I have a bug, which I don't).

boegelbot avatar Jun 18 '21 15:06 boegelbot

Test report by @sebastianachilles
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
rocky8-eb - Linux rocky linux 8.4, x86_64, Intel(R) Core(TM) i7-6900K CPU @ 3.20GHz (broadwell), Python 3.6.8
See https://gist.github.com/0c883508f1f2cd69a3e26171c5ad7e5a for a full test report.

SebastianAchilles avatar Jun 18 '21 15:06 SebastianAchilles

Test report by @boegelbot
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
generoso-x-2 - Linux centos linux 8.2.2004, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/821a2dc8a5040f0eb5ec0ffd7f5c359f for a full test report.

boegelbot avatar Jun 18 '21 15:06 boegelbot

Test report by @sebastianachilles
SUCCESS
Build succeeded for 4 out of 4 (2 easyconfigs in total)
centos8-eb - Linux centos linux 8.3.2011, x86_64, Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (skylake), Python 3.6.8
See https://gist.github.com/c21c48a9300d461c88bddaa92edb15ea for a full test report.

SebastianAchilles avatar Jun 18 '21 15:06 SebastianAchilles

@boegelbot Please test @ generoso

akesandgren avatar Jun 28 '21 11:06 akesandgren

@akesandgren: Request for testing this PR well received on generoso

PR test command 'EB_PR=13107 EB_ARGS= /apps/slurm/default/bin/sbatch --job-name test_PR_13107 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 17633

Test results coming soon (I hope)...

- notification for comment with ID 869590094 processed

Message to humans: this is just bookkeeping information for me, it is of no use to you (unless you think I have a bug, which I don't).

boegelbot avatar Jun 28 '21 11:06 boegelbot

This should eventually use UCX-CUDA when that stuff is completely in place (which should be soon)

akesandgren avatar Jun 28 '21 11:06 akesandgren

Test report by @boegelbot
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
generoso-x-2 - Linux centos linux 8.2.2004, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/345570077cd9633950e5efd903f214a7 for a full test report.

boegelbot avatar Jun 28 '21 11:06 boegelbot

We should also add HPL and OSU-Micro-Benchmarks easyconfigs built on top of this toolchain so we can test it.

akesandgren avatar Jun 28 '21 11:06 akesandgren

This is now tripping up in tools/module_naming_scheme/hierarchical_mns.py:det_modpath_extensions line 231

akesandgren avatar Jun 29 '21 12:06 akesandgren

@boegel I think we might need some guidance here... When it tries to build CUDA-11.3.1-NVHPC-21.5.eb, the HMNS code sees CUDA, and in det_modpath_extensions non_system_cuda is true, so we enter that if statement. "if ec['name'] in extend_comps:" is true since CUDA is in at least one of the COMP_NAME_VERSION_TEMPLATES keys, and the same holds for "if ec['name'] in comp_names:", etc. In the end, though, comp_name_ver is still None, since there is no 'CUDA,NVHPC' key.

But if we're trying to move away from the compiler-CUDA modulepath extension (https://github.com/easybuilders/easybuild-easyconfigs/issues/12484, option 1), we need to do this differently, either in this PR or in framework.
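To make the failure mode concrete, here is a minimal sketch of the lookup described above. It is a simplified stand-in, not the actual framework code: the function name find_comp_name_ver is hypothetical, and the template table is an assumed, illustrative subset of COMP_NAME_VERSION_TEMPLATES.

```python
# Simplified, hypothetical sketch; NOT the actual EasyBuild framework code.
# The table below is an assumed, illustrative subset of the real templates.
COMP_NAME_VERSION_TEMPLATES = {
    'CUDA,GCC': ('GCC-CUDA', '%(GCC)s-%(CUDA)s'),
    'CUDA,iccifort': ('intel-CUDA', '%(iccifort)s-%(CUDA)s'),
}


def find_comp_name_ver(name, version, tc_name, tc_version):
    """Return [comp_name, comp_version] for a matching template, else None."""
    comp_name_ver = None
    for key, (comp_name, ver_tmpl) in COMP_NAME_VERSION_TEMPLATES.items():
        comp_names = key.split(',')
        if name in comp_names:
            # The remaining component in the key must be the toolchain
            # the software is being built with.
            others = [c for c in comp_names if c != name]
            if others == [tc_name]:
                versions = {name: version, tc_name: tc_version}
                comp_name_ver = [comp_name, ver_tmpl % versions]
                break
    return comp_name_ver


# CUDA built with GCC matches the 'CUDA,GCC' key.
print(find_comp_name_ver('CUDA', '11.1.1', 'GCC', '10.2.0'))
# CUDA built with NVHPC passes the name checks but matches no key.
print(find_comp_name_ver('CUDA', '11.3.1', 'NVHPC', '21.5'))
```

In this sketch the NVHPC case passes the name checks for every CUDA key, but no key pairs CUDA with NVHPC, so the result stays None, which mirrors the situation described in the comment.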

What is the right way forward here?

akesandgren avatar Jun 29 '21 12:06 akesandgren

Considering the recent foss/fosscuda merge changes (including renaming CUDAcore -> CUDA), we need to do similar things here:

  1. We use the system level CUDA package as a dependency. This allows us to use the UCX-CUDA plugin directly.
  2. Presumably, NVHPC shouldn't provide its own CUDA on top then? Because it does right now by default.
  3. There won't be a second CUDA package; it's just NVHPC at the compiler level, and it already implies CUDA. There is no non-CUDA variant like there used to be with GCC vs GCC-CUDA. I don't think we need to do anything special with modulepaths here. It's just NVHPC + OpenMPI to consider.
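As a rough illustration of point 3, a toolchain easyconfig along these lines would only need to list NVHPC and an OpenMPI built with it. This is a hypothetical sketch, not the easyconfig from this PR: the OpenMPI version and the description text are assumptions, and the NVHPC version is taken from the CUDA-11.3.1-NVHPC-21.5.eb file name mentioned earlier in the thread.

```python
# Hypothetical sketch of a toolchain easyconfig (easyconfigs use Python syntax).
# Versions and description are assumptions for illustration only.
easyblock = 'Toolchain'

name = 'nvompic'
version = '2021a'

homepage = '(none)'
description = "NVHPC based compiler toolchain, including OpenMPI for MPI support."

# Toolchain easyconfigs are themselves defined at the system level.
toolchain = {'name': 'system', 'version': 'system'}

local_compiler = ('NVHPC', '21.5')

dependencies = [
    local_compiler,
    # OpenMPI built with NVHPC; the version here is an assumed placeholder.
    ('OpenMPI', '4.1.1', '', local_compiler),
]

moduleclass = 'toolchain'
```

Note that no separate CUDA entry and no CUDA-specific modulepath handling appear here; CUDA support would come in through the compiler level and the system level CUDA dependency described in points 1 and 2.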

Micket avatar Sep 07 '21 12:09 Micket

Closing since we added an nvofbf/2022.07 toolchain in https://github.com/easybuilders/easybuild-easyconfigs/pull/16724

SebastianAchilles avatar Oct 23 '23 14:10 SebastianAchilles