Notes on building stack for NVIDIA Grace & Grace/Hopper

Open trz42 opened this issue 8 months ago • 7 comments

Issue for keeping track of what to build and in which order (to coordinate among builders), and for documenting potential issues. In general, we will follow the work done by @bedroge on building a new stack for Sapphire Rapids.

Current approach

  • use the most recent EasyBuild version before v5.0.0 (to avoid being affected by breaking changes)
  • collapse easystack files into one using v4.9.4
  • remove from-pr for PRs that are included in v4.9.4 or earlier
  • replace any remaining from-pr with from-commit
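
The last two steps could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the tooling actually used here: it assumes the easystack entries have already been parsed into Python objects (plain strings, or dicts of the form `{easyconfig: {"options": {...}}}`), and both `merged_prs` (PR numbers already included in v4.9.4 or earlier) and `pr_head_commits` (PR number mapped to the commit to use) are hypothetical inputs that would have to be collected separately. The easyconfig names and PR numbers in the example are made up.

```python
import copy


def rewrite_from_pr(entries, merged_prs, pr_head_commits):
    """Return a copy of the entries with from-pr removed or replaced.

    entries:         parsed easystack entries (str or {name: {"options": {...}}})
    merged_prs:      set of PR numbers already included in the EasyBuild release used
    pr_head_commits: dict mapping PR number -> commit hash to use instead
    (merged_prs and pr_head_commits are hypothetical helper inputs.)
    """
    result = []
    for entry in entries:
        entry = copy.deepcopy(entry)  # leave the caller's data untouched
        if isinstance(entry, dict):
            for spec in entry.values():
                options = (spec or {}).get("options") or {}
                if "from-pr" not in options:
                    continue
                pr = int(options.pop("from-pr"))
                if pr not in merged_prs:
                    # Not part of the release yet: pin an explicit commit.
                    # A missing PR number raises KeyError on purpose.
                    options["from-commit"] = pr_head_commits[pr]
                if not options:
                    # Nothing left to configure for this easyconfig.
                    del spec["options"]
        result.append(entry)
    return result


if __name__ == "__main__":
    # Hypothetical entries, for illustration only.
    example = [
        "Example-0.1.eb",
        {"Merged-1.0.eb": {"options": {"from-pr": 11111}}},
        {"Pending-2.0.eb": {"options": {"from-pr": 22222}}},
    ]
    for item in rewrite_from_pr(example, {11111}, {22222: "abc123"}):
        print(item)
```

Working on a deep copy keeps the original entries available for comparison, which helps when reviewing what was changed in the collapsed easystack file.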

Outdated (by 2025-03-26)

NOTE: we need to be careful with from-pr. The easyconfig (ec) being used may not be the one we expect. It seems much better to use the latest commit of the PR.

Idea to approach the from-pr issue:

  • for software that was originally built with EB < 4.9.2 we will nevertheless use EB 4.9.2
  • we will always add from-commit and include-easyblocks-from-commit
  • if neither from-pr nor include-easyblocks-from-pr was used, we use the commit hashes corresponding to the release of the EB version originally used
  • if *from-pr was used, we will replace it with *from-commit, using the most recent commit in the PR

Attempt 1 (removed 2025-03-25)

Kickstarting the stack

  • [x] add EasyBuild (grace PR: #968; sapphirerapids PR: #921)
    • we did not use --from-pr with #968

Below we list further installations with the toolchain system:

  • [x] add ReFrame 4.6.2, Java 11 & Nextflow (grace PR: #975; sapphirerapids PR: #928)
    • we skipped ReFrame/4.3.3 for now: its installation failed because it relies on pip as the install method, and we are not going to use it anyway (we will use ReFrame/4.6.2 or newer); also see https://github.com/EESSI/software-layer/pull/975#issuecomment-2746066013
    • we used from-pr: 19172 for Nextflow; however, the easyconfig available from /cvmfs is identical to the one on GitHub in PR 19172, so there is no need to rebuild Nextflow
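
A check like the one done for Nextflow could be sketched as below. This is only an illustration, assuming both easyconfig files are available locally (the installed copy under /cvmfs and a downloaded copy of the file from the PR); the function names are hypothetical. The comparison is byte-for-byte, so even a whitespace-only difference would be reported as a mismatch.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def easyconfigs_differ(installed_ec, pr_ec):
    """True if the two easyconfig files are not byte-identical.

    installed_ec: path to the easyconfig shipped with the installation
    pr_ec:        path to a local copy of the easyconfig from the PR
    """
    return sha256_of(installed_ec) != sha256_of(pr_ec)
```

If `easyconfigs_differ` returns False, the installation was built from the same easyconfig as the one in the PR, and a rebuild only for the sake of dropping from-pr is not needed.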

Attempt 2

  • [x] add EasyBuild 4.8.2, 4.9.{0,1,2,3,4}, 5.0.0; ReFrame 4.3.3, 4.6.2, Java 11 & Nextflow 23.10.0 (grace PR: #981; sapphirerapids PRs: #921 & #928)

We will continue with the toolchains foss/2023b and foss/2023a and document progress in the comments "progress with foss/2023b" and "progress with foss/2023a", respectively. Later we may also look into foss/2022b.

trz42, Mar 13 '25 18:03