
Use Ramble modifier to fill in allocation variables

Open • scheibelp opened this issue 11 months ago • 1 comment

Closes https://github.com/LLNL/benchpark/pull/178

Given an experiment that requests resources (nodes, cpus, gpus, etc.) and a system description (cpus-per-node, gpus-per-node, etc.), this PR aims to generate an appropriate scheduler request for those resources. In some cases, that means determining quantities such as how many nodes a given benchmark needs.
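As a rough illustration of the kind of calculation involved, here is a minimal shell sketch. It assumes the experiment asks for a number of ranks and threads per rank, and the system provides a CPU count per node. The function name nodes_needed and its arguments are hypothetical and do not correspond to the variables the Ramble modifier actually uses. GPU requests are ignored here.

```shell
#!/bin/sh
# Hypothetical sketch: derive a node count from an experiment's resource
# request and a system's per-node CPU count. The function and argument
# names are illustrative, not Benchpark's or Ramble's actual variables.

# nodes_needed N_RANKS THREADS_PER_RANK CPUS_PER_NODE
# Prints the smallest number of nodes that can host N_RANKS ranks when every
# rank needs THREADS_PER_RANK CPUs and no rank may straddle two nodes.
nodes_needed() {
    n_ranks=$1
    threads_per_rank=$2
    cpus_per_node=$3

    # How many whole ranks fit on one node.
    ranks_per_node=$((cpus_per_node / threads_per_rank))
    if [ "$ranks_per_node" -eq 0 ]; then
        echo "error: one rank needs more CPUs than a node provides" >&2
        return 1
    fi

    # Ceiling division: a partially filled node still has to be requested.
    echo $(( (n_ranks + ranks_per_node - 1) / ranks_per_node ))
}
```

For example, nodes_needed 8 2 16 prints 1. By contrast, nodes_needed 8 2 12 prints 2, because only six two-thread ranks fit on a 12-CPU node. Packing whole ranks per node, rather than dividing the total CPU count, keeps a multithreaded rank from being split across nodes.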

https://github.com/LLNL/benchpark/pull/178#issuecomment-2008221116 raises some more interesting examples of this, and this PR takes an alternative approach to the one in #178.

This requires a newer Ramble than what Benchpark currently uses by default (right now I'm using https://github.com/GoogleCloudPlatform/ramble/pull/452).

./bin/benchpark setup saxpy/openmp nosite-x86_64 `pwd`/test-saxpy

Remaining work:

  • [x] remove all experiment-specific execute_experiment.tpl files
  • [x] Implement scheduler definition function for Sierra and Fugaku
  • [x] Update all experiment files and all system config files (currently just experiments/saxpy/openmp and configs/nosite-x86_64 are changed to demonstrate the organization)
  • [ ] All experiment/*/*/ramble.yaml files have been translated, but need further updates to actually describe system resources (e.g. number of GPUs on each node etc.)
    • (April 22 2024) All LLNL systems now specify the number of CPUs per node and, for systems that have GPUs, the number of GPUs per node.
    • (April 23 2024) All systems except Eiger are now updated. Note that LUMI and Daint have partitions with different node types, and the variables currently describe only one type.
  • [x] (May 14 2024) Update CI to run ramble workspace setup --dry-run on some configs and experiments: this actually runs the modifier defined here, generating batch scripts etc. with all resource requests filled in
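The scheduler definition functions in the checklist translate the same allocation values into each scheduler's own syntax. The following is a minimal shell sketch of that idea. It assumes Slurm, and LSF as used on LLNL's Sierra-class machines. The function batch_header and its argument order are hypothetical, and Fugaku's scheduler is not covered.

```shell
#!/bin/sh
# Hypothetical sketch of a per-scheduler "scheduler definition" function:
# it turns already-computed allocation values into batch-script directives.
# The function name and argument order are illustrative; Benchpark's actual
# implementation lives in its Ramble modifier and may differ.

# batch_header SCHEDULER N_RANKS N_NODES TIME_MINUTES
batch_header() {
    scheduler=$1
    n_ranks=$2
    n_nodes=$3
    time_minutes=$4

    case "$scheduler" in
        slurm)
            # Slurm: -n is the task (rank) count, -N the node count,
            # and a bare number passed to --time is interpreted as minutes.
            printf '#SBATCH -n %s\n#SBATCH -N %s\n#SBATCH --time %s\n' \
                "$n_ranks" "$n_nodes" "$time_minutes"
            ;;
        lsf)
            # LSF as configured on Sierra-class systems: -nnodes requests
            # whole nodes and -W takes a [hours:]minutes run limit.
            printf '#BSUB -nnodes %s\n#BSUB -W %s\n' \
                "$n_nodes" "$time_minutes"
            ;;
        *)
            echo "error: unsupported scheduler '$scheduler'" >&2
            return 1
            ;;
    esac
}
```

With this sketch, batch_header slurm 8 1 120 prints the same three #SBATCH lines that appear in the example script generated for saxpy/openmp. Keeping the computed values separate from the per-scheduler formatting means that supporting a new system only requires a new case, not new experiment files.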

Testing:

You can run any of the following commands on any system:

./bin/benchpark setup saxpy/openmp nosite-x86_64 <basedir>
./bin/benchpark setup amg2023/cuda LLNL-Sierra-IBM-power9-V100-Infiniband <basedir>
./bin/benchpark setup amg2023/cuda LLNL-Pascal-Penguin-broadwell-P100-OmniPath <basedir>

Benchpark then tells you which ramble workspace setup command to run. Append --phases make_experiments to the end of that command, which skips the concretize/install steps.

Oddities:

  • (April 23 2024) LUMI/Daint nodes have different characteristics depending on which partition you request. For now, I describe only one node type. In the future, we could handle this by creating a separate config for each partition a user might submit to.
  • (April 12 2024) The GROMACS execute_experiment.tpl files are slightly different from the others: they include an extra {experiment_setup}. That variable is always set to '', though, so I don't see a problem with removing them as well.
  • (EDIT: now resolved) ~Some values must be defined before the modifier runs, e.g. n_ranks. I've arbitrarily decided the placeholder value for these is "7" (they must be positive integers, so I decided to choose a number that was (a) unlikely to be explicitly chosen and (b) small, in case they percolate to actual requests)~

scheibelp • Apr 03 '24 06:04

Example script generated from experiments/saxpy/openmp and configs/nosite-x86_64 (tweaked to assume slurm, for a more interesting output):

#SBATCH -n 8
#SBATCH -N 1
#SBATCH --time 120

cd <benchpark-prefix>/test-saxpy-oslic-new/saxpy/openmp/nosite-x86_64/workspace/experiments/saxpy/problem/saxpy_512_1_2

rm -f "<benchpark-prefix>/test-saxpy-oslic-new/saxpy/openmp/nosite-x86_64/workspace/experiments/saxpy/problem/saxpy_512_1_2/saxpy_512_1_2.out"
touch "<benchpark-prefix>/test-saxpy-oslic-new/saxpy/openmp/nosite-x86_64/workspace/experiments/saxpy/problem/saxpy_512_1_2/saxpy_512_1_2.out"
export OMP_NUM_THREADS="2";
. <benchpark-prefix>/test-saxpy-oslic-new/spack/share/spack/setup-env.sh
spack env activate <benchpark-prefix>/test-saxpy-oslic-new/saxpy/openmp/nosite-x86_64/workspace/software/saxpy.problem
srun -n 8 -N 1 saxpy -n 512 >> "<benchpark-prefix>/test-saxpy-oslic-new/saxpy/openmp/nosite-x86_64/workspace/experiments/saxpy/problem/saxpy_512_1_2/saxpy_512_1_2.out"

scheibelp • Apr 04 '24 06:04