benchpark
Use Ramble modifier to fill in allocation variables
Closes https://github.com/LLNL/benchpark/pull/178
Given an experiment that requests resources (nodes, CPUs, GPUs, etc.) and a system description (cpus-per-node, gpus-per-node, etc.), this PR aims to generate an appropriate scheduler request for those resources. In some cases that means determining values such as how many nodes a given benchmark needs.
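As a rough illustration of the kind of derivation described above, here is a minimal Python sketch that computes a node count from an experiment's rank request and a system's per-node resources. It is not the modifier's actual code: `SystemDescription`, `nodes_needed`, and all parameter names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class SystemDescription:
    """Per-node resources of a system (field names are illustrative)."""
    cpus_per_node: int
    gpus_per_node: int = 0


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for a positive divisor."""
    return -(-a // b)


def nodes_needed(n_ranks: int, system: SystemDescription,
                 threads_per_rank: int = 1, gpus_per_rank: int = 0) -> int:
    """Smallest node count that satisfies both the CPU and the GPU request."""
    if n_ranks < 1 or threads_per_rank < 1 or gpus_per_rank < 0:
        raise ValueError("resource requests must be positive")
    # Nodes required to host every rank's CPU threads.
    by_cpu = ceil_div(n_ranks * threads_per_rank, system.cpus_per_node)
    if gpus_per_rank == 0:
        return by_cpu
    if system.gpus_per_node == 0:
        raise ValueError("experiment requests GPUs but the system has none")
    # Nodes required to give every rank its GPUs.
    by_gpu = ceil_div(n_ranks * gpus_per_rank, system.gpus_per_node)
    return max(by_cpu, by_gpu)


if __name__ == "__main__":
    # Assumed 36-core node; 8 ranks with 2 threads each fit on one node.
    print(nodes_needed(8, SystemDescription(cpus_per_node=36), threads_per_rank=2))
```

The sketch takes the larger of the CPU-driven and GPU-driven node counts, so whichever resource is scarcer on the system decides the request.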
https://github.com/LLNL/benchpark/pull/178#issuecomment-2008221116 brings up some more-interesting examples like these, and this PR is an alternative approach.
This requires a newer Ramble than what Benchpark currently uses by default (right now I'm using https://github.com/GoogleCloudPlatform/ramble/pull/452).
```
./bin/benchpark setup saxpy/openmp nosite-x86_64 `pwd`/test-saxpy
```
Remaining work:
- [x] Remove all experiment-specific `execute_experiment.tpl` files
- [x] Implement scheduler definition function for Sierra and Fugaku
- [x] Update all experiment files and all system config files (currently just `experiments/saxpy/openmp` and `configs/nosite-x86_64` are changed to demonstrate the organization)
- [ ] All `experiment/*/*/ramble.yaml` files have been translated, but need further updates to actually describe system resources (e.g. number of GPUs on each node etc.)
  - (April 22 2024) All LLNL systems are now updated with # of CPUs/GPUs per-node (in the latter case, only for systems that have them)
  - (April 23 2024) All systems except Eiger are now updated (note that LUMI and Daint have partitions with different types of nodes, and currently the variables only describe one type)
- [x] (May 14 2024) Update CI to do a `ramble workspace setup --dry-run` of some configs and experiments: this actually runs the modifier defined here to generate batch scripts etc. with all resource requests filled in
Testing:
You can run any one of the following on any system:

```
./bin/benchpark setup saxpy/openmp nosite-x86_64 <basedir>
./bin/benchpark setup amg2023/cuda LLNL-Sierra-IBM-power9-V100-Infiniband <basedir>
./bin/benchpark setup amg2023/cuda LLNL-Pascal-Penguin-broadwell-P100-OmniPath <basedir>
```
For the `ramble workspace setup` command it tells you to run, just append `--phases make_experiments` to the end of it (that will skip the concretize/install steps).
Oddities:
- (April 23 2024) LUMI/Daint nodes have different characteristics based on what partition you request. For now, I only describe one type of node. I think we can handle this in the future by creating different configs based on what partition the user wants to submit to.
- (April 12 2024) The GROMACS `execute_experiment.tpl` files are slightly different from the others: they have an extra `{experiment_setup}`; everything sets that variable to `''` though, so I don't see a problem with removing them as well
- (EDIT: now resolved) ~Some values must be defined before the modifier runs, e.g. n_ranks. I've arbitrarily decided the placeholder value for these is "7" (they must be positive integers, so I decided to choose a number that was (a) unlikely to be explicitly chosen and (b) small (in case they percolate to actual requests))~
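The partition idea mentioned for LUMI/Daint could look roughly like the Python sketch below, which keeps one set of node variables per partition and picks the set matching the partition the user submits to. This is only an illustration of the approach: the system name, partition names, variable names, and all numbers are made up, and the PR itself does not implement this.

```python
# Hypothetical per-partition node descriptions. All names and values here
# are placeholders, not the real characteristics of any system.
NODE_TYPES = {
    "example-system": {
        "cpu_partition": {"cpus_per_node": 128, "gpus_per_node": 0},
        "gpu_partition": {"cpus_per_node": 64, "gpus_per_node": 8},
    },
}


def variables_for_partition(system: str, partition: str) -> dict:
    """Return a copy of the node variables for one partition of one system."""
    try:
        partitions = NODE_TYPES[system]
    except KeyError:
        raise KeyError(f"unknown system: {system}") from None
    try:
        return dict(partitions[partition])
    except KeyError:
        known = ", ".join(sorted(partitions))
        raise KeyError(
            f"unknown partition {partition!r} for {system}; known: {known}"
        ) from None


if __name__ == "__main__":
    print(variables_for_partition("example-system", "gpu_partition"))
```

Returning a copy keeps callers from accidentally mutating the shared table when they layer experiment-specific values on top.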
Example script generated from `experiments/saxpy/openmp` and `configs/nosite-x86_64` (tweaked to assume slurm, for a more interesting output):
```
#SBATCH -n 8
#SBATCH -N 1
#SBATCH --time 120
cd <benchpark-prefix>/test-saxpy-oslic-new/saxpy/openmp/nosite-x86_64/workspace/experiments/saxpy/problem/saxpy_512_1_2
rm -f "<benchpark-prefix>/test-saxpy-oslic-new/saxpy/openmp/nosite-x86_64/workspace/experiments/saxpy/problem/saxpy_512_1_2/saxpy_512_1_2.out"
touch "<benchpark-prefix>/test-saxpy-oslic-new/saxpy/openmp/nosite-x86_64/workspace/experiments/saxpy/problem/saxpy_512_1_2/saxpy_512_1_2.out"
export OMP_NUM_THREADS="2";
. <benchpark-prefix>/test-saxpy-oslic-new/spack/share/spack/setup-env.sh
spack env activate <benchpark-prefix>/test-saxpy-oslic-new/saxpy/openmp/nosite-x86_64/workspace/software/saxpy.problem
srun -n 8 -N 1 saxpy -n 512 >> "<benchpark-prefix>/test-saxpy-oslic-new/saxpy/openmp/nosite-x86_64/workspace/experiments/saxpy/problem/saxpy_512_1_2/saxpy_512_1_2.out"
```
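To make the link between the resolved resource values and the script above concrete, here is a small Python sketch that renders the same `#SBATCH` lines and `srun` prefix from a rank count, node count, and time limit. The function names are hypothetical and this is not how the Ramble modifier is implemented; it only mirrors the Slurm output shown in the example.

```python
def slurm_directives(n_ranks: int, n_nodes: int, time_minutes: int) -> list:
    """Batch header lines in the same form as the example script."""
    return [
        f"#SBATCH -n {n_ranks}",
        f"#SBATCH -N {n_nodes}",
        f"#SBATCH --time {time_minutes}",
    ]


def slurm_launch(n_ranks: int, n_nodes: int, command: str) -> str:
    """Prefix an application command with the matching srun invocation."""
    return f"srun -n {n_ranks} -N {n_nodes} {command}"


if __name__ == "__main__":
    # Values taken from the example script: 8 ranks, 1 node, 120 minutes.
    for line in slurm_directives(8, 1, 120):
        print(line)
    print(slurm_launch(8, 1, "saxpy -n 512"))
```

Keeping the header and the launcher in two functions that take the same values is one way to ensure the batch request and the `srun` line never disagree.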