Enable running some commands before the bootstrap, similar to `prerun_cmds` in regular tests

Open casparvl opened this issue 1 year ago • 2 comments

The current job generated by the remote CPU detection script is:

#!/bin/bash
#SBATCH --job-name="rfm-detect-job"
#SBATCH --ntasks=1
#SBATCH --output=rfm-detect-job.out
#SBATCH --error=rfm-detect-job.err

_onerror()
{
    exitcode=$?
    echo "-reframe: command \`$BASH_COMMAND' failed (exit code: $exitcode)"
    exit $exitcode
}

trap _onerror ERR

./bootstrap.sh
mpirun -np 1 ./bin/reframe --detect-host-topology=topo.json

There is currently no way for the user to change this, even though regular ReFrame tests have the option to define `prerun_cmds`, e.g. to make some module environment available.
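To make the request concrete, here is a minimal sketch of what is being asked for: user-supplied commands emitted into the job script just before `./bootstrap.sh`. This is not ReFrame's actual implementation or API. The function `render_detect_job`, its `prerun_cmds` parameter and the virtualenv path are all hypothetical and only illustrate where the commands would land.

```python
# Hypothetical sketch: NOT ReFrame's real code or API.
# It only illustrates where user-supplied commands would be placed
# in the generated detect job script.

def render_detect_job(prerun_cmds=()):
    """Return the detect job script with `prerun_cmds` placed before the bootstrap."""
    lines = [
        '#!/bin/bash',
        '#SBATCH --job-name="rfm-detect-job"',
        '#SBATCH --ntasks=1',
        '#SBATCH --output=rfm-detect-job.out',
        '#SBATCH --error=rfm-detect-job.err',
        '',
        '# (error trap from the generated script omitted for brevity)',
        '',
    ]
    # User commands go first, so that they can prepare the environment
    # (modules, virtualenv, ...) that the bootstrap step relies on.
    lines.extend(prerun_cmds)
    lines += [
        './bootstrap.sh',
        'mpirun -np 1 ./bin/reframe --detect-host-topology=topo.json',
    ]
    return '\n'.join(lines) + '\n'


# Example: activate a virtualenv (placeholder path) before bootstrapping.
print(render_detect_job(prerun_cmds=['source /path/to/venv/bin/activate']))
```

With an empty `prerun_cmds`, the sketch produces the same bootstrap and detection commands as the current script, so the default behaviour would not change.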

When I hit the issue in https://github.com/reframe-hpc/reframe/issues/2926, I found that I had very few options for debugging. For example, I wanted to provide a different pip through a virtualenv to see if that was my problem. Something like `prerun_cmds` would enable one to do that. Admittedly, it's not a very concrete use case. Hence, @vkarak suggested making this a separate issue and waiting for a more concrete use case before implementing it.

So, if anyone else runs into a situation where having `prerun_cmds` in the CPU autodetect script would have been useful to them: please let us know here :)

casparvl, Sep 01 '23 11:09

I would add to this that we should also allow extra bootstrap options to be passed.

vkarak, Nov 03 '23 13:11

Log file(s) saved in '/home/satishk/reframe_CI_runs/logs/reframe_20240312_163919.log'
Run tests:
Detecting topology of remote partition 'snellius:gpu': this may take some time...
WARNING: failed to retrieve remote processor info: command 'sbatch rfm-detect-job.sh' failed with exit code 1:
--- stdout ---
--- stdout ---
--- stderr ---
sbatch: error: You should request at least one GPU when running a job in a GPU partition. Use --gpus-per-node=<number> or any of the other available options.
sbatch: error: Batch job submission failed: Unspecified error

@vkarak Is it possible to add additional #SBATCH arguments for the CPU auto detection?

satishskamath, Mar 12 '24 15:03