
Slurm JobID parsing failed.

Status: Open. hattom opened this issue 3 years ago • 4 comments

$ eb NVHPC-22.7-CUDA-11.7.0.eb --job --job-backend=Slurm --job-max-walltime=4 --job-cores=4
== Temporary log file in case of crash /scratch_local/eb-4ztatilm/easybuild-7qbf3o78.log
== found valid index for /m100_work/FUSIO_ru6IPP_0/thayward/eb/software/EasyBuild/4.6.0/easybuild/easyconfigs, so using it...
ERROR: Failed to determine job ID from output of submission command: sbatch: no partition specified, using default partition m100_all_serial
Submitted batch job 7787202

I guess it tries to parse "sbatch: no partition specified, using default partition m100_all_serial" to get the job ID?

I guess the --parsable flag to sbatch might help?

hattom, Aug 16 '22 12:08
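For reference, the sbatch documentation describes --parsable as printing only the job ID, followed by ";" and the cluster name if there is one. A minimal Python sketch of reading that output follows. The helper name parse_parsable_output is hypothetical, and taking the last non-empty line is an assumption meant to tolerate an extra line such as the partition notice if stderr happens to be merged in.

```python
def parse_parsable_output(stdout: str) -> str:
    """Return the job ID from `sbatch --parsable` output (hypothetical helper).

    Expected format: "<jobid>" or "<jobid>;<cluster>".
    """
    # Assumption: the job ID is on the last non-empty line
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("no output from sbatch --parsable")
    # Drop the optional ";<cluster>" suffix
    jobid = lines[-1].split(";", 1)[0]
    if not jobid.isdigit():
        raise ValueError(f"unexpected sbatch --parsable output: {stdout!r}")
    return jobid


print(parse_parsable_output("7787232\n"))           # 7787232
print(parse_parsable_output("7787232;cluster1\n"))  # 7787232
```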

Looks like the useful info is going to stdout in both cases (with and without --parsable), and the stderr message is what causes the problem.

[thayward@login03 NVHPC]$ sbatch -n 1 --parsable tmp.sh
sbatch: no partition specified, using default partition m100_all_serial
7787232
[thayward@login03 NVHPC]$ sbatch -n 1 --parsable tmp.sh 2> /dev/null
7787233
[thayward@login03 NVHPC]$ sbatch -n 1 tmp.sh
sbatch: no partition specified, using default partition m100_all_serial
Submitted batch job 7787234
[thayward@login03 NVHPC]$ sbatch -n 1 tmp.sh 2> /dev/null
Submitted batch job 7787235

hattom, Aug 16 '22 12:08
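The session above shows the partition notice going to stderr while the "Submitted batch job" line goes to stdout. A minimal sketch, assuming Python 3.7+ and a plain subprocess call (an illustration only, not EasyBuild's own command-running code), that captures the two streams separately and searches only stdout. The helper submit_and_get_jobid and the fake_sbatch stand-in are hypothetical; fake_sbatch just mimics sbatch's output so the example runs without Slurm.

```python
import re
import subprocess
import sys

JOBID_REGEX = re.compile(r"^Submitted batch job (?P<jobid>[0-9]+)")


def submit_and_get_jobid(cmd):
    """Run a submission command and parse the job ID from stdout only.

    stderr is captured separately, so notices such as
    'sbatch: no partition specified, ...' cannot break the match.
    """
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          text=True, check=True)
    match = JOBID_REGEX.search(proc.stdout)
    if match is None:
        raise RuntimeError("Failed to determine job ID from stdout: %r (stderr: %r)"
                           % (proc.stdout, proc.stderr))
    return match.group("jobid")


# Stand-in for sbatch: writes the notice to stderr and the job line to stdout
fake_sbatch = [
    sys.executable, "-c",
    "import sys; "
    "sys.stderr.write('sbatch: no partition specified, using default partition m100_all_serial\\n'); "
    "print('Submitted batch job 7787234')",
]

print(submit_and_get_jobid(fake_sbatch))  # 7787234
```

With a real cluster, cmd would be something like ["sbatch", "-n", "1", "tmp.sh"].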

For the time being, I can work around this by setting the partition through an environment variable:

(export SBATCH_PARTITION=m100_all_serial; sbatch -n 1 tmp.sh)
Submitted batch job 7787260

hattom, Aug 16 '22 12:08
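The subshell in the workaround scopes SBATCH_PARTITION to that one sbatch call, so sbatch no longer needs to print the default-partition notice. A rough Python equivalent, assuming the same effect is wanted from a script, passes a modified copy of the environment to the child process only. The names run_with_default_partition and echo_partition are hypothetical; echo_partition just prints the variable so the example runs without Slurm.

```python
import os
import subprocess
import sys


def run_with_default_partition(cmd, partition):
    """Run cmd with SBATCH_PARTITION set for the child process only."""
    # Copy the current environment; the parent's os.environ is left untouched
    env = dict(os.environ, SBATCH_PARTITION=partition)
    return subprocess.run(cmd, env=env, stdout=subprocess.PIPE,
                          text=True, check=True).stdout


# Stand-in for sbatch that echoes the partition it would see
echo_partition = [sys.executable, "-c",
                  "import os; print(os.environ['SBATCH_PARTITION'])"]

print(run_with_default_partition(echo_partition, "m100_all_serial").strip())  # m100_all_serial
```

Because child processes inherit the environment, exporting SBATCH_PARTITION in the shell before running eb should have the same effect on the sbatch calls EasyBuild makes.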

The job ID is parsed here: https://github.com/easybuilders/easybuild-framework/blob/d7409b3d68516a0bf70688d14e5750dedfbb8758/easybuild/tools/job/slurm.py#L119

jobid_regex = re.compile("^Submitted batch job (?P<jobid>[0-9]+)")

My instinct is to try removing the ^ anchor from the regex, so the job line can still match when it is not the first line of the output.

branfosj, Aug 19 '22 13:08
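A quick way to see why the anchored pattern fails, assuming the captured output is searched with re.search() and no flags (so ^ only matches at the very start of the string). The output string below is copied from the error message in the issue. Dropping ^ makes the pattern match; compiling with re.MULTILINE, which lets ^ match at the start of every line, is an alternative that was not raised in the thread.

```python
import re

# Combined output as it appears in the error message above
output = ("sbatch: no partition specified, using default partition m100_all_serial\n"
          "Submitted batch job 7787202\n")

anchored = re.compile(r"^Submitted batch job (?P<jobid>[0-9]+)")
unanchored = re.compile(r"Submitted batch job (?P<jobid>[0-9]+)")
multiline = re.compile(r"^Submitted batch job (?P<jobid>[0-9]+)", re.MULTILINE)

print(anchored.search(output))                   # None: ^ only matches at the very start
print(unanchored.search(output).group("jobid"))  # 7787202
print(multiline.search(output).group("jobid"))   # 7787202
```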

Sounds reasonable. In the absence of being able to parse only the stdout, I guess that would be the only easy fix?

hattom, Aug 19 '22 13:08