openpbs
Allow running jobs on overlapping maintenance reservations
Describe Bug or Feature
A job submitted to a maintenance reservation that overlaps an earlier maintenance reservation stays queued, even when the host is free. Jobs submitted to either reservation should be allowed to run as long as the reservation's resources are not allocated to other jobs.
Describe Your Change
To fix this, a short-circuit condition was added to the scheduler's node_info::is_vnode_eligible()
method, so that the vnode is not excluded from running the request when the job is in a maintenance reservation.
Link to Design Doc
Attach Test and Valgrind Logs/Output
The new functionality can be verified with a simple bash script:
#!/bin/bash
current_time_plus1=$(date -d "1 minute" +"%H%M")
duration="02:00"
hosts=$(hostname)
# query nodes status
echo -en "Node status before reservations: "
pbsnodes -avSj
# submit first maintenance reservation (--hosts requests a maintenance reservation)
resv_id_1=$(pbs_rsub -R "$current_time_plus1" -D "$duration" --hosts "$hosts")
resv_id_1=$(echo "$resv_id_1" | cut -d'.' -f1)
# submit second, overlapping maintenance reservation on the same host
resv_id_2=$(pbs_rsub -R "$current_time_plus1" -D "$duration" --hosts "$hosts")
resv_id_2=$(echo "$resv_id_2" | cut -d'.' -f1)
# wait until reservations start
/bin/sleep 60
# query reservations
pbs_rstat
# submit a job that takes all resources to the second reservation's queue
job_id_2=$(qsub -q "$resv_id_2" -lselect=1 -lplace=excl -o job2.txt -e job2.txt -- /bin/sleep 30)
# check job info
qstat -was1 $job_id_2
running_jobs=$(qselect -s R)
# with the fix, the job should be running
if [[ $running_jobs == *"$job_id_2"* ]]; then
    echo "Jobs in overlapping maintenance reservations are running!"
else
    echo "Cannot run job in overlapping maintenance reservation."
fi
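The script checks the job state once, right after qsub, so the result may depend on when the scheduler cycle runs. A minimal sketch of a polling helper follows. It assumes that `qstat -f` prints a `job_state = <letter>` line. The function names `get_job_state` and `wait_for_job_state` are hypothetical and not part of this change.

```shell
# Hypothetical helpers for the verification script (not part of the PR).

# Print the job_state letter (Q, R, ...) that `qstat -f` reports for a job.
# Assumes the full-format output contains a line like "    job_state = R".
get_job_state() {
    qstat -f "$1" 2>/dev/null |
        awk -F' = ' '/^ *job_state = / { print $2; exit }'
}

# Poll until the job reaches the wanted state or the timeout expires.
# Arguments: job id, wanted state, timeout in seconds (default 30),
# polling interval in seconds (default 1).
# Returns 0 if the state was reached, 1 on timeout.
wait_for_job_state() {
    wfs_job=$1
    wfs_wanted=$2
    wfs_timeout=${3:-30}
    wfs_interval=${4:-1}
    wfs_elapsed=0
    while [ "$wfs_elapsed" -le "$wfs_timeout" ]; do
        if [ "$(get_job_state "$wfs_job")" = "$wfs_wanted" ]; then
            return 0
        fi
        sleep "$wfs_interval"
        wfs_elapsed=$((wfs_elapsed + wfs_interval))
    done
    return 1
}
```

With these helpers, the final check in the script could be written as `if wait_for_job_state "$job_id_2" R 30; then ...`, which tolerates a short delay before the scheduler starts the job.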
- Before the fix, the output of this script shows the job in the Q state, even though the host is free.
- After the fix, which allows jobs to execute on hosts allocated to overlapping maintenance reservations, the same scenario puts the job in the R state immediately, since there is a free vnode to run it.