
Allow running jobs on overlapping maintenance reservations

Open kiliakis opened this issue 1 year ago • 0 comments

Describe Bug or Feature

The issue arises when a job is submitted to a maintenance reservation that overlaps with an earlier maintenance reservation on the same hosts. Jobs submitted to either reservation should be allowed to run as long as the reserved resources are not already allocated to other jobs.

Describe Your Change

To fix this, a short-circuit condition was added to the scheduler's node_info::is_vnode_eligible() method: if the job belongs to a maintenance reservation, the vnode is not excluded from running the request.
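The idea behind the short-circuit can be illustrated with a simplified model. This is only a sketch: the `Reservation`, `Job` and `Vnode` structs and their fields, and this `is_vnode_eligible` signature, are hypothetical stand-ins and not the actual OpenPBS scheduler types or code. It shows the ordering that matters: a vnode already used by another job is still rejected, but a job in a maintenance reservation skips the check that would otherwise exclude a vnode also held by a different, overlapping reservation.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-ins for scheduler structures (not OpenPBS types).
struct Reservation {
    std::string queue;    // reservation queue name
    bool is_maintenance;  // true if created with pbs_rsub --hosts
};

struct Job {
    const Reservation *resv;  // reservation the job was submitted to, or nullptr
};

struct Vnode {
    bool is_free;        // no other job currently holds its resources
    bool in_other_resv;  // vnode is also held by a different, overlapping reservation
};

// Hypothetical eligibility check with a short-circuit for maintenance reservations.
bool is_vnode_eligible(const Vnode &node, const Job &job) {
    // Resources already allocated to another job: never eligible.
    if (!node.is_free)
        return false;
    // Short-circuit: a job in a maintenance reservation may use a vnode
    // even if an overlapping reservation also holds it.
    if (job.resv != nullptr && job.resv->is_maintenance)
        return true;
    // Normal case: a vnode claimed by another reservation is excluded.
    if (node.in_other_resv)
        return false;
    return true;
}
```

With this ordering, the job from the test script below (queued in the second of two overlapping maintenance reservations on an idle host) is accepted, while jobs outside a maintenance reservation keep the original exclusion.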

Link to Design Doc

Attach Test and Valgrind Logs/Output

The new functionality can be verified with a simple bash script:


#!/bin/bash

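# both reservations start one minute from now (hhmm format)
current_time_plus1=$(date -d "1 minute" +"%H%M")
# reservation duration in [[hh:]mm:]ss format, i.e. two minutes
duration="02:00"
hosts=$(hostname)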

# query nodes status
echo -en "Node status before reservations: "
pbsnodes -avSj

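# submit first reservation; --hosts makes it a maintenance reservation on this host
resv_id_1=$(pbs_rsub -R "$current_time_plus1" -D "$duration" --hosts "$hosts")
# keep only the reservation ID, which is also the name of its queue
resv_id_1=$(echo "$resv_id_1" | cut -d'.' -f1)

# submit second maintenance reservation, overlapping the first on the same host
resv_id_2=$(pbs_rsub -R "$current_time_plus1" -D "$duration" --hosts "$hosts")
resv_id_2=$(echo "$resv_id_2" | cut -d'.' -f1)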

# wait until reservations start
/bin/sleep 60

# query reservations
pbs_rstat 

# submit a job to the second reservation's queue that requests the host exclusively
job_id_2=$(qsub -q "$resv_id_2" -lselect=1 -lplace=excl -o job2.txt -e job2.txt -- /bin/sleep 30)

# give the scheduler a few seconds to run a scheduling cycle
/bin/sleep 5

# check job info
qstat -was1 "$job_id_2"

running_jobs=$(qselect -s R)

# with the fix applied, the job should be in the running (R) state
if [[ $running_jobs == *"$job_id_2"* ]]; then
    echo "Jobs in overlapping maintenance reservations are running!"
else
    echo "Cannot run job in overlapping maintenance reservation."
fi


  • The output of this script before the fix was:

(Screenshot 2024-02-29 115058: script output before the fix)

The job stays in the Q (queued) state even though the host is free.

  • After the fix, which allows jobs to run on hosts allocated to overlapping maintenance reservations:

(Screenshot 2024-02-29 114617: script output after the fix)

In the same scenario, the job now goes straight to the R (running) state, since a free vnode is available to run it.

kiliakis, Feb 29 '24 10:02