
feature: add possibility to inject functions that will be called before and/or after each job

Fruchix opened this issue 4 months ago • 0 comments

Description

I've added the possibility to "inject" two different functions into _job_pool_worker: one runs before each job begins, and the other runs after it finishes. Both functions are optional (they default to ""), but if provided they are called for every job.
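To make the mechanism concrete, here is a rough sketch of the kind of change inside _job_pool_worker's job loop (the surrounding worker code is elided and the exact integration point is an assumption; cmd, args, and result are the worker's existing local variables, also used in the code sample below):

# Sketch only: relevant lines inside _job_pool_worker's job loop.
if [[ -n "${job_pool_function_pre_job}" ]]; then
    ${job_pool_function_pre_job}    # runs with the worker's locals in scope
fi

${cmd} "${args[@]}"
result=$?

if [[ -n "${job_pool_function_post_job}" ]]; then
    ${job_pool_function_post_job}   # can inspect ${result}, ${cmd}, ${id}, ...
fi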

Motivation

This improves job management: job_pool_wait waits for ALL queued jobs to finish, whereas the injected functions run per job. We can dynamically add new jobs depending on how (or which) previous jobs finished, kill all jobs as soon as one fails, add our own logging, and so on, without waiting for every queued job to finish first. I think this opens up a wide range of possibilities.

Implementation

I added two new parameters to job_pool_init:

  • job_pool_function_pre_job (optional): a function that will be called before each job
  • job_pool_function_post_job (optional): a function that will be called after each job

These can't be local variables passed as parameters through _job_pool_start_workers down to _job_pool_worker, because job_pool_wait (which restarts the workers) would then have to pass the two functions to _job_pool_start_workers all over again. Since the injected functions execute inside _job_pool_worker, they can read its local variables as well as job_pool.sh's global variables (see the code sample below).
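For reference, a minimal sketch of what the extended job_pool_init could look like (the names pool_size and echo_command for the existing parameters are assumptions based on the sample call job_pool_init 3 0; the rest of the body is elided):

# Sketch of the extended job_pool_init (existing body elided).
function job_pool_init()
{
    local pool_size=${1:-1}
    local echo_command=${2:-0}
    # stored as globals (not locals) so that job_pool_wait can
    # restart the workers without passing the hooks again
    job_pool_function_pre_job=${3:-""}
    job_pool_function_post_job=${4:-""}
    # ... existing initialization ...
}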

Further implementation?

I think job_pool_wait could also accept parameters to redefine the two functions after the previously queued jobs have finished, but that is just an idea and I haven't investigated further in that direction.
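Purely to illustrate that idea (it is not implemented), job_pool_wait could overwrite the two globals before restarting the workers, keeping the current hooks whenever no replacement is given:

# Hypothetical sketch only: optionally redefine the hooks on wait.
function job_pool_wait()
{
    job_pool_function_pre_job=${1:-${job_pool_function_pre_job}}
    job_pool_function_post_job=${2:-${job_pool_function_post_job}}
    # ... existing logic: stop workers, wait, restart workers ...
}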

Code Sample

Sample program demonstrating function injection
#!/bin/bash

. job_pool.sh

#####################################################
# Demonstration of function injection into each job #
#####################################################

echo "Demonstration of function injection into each job:"

# sleep some time ($1) then echo something ($2)
function sleep_n_echo()
{
    sleep "$1"
    echo "$2"
}

# Injected function that will be called before each job
# 
# Print which worker is starting which job
function print_starting_job()
{
    echo " # _job_pool_worker-${id}: Starting job: ${cmd} $(echo "${args[@]}" | xargs | tr '\v' ' ')"
}

# Injected function that will be called after each job
# 
# Kill all workers if the local variable "result" from _job_pool_worker
# indicates that the job failed
function kill_workers()
{
    echo " # _job_pool_worker-${id}: Finished job: ${cmd} $(echo "${args[@]}" | xargs | tr '\v' ' ')"

    # result is undefined in this script, but will be defined when
    # the function is injected in _job_pool_worker
    if [[ "${result}" != "0" ]]; then
        # get the pids of all workers:
        # - each worker's process is named after the current script (here, job_pool_sample.sh),
        #   so we use this name to get the pids
        # - we do not include the current script's pid ($$) as it is not a worker,
        #   (we do not want to kill the script itself, only the workers)
        # build a proper array of pids, excluding exactly our own pid ($$)
        local workers_pids
        mapfile -t workers_pids < <(pgrep -f "$0" | grep -vx "$$")
        kill "${workers_pids[@]}" &> /dev/null &
    fi
}

# allow 3 parallel jobs, and kill all jobs at the first fail using "kill_workers" function
job_pool_init 3 0 print_starting_job kill_workers

# simulate 3 jobs, where one fails before the others are finished, and interrupts the others
job_pool_run sleep_n_echo 3 a   # job 1
job_pool_run /bin/false         # job 2
job_pool_run sleep_n_echo 3 b   # job 3

# job 2 will kill all the other running workers via the "kill_workers" function
# (which is run after each job is processed)

job_pool_shutdown

echo -e "\nOnly the failed job exited; the others did not, because they were canceled."

Fruchix · Oct 30 '24 22:10