Reduce code complexity of job management

Open pbordron opened this issue 9 months ago • 0 comments

Until now, job execution paths have differed between standalone mode and server mode, and also depending on whether the runner is local or uses drmaa.

This leads to a lot of duplicated code and many conditional structures (if ... then ... else ...) to manage each step of a job.

Job context

One of the main constraints is maintaining the job context. In server mode, we use the database to populate the job object's attributes when the job scheduler runs a job step. In standalone mode, only one job is allowed and everything is kept in memory, so no job object attribute needs to be populated. Many conditional structures and function variants exist in the job management code to handle this difference. We can reduce the code complexity if:

  • in server mode, we call the database with a lazy approach, but it may be difficult
  • or, in standalone mode, we fake database calls.

Moving some information from the database into a serialized JSON job file could also be a way to unify job management. The database must be kept in order to manage job states in (async) server mode.
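To make the "fake database calls" and "serialized JSON file" options more concrete, here is a minimal sketch of one possible shape. Every name in it (`JobContext`, `JobStore`, `InMemoryJobStore`, `JsonFileJobStore`, `run_step`) is hypothetical and not taken from the dgenies code base; the idea is only that job steps talk to a single store interface, and the mode decides which backend sits behind it.

```python
import json
import os
from dataclasses import asdict, dataclass
from typing import Dict, Protocol


@dataclass
class JobContext:
    """Hypothetical subset of the attributes a job step needs."""
    id_job: str
    status: str = "submitted"
    runner_type: str = "local"


class JobStore(Protocol):
    """Interface that every job step would use, whatever the mode."""
    def load(self, id_job: str) -> JobContext: ...
    def save(self, ctx: JobContext) -> None: ...


class InMemoryJobStore:
    """Standalone mode: 'fakes' database calls with a plain dict."""

    def __init__(self) -> None:
        self._jobs: Dict[str, JobContext] = {}

    def load(self, id_job: str) -> JobContext:
        return self._jobs[id_job]

    def save(self, ctx: JobContext) -> None:
        self._jobs[ctx.id_job] = ctx


class JsonFileJobStore:
    """Alternative backend: one serialized JSON file per job."""

    def __init__(self, root: str) -> None:
        self._root = root

    def _path(self, id_job: str) -> str:
        return os.path.join(self._root, id_job + ".json")

    def load(self, id_job: str) -> JobContext:
        with open(self._path(id_job)) as fh:
            return JobContext(**json.load(fh))

    def save(self, ctx: JobContext) -> None:
        with open(self._path(ctx.id_job), "w") as fh:
            json.dump(asdict(ctx), fh)


def run_step(store: JobStore, id_job: str, new_status: str) -> JobContext:
    """A job step written once: no branching on standalone vs server mode."""
    ctx = store.load(id_job)
    ctx.status = new_status
    store.save(ctx)
    return ctx
```

A database-backed store for server mode would implement the same two methods, so the job state needed by the (async) server could stay in the database while the step code remains identical.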

Job scripts

When a job step is run through the drmaa runner, dgenies submits a job to the cluster using one of the scripts available in src/dgenies/bin. However, when the job step is run with the local runner, something else is used: part of the scripts is used as a library (which is fine), but some large parts are copy-pasted from the scripts into the job steps for the local runner.

The job scripts must be reworked in order to reduce this complicated, duplicated code.

Packaging the scripts can help with this (see #68).
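As a rough illustration of how one script could serve both runners, the sketch below keeps the step logic in a plain function and wraps it in a thin command-line `main`. The step itself (`count_fasta_records`) and its argument are invented for the example and do not correspond to an actual script in src/dgenies/bin.

```python
import argparse
from typing import List, Optional


def count_fasta_records(fasta_path: str) -> int:
    """Hypothetical step logic, written once and importable as a library."""
    with open(fasta_path) as fh:
        return sum(1 for line in fh if line.startswith(">"))


def main(argv: Optional[List[str]] = None) -> int:
    """Thin CLI wrapper: this is what a job submitted to the cluster would run."""
    parser = argparse.ArgumentParser(description="Hypothetical job step script")
    parser.add_argument("fasta", help="path to a FASTA file")
    args = parser.parse_args(argv)
    print(count_fasta_records(args.fasta))
    return 0


# A packaged console-script entry point (or an `if __name__ == "__main__":`
# guard calling `sys.exit(main())`) would expose `main` on the command line.
```

With this layout, the local runner would import and call `count_fasta_records(path)` directly, while the drmaa runner would submit the command-line form, so nothing has to be copy-pasted between the two paths.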

pbordron, Oct 23 '23 12:10