batchtools

any plans to support dependencies between jobs?

Open tdhock opened this issue 5 years ago • 8 comments

Hi, I'm interested in using batchtools, but after looking at the documentation I'm not sure whether it supports dependencies between jobs, which is a key feature I would need. SLURM documents this at https://slurm.schedmd.com/job_array.html

e.g.

# Wait for entire job array to complete successfully
sbatch --depend=afterok:123 my.job

If batchtools does support dependencies, where are the docs?

If not, how hard would it be to implement?

tdhock avatar Sep 25 '18 23:09 tdhock

hello @berndbischl @arfon @timflutre @mllg

tdhock avatar Sep 28 '18 15:09 tdhock

well, @mllg really should answer here... but my 2 cents:

a) no, this is not supported. maybe you can hack something in, but it isn't supported in a cool and general way.
b) it was one of the first big general issues i opened for BatchJobs quite some time ago. this is something that would really take batchtools to the next level IMHO

but stuff like that is usually not that simple to implement

berndbischl avatar Sep 28 '18 15:09 berndbischl

if some of us are here, can we maybe at least specify what we want before we jump to a solution? what would a cool system for this look like?

berndbischl avatar Sep 28 '18 16:09 berndbischl

As @berndbischl said, it is not yet supported. A simple version would not be too hard to implement. It all depends on the interface you need. What would be relatively easy to write is the following:

  1. You define jobs as usual with batchMap().
  2. Get the table of all jobs you want to submit, e.g. ids = findNotSubmitted().
  3. Add an integer column depends.on. This is either NA (no dependency) or a valid job id. Pass the table to submitJobs().
  4. submitJobs() first needs to submit all jobs where depends.on is NA. Wait until all of these jobs have been submitted to Slurm, because the Slurm job id returned by sbatch must be in the database.
  5. Adjust the resources to add "depend=afterok:xx" and submit all jobs whose dependencies have already been submitted. Repeat until all jobs are submitted.
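
The ordering described in steps 4 and 5 can be sketched in base R without touching a scheduler. This is only an illustration: `submission_waves()` is a hypothetical helper, not a batchtools function, and the `job.id` / `depends.on` columns are assumed to look as described in step 3. It assigns each job to the round in which it could be submitted.

```r
# Hypothetical helper, not part of batchtools: group jobs into submission waves.
# `jobs` is a data.frame with integer columns job.id and depends.on
# (NA means "no dependency").
submission_waves <- function(jobs) {
  wave <- rep(NA_integer_, nrow(jobs))
  wave[is.na(jobs$depends.on)] <- 1L   # step 4: jobs without dependencies go first
  current <- 1L
  repeat {
    submitted <- jobs$job.id[!is.na(wave)]
    # step 5: a job is ready once the job it depends on has been submitted
    ready <- is.na(wave) & jobs$depends.on %in% submitted
    if (!any(ready)) break
    current <- current + 1L
    wave[ready] <- current
  }
  if (anyNA(wave)) stop("cyclic or unknown dependency")
  wave
}

jobs <- data.frame(job.id = 1:5, depends.on = c(NA, 1L, 1L, 2L, NA))
jobs$wave <- submission_waves(jobs)
print(jobs)
```

In a real implementation each wave would be handed to submitJobs() in turn, with the scheduler ids of the previous wave filled into the resources.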

mllg avatar Oct 02 '18 08:10 mllg

would it make sense - at some point, as this would be more complicated i guess - to look at a combo with drake?

berndbischl avatar Oct 02 '18 21:10 berndbischl

hi @mllg, thanks for the idea to use depend=afterok:xx in resources. in fact I could probably do this in the current version of batchtools, as long as I use one registry per step, right?

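library(batchtools)
library(data.table)
reg1 = makeRegistry("~/registry/1")
reg2 = makeRegistry("~/registry/2")
batchMap(fun = Step1, 1:10, reg = reg1)
batchMap(fun = Step2, "FOO", reg = reg2)
jobs <- getJobTable(reg = reg1)
# put all step 1 jobs into one chunk so they are submitted as a single array job
chunks <- data.table(jobs, chunk = 1)
submitJobs(chunks, resources = list(
  walltime = 3600, memory = 1024, ncpus = 1, ntasks = 1,
  chunks.as.arrayjobs = TRUE),
  reg = reg1)
# reg2 was created last and is now the default registry, so name reg1 explicitly
jobs.done <- getJobTable(reg = reg1)
# drop everything from the first "_" on, keeping only the array job id
job.id <- sub("_.*", "", jobs.done$batch.id)[[1]]
submitJobs(resources = list(
  walltime = 3600, memory = 1024, ncpus = 1, ntasks = 1,
  afterok = job.id
), reg = reg2)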

I added the following line to slurm-simple.tmpl:

<%= if (!is.null(resources$afterok)) paste0("#SBATCH --depend=afterok:", resources$afterok) %>
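
The conditional in that template line can be tried outside a template with a small base R function. This is a sketch only: `afterok_directive()` is a hypothetical name, and joining several ids with ":" relies on the `afterok:job_id[:job_id...]` form that Slurm accepts for dependencies.

```r
# Hypothetical helper mirroring the template line: returns the #SBATCH directive,
# or an empty string when no dependency was requested in the resources.
afterok_directive <- function(resources) {
  if (is.null(resources$afterok)) return("")
  paste0("#SBATCH --depend=afterok:", paste(resources$afterok, collapse = ":"))
}

afterok_directive(list(walltime = 3600))            # no dependency requested
afterok_directive(list(afterok = "123"))            # one upstream job
afterok_directive(list(afterok = c("123", "124")))  # several upstream jobs
```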

Do you think that is an OK approach?

For me it seems a bit cumbersome to have to create one registry per step...

tdhock avatar Nov 08 '18 23:11 tdhock

Wouldn't it make more sense to explicitly not try this in batchtools and use a workflow tool for job dependencies, like drake?

mschubert avatar Feb 14 '19 12:02 mschubert

> Wouldn't it make more sense to explicitly not try this in batchtools and use a workflow tool for job dependencies, like drake?

For more complex scenarios and to ensure portability between batch systems: yes. But as outlined above, it is not that difficult to implement. You just need a topo-sort, e.g. from https://github.com/mlr-org/mlr3misc/blob/master/R/topo_sort.R.
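
For illustration, a topological sort over job dependencies can be written in a few lines of base R. This is a minimal repeated-pass sketch and not the mlr3misc implementation linked above; `topo_order()` and the named-list input format are assumptions made for this example.

```r
# Minimal topological sort (a sketch, not the mlr3misc code).
# `deps` is a named list: deps[["b"]] holds the ids that job "b" depends on.
topo_order <- function(deps) {
  remaining <- names(deps)
  ordered <- character(0)
  while (length(remaining) > 0) {
    # a job is ready once all of its dependencies have already been ordered
    ready <- remaining[vapply(deps[remaining],
                              function(d) all(d %in% ordered), logical(1))]
    if (length(ready) == 0) stop("cycle or unknown dependency detected")
    ordered <- c(ordered, ready)
    remaining <- setdiff(remaining, ready)
  }
  ordered
}

deps <- list(a = character(0), b = "a", c = "a", d = c("b", "c"))
topo_order(deps)
```

Submitting jobs in the returned order guarantees that the scheduler id of every dependency is known before the dependent job is submitted.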

mllg avatar Feb 18 '19 10:02 mllg