
Grouping tasks to run on the same VM instance

Open rivershah opened this issue 5 years ago • 3 comments

I am submitting task lists with more than 1K elements, with each individual task taking 1-2 minutes to run. This works fine, but it can be fairly inefficient: VM orchestration time, pulling container images from a private repo, etc. all consume a large fraction of time and bandwidth relative to each task's run time.

I wanted to know if there is a way to ask dsub to keep reusing the same VM/image and run groups of 10 or more tasks before releasing the VM. SGE has a task-grouping feature for large, short-running array jobs. Could we do something similar with dsub, please?
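For reference, the SGE feature mentioned above works via an array-job step size: submitting with `qsub -t 1-1000:20 job.sh` hands each scheduler slot a chunk of 20 indices, and the job script loops over its chunk. A minimal sketch of what that job script does (`run_one_task` is a placeholder for the real per-element command):

```shell
# SGE exposes the chunk boundaries via environment variables:
#   SGE_TASK_ID       - first index of this slot's chunk
#   SGE_TASK_STEPSIZE - chunk size (the :20 in -t 1-1000:20)
#   SGE_TASK_LAST     - last index of the whole array job
run_one_task() { echo "running task $1"; }   # placeholder for the real per-element command

first=${SGE_TASK_ID:-1}
step=${SGE_TASK_STEPSIZE:-20}
last=${SGE_TASK_LAST:-1000}

# Loop over this slot's chunk, clamping at the end of the array.
end=$(( first + step - 1 ))
if [ "$end" -gt "$last" ]; then end=$last; fi
for i in $(seq "$first" "$end"); do
  run_one_task "$i"
done
```

One VM/slot thus amortizes its startup cost over 20 tasks, which is the behavior the proposed `--task-grouping` flag would emulate.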

Example requested usage:

# tasks.tsv: very big array job with 1-2 minute run times per task
# --task-grouping 20: reuse the same VM for 20 tasks (proposed flag)
# --max-concurrency 100: do not have more than 100 VMs (proposed flag)
dsub \
  ...
  --tasks /tmp/dsub/tmp3prgdww1_job/tasks.tsv \
  --task-grouping 20 \
  --max-concurrency 100 \
  --preemptible 3 \
  --retries 3 \
  --min-ram xxx \
  --min-cores xxx \
  --timeout 1h \
  --wait

rivershah avatar Nov 11 '20 16:11 rivershah

I am in a similar situation!

gsneha26 avatar Sep 25 '21 07:09 gsneha26

Would be very nice...

slagelwa avatar Nov 17 '21 04:11 slagelwa

@RiverShah, @gsneha26, @slagelwa: Maybe I'm stating the obvious, but why not combine each group of 20 files into a single self-indexable input/output file, and have the script loop through it, selecting the correct index as needed?
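A minimal sketch of that batching approach, assuming a hypothetical layout of one input file per task (`inputs/task_NNNN.txt`) and a hypothetical per-file command `process_one`:

```shell
# Make a hypothetical set of 100 one-file-per-task inputs.
mkdir -p inputs
for i in $(seq 1 100); do
  printf 'payload %d\n' "$i" > "inputs/task_$(printf '%04d' "$i").txt"
done

# Group the inputs 20 at a time: split writes one manifest per group
# (group_aa, group_ab, ...), and tar -T turns each manifest into a single
# archive that each dsub task can localize as ONE input file.
ls inputs/task_*.txt | split -l 20 - group_
for m in group_*; do
  tar -czf "${m}.tar.gz" -T "$m"
done

# Inside the container, the task script unpacks its one archive and loops,
# amortizing VM startup and image-pull time across all 20 elements:
#   tar -xzf "${INPUT_ARCHIVE}"
#   for f in inputs/task_*.txt; do process_one "$f"; done
```

The task list then has one row per archive (5 rows here instead of 100), so the number of VMs, and the orchestration overhead, drops by the grouping factor.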

Hope it helps, ~p

pgrosu avatar Nov 17 '21 04:11 pgrosu