Hi, I typically use a loop to launch jobs that can all run concurrently and do not depend on each other. I would like to use the retries flag to relaunch the independent jobs that fail. This does not seem to work. Is there a solution to my problem?

Example Code:

%%bash --out LINE_COUNT_JOB_ID

# Get a shorter username to leave more characters for the job name.
DSUB_USER_NAME="$(echo "${OWNER_EMAIL}" | cut -d@ -f1)"

# For AoU RWB projects network name is "network".
AOU_NETWORK=network
AOU_SUBNETWORK=subnetwork

MACHINE_TYPE="n2-standard-4"
BASH_SCRIPT="gs://fc-secure-cb192ac6-30ba-46b9-92ee-896a6e36c63e/dsub/hpoisner/snplist_step1/SNPlist_step1_mac75k.sh"
LOWER=1
UPPER=23
for ((chromo=$LOWER;chromo<$UPPER;chromo+=1))
do
  dsub \
    --provider google-cls-v2 \
    --user-project "${GOOGLE_PROJECT}" \
    --project "${GOOGLE_PROJECT}" \
    --image "marketplace.gcr.io/google/ubuntu1804:latest" \
    --network "${AOU_NETWORK}" \
    --subnetwork "${AOU_SUBNETWORK}" \
    --service-account "$(gcloud config get-value account)" \
    --user "${DSUB_USER_NAME}" \
    --regions us-central1 \
    --logging "${WORKSPACE_BUCKET}/dsub/v7/logs/{job-name}/{user-id}/$(date +'%Y%m%d/%H%M%S')/{job-id}-{task-id}-{task-attempt}.log" \
    "$@" \
    --preemptible \
    --retries 2 \
    --wait \
    --boot-disk-size 1000 \
    --machine-type ${MACHINE_TYPE} \
    --name "${JOB_NAME}" \
    --script "${BASH_SCRIPT}" \
    --env GOOGLE_PROJECT=${GOOGLE_PROJECT} \
    --input plink="" \
    --input bgen_file="" \
    --input sample_file="" \
    --env chrom=${chromo} \
    --output-recursive OUTPUT_PATH="${OUTPUT_FILES}/${chromo}"
done

Oct 16 '23 18:10 hpoisner

Hi @hpoisner, you mention that this does not seem to work, but can you please describe what you observe happening? Are there any error messages? Any relevant logging? Any output that would indicate that a retry is not happening?

Oct 16 '23 20:10 wnojopra

The issue is that it turns jobs that should run in parallel into sequential jobs. There aren't any specific error messages. We just want to run multiple jobs at once, with the capacity to retry the ones that fail.

Oct 16 '23 21:10 hpoisner

I see you're looping over the chromosomes, and each call to dsub has the --wait flag. This means each dsub call blocks until that chromosome's job has completed, and only then does the loop go on to the next chromosome.

To run the jobs in parallel, instead you'll want to define a tasks TSV file where each line is a different chromosome. See https://github.com/DataBiosphere/dsub#submitting-a-batch-job for details on the tasks file format and the --tasks flag.
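
A minimal sketch of what that could look like for the loop above, offered as a starting point rather than a tested recipe. It writes one tasks-file row per chromosome and then submits everything with a single dsub call, so the tasks are not serialized by a per-call --wait. The file name chrom_tasks.tsv, the SUBMIT switch, and the placeholder default for OUTPUT_FILES are assumptions made for illustration; the remaining variables are taken from the original cell and are expected to be set already.

```shell
#!/usr/bin/env bash
set -eo pipefail

# Placeholder so the sketch runs standalone; in the notebook this is already set.
OUTPUT_FILES="${OUTPUT_FILES:-gs://example-bucket/output}"  # hypothetical default
TASKS_FILE="chrom_tasks.tsv"                                # hypothetical file name

# Header row: one tab-separated column per task-specific parameter.
printf -- '--env chrom\t--output-recursive OUTPUT_PATH\n' > "${TASKS_FILE}"

# One row per chromosome (1..22, same range as the original loop).
for ((chromo = 1; chromo < 23; chromo += 1)); do
  printf '%s\t%s\n' "${chromo}" "${OUTPUT_FILES}/${chromo}" >> "${TASKS_FILE}"
done

# SUBMIT is a hypothetical switch so the file can be inspected before launching.
# The job-wide flags not repeated here (network, subnetwork, service account,
# user, inputs, and so on) would carry over unchanged from the original call.
if [[ "${SUBMIT:-0}" == "1" ]]; then
  dsub \
    --provider google-cls-v2 \
    --project "${GOOGLE_PROJECT}" \
    --regions us-central1 \
    --logging "${WORKSPACE_BUCKET}/dsub/v7/logs" \
    --image "marketplace.gcr.io/google/ubuntu1804:latest" \
    --script "${BASH_SCRIPT}" \
    --tasks "${TASKS_FILE}" \
    --preemptible \
    --retries 2 \
    --wait
else
  echo "Wrote ${TASKS_FILE}; set SUBMIT=1 to launch."
fi
```

With this layout, --wait applies once to the whole batch instead of once per chromosome, and --retries 2 should cover each task individually. Only the values that differ between chromosomes go into the tasks file; everything shared stays on the command line.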

Oct 17 '23 00:10 wnojopra

Multiple jobs with retries
