
Cromwell 75 complains about GCS output file not found when delocalizing directories

Open · freeseek opened this issue 3 years ago · 1 comment

$ echo 'version development

workflow main {
  call main { input: s1 = "x", s2 = "y" }
  output { Array[File] f = main.f }
}

task main {
  input {
    String s1
    String s2
  }

  command <<<
    set -euo pipefail
    mkdir d
    touch "d/~{s1}"
    touch "d/~{s2}"
    echo -e "d/~{s1}\nd/~{s2}"
  >>>

  output {
    Directory d = "d"
    Array[File] f = read_lines(stdout())
  }

  runtime {
    docker: "debian:stable-slim"
  }
}' > main.wdl

This workflow succeeds when run on Google Cloud using Cromwell 74:

$ java -Dconfig.file=PAPIv2.conf -jar cromwell-74.jar run main.wdl

When run on Google Cloud using Cromwell 75:

$ java -Dconfig.file=PAPIv2.conf -jar cromwell-75.jar run main.wdl

the workflow fails with the message:

GCS output file not found: gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/d

However, the directory is correctly delocalized:

$ gsutil ls -l gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/d
         0  2022-02-13T00:00:00Z  gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/d/x
         0  2022-02-13T00:00:00Z  gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/d/y
TOTAL: 2 objects, 0 bytes (0 B)
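Google Cloud Storage has a flat namespace: a "directory" such as `.../call-main/d` exists only as the common prefix of objects like `.../d/x` and `.../d/y`, not as an object of its own. The bash sketch below models this with a hypothetical in-memory list of object names; the `objects` array and both helper functions are illustrative and are not Cromwell or gsutil code. It assumes, without confirming against Cromwell's source, that the new check in version 75 looks for an object with exactly the output's name, which would explain the error even though the listing above succeeds.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical bucket listing, mirroring the `gsutil ls -l` output above.
prefix="gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main"
objects=(
  "$prefix/d/x"
  "$prefix/d/y"
)

# Succeeds only if an object with exactly this name exists.
object_exists() {
  local target=$1 obj
  for obj in "${objects[@]}"; do
    [[ $obj == "$target" ]] && return 0
  done
  return 1
}

# Succeeds if at least one object lives under "<target>/", i.e. the "directory" is non-empty.
prefix_exists() {
  local target=${1%/}/ obj
  for obj in "${objects[@]}"; do
    [[ $obj == "$target"* ]] && return 0
  done
  return 1
}

if object_exists "$prefix/d"; then echo "object: found"; else echo "object: not found"; fi
if prefix_exists "$prefix/d"; then echo "prefix: found"; else echo "prefix: not found"; fi
```

Under this model the exact-object check reports "not found" for `d` while the prefix check finds it, matching the mismatch between the error message and the `gsutil ls` listing.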

The delocalization script is aware that d is a directory:

$ gsutil cat gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/gcs_delocalization.sh
source '/cromwell_root/gcs_transfer.sh'

timestamped_message 'Delocalization script execution started...'

# xxx
delocalize_6c578056c74a8d9a80724855ddac131c=(
  "mccarroll-mocha"       # project
  "3"   # max attempts
  "150M" # parallel composite upload threshold, will not be used for directory types
  "file"
  "gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/memory_retry_rc"
  "/cromwell_root/memory_retry_rc"
  "optional"
  "text/plain; charset=UTF-8"
  "file"
  "gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/rc"
  "/cromwell_root/rc"
  "required"
  "text/plain; charset=UTF-8"
  "file"
  "gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/monitoring.log"
  "/cromwell_root/monitoring.log"
  "required"
  "text/plain; charset=UTF-8"
  "file"
  "gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/stdout"
  "/cromwell_root/stdout"
  "required"
  "text/plain; charset=UTF-8"
  "file"
  "gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/stderr"
  "/cromwell_root/stderr"
  "required"
  "text/plain; charset=UTF-8"
  "directory"
  "gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/d"
  "/cromwell_root/d"
  "required"
  ""
)

delocalize "${delocalize_6c578056c74a8d9a80724855ddac131c[@]}"
timestamped_message 'Delocalization script execution complete.'
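The `delocalize_6c578056c74a8d9a80724855ddac131c` array above appears to be a flat list: three global settings (project, max attempts, parallel composite upload threshold) followed by one five-field record per output (type, GCS destination, local path, required/optional, content type). The bash sketch below walks such an array in chunks of five and reports each output's type. The field layout is inferred from the listing rather than taken from `gcs_transfer.sh`, and `list_outputs` is a hypothetical helper; the real `delocalize` function may process the array differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print "<type> <local path> <required|optional>" for each record.
list_outputs() {
  # The first three elements are global settings (project, attempts, threshold); skip them.
  local -a records=("${@:4}")
  local i
  # Each remaining output occupies five consecutive elements.
  for (( i = 0; i + 4 < ${#records[@]}; i += 5 )); do
    printf '%s %s %s\n' "${records[i]}" "${records[i+2]}" "${records[i+3]}"
  done
}

# A shortened copy of the array above: one file output and the directory output.
base="gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main"
manifest=(
  "mccarroll-mocha" "3" "150M"
  "file"      "$base/rc" "/cromwell_root/rc" "required" "text/plain; charset=UTF-8"
  "directory" "$base/d"  "/cromwell_root/d"  "required" ""
)

list_outputs "${manifest[@]}"
```

Read this way, the script clearly tags `/cromwell_root/d` as a required `directory` output, so the failure seems to come from a check applied after delocalization rather than from the script itself.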

Somehow, though, Cromwell 75 includes a new check that expects d to be a file even when it is delocalized as a directory.

This breaks the only workaround available in Cromwell for delocalizing a list of files that cannot be determined before the task starts. Note that glob() is not an acceptable alternative, as it does not provide control over the order of the output files.
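The workaround relies on the task itself choosing the order of the paths it prints to stdout, which `read_lines(stdout())` then preserves line by line. The bash sketch below mimics the task's command section outside Cromwell with hypothetical file names: it creates files in `d` and prints their paths in a chosen order (numeric-aware ordering via `sort -V`, an illustrative choice that assumes a `sort` supporting `-V`, such as GNU coreutils) instead of the shell's lexical glob order.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so the example leaves nothing behind.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir d
for name in chr1 chr2 chr10; do   # hypothetical output names
  touch "d/$name"
done

# Bash sorts pathname expansion lexically: d/chr1 d/chr10 d/chr2
echo "glob order:" d/*

# The task can instead print paths in the order it wants; read_lines(stdout())
# would then return them in exactly this order: d/chr1 d/chr2 d/chr10
printf '%s\n' d/* | sort -V
```

The point is that the ordering decision lives in the task's own command, which is why a `Directory` output paired with a printed file list is more flexible than relying on glob().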

freeseek · Feb 13 '22 05:02

Hello,

I think the problem is fixed in Cromwell release 78. I ran into it when running the mocha workflow on a Cromwell 74 server. After updating to 78, the workflow completed the tasks that had previously failed.

  • Cromwell 74:
+--------------------+---------+------------+---------------------+
|        TASK        | ATTEMPT |  ELAPSED   |       STATUS        |
+--------------------+---------+------------+---------------------+
| batch_id_lines     | 1       | 5m34.003s  | Done                |
| batch_sorted_tsv   | 1       | 4m45.648s  | Done                |
| csv2bam (Scatter)  | -       | 10m51.838s | 1/1 Done | 0 Failed |
| green_idat_lines   | 1       | 5m34.003s  | Done                |
| gtc                | 1       | 5m27.897s  | Done                |
| gtc_reheader       | 1       | 5m26.257s  | Failed              |
| idat               | 1       | 5m27.897s  | Done                |
| idat2gtc (Scatter) | -       | 10m58.206s | 0/1 Done | 1 Failed |
| red_idat_lines     | 1       | 5m34.002s  | Done                |
| ref_scatter        | 1       | 4m39.394s  | Done                |
| sample_id_lines    | 1       | 5m34.003s  | Done                |
| sample_sorted_tsv  | 1       | 4m42.453s  | Done                |
+--------------------+---------+------------+---------------------+
❗You have 1 issue:

 - Workflow failed
 - GCS output file not found: gs://bioinfo-dev-temp/mocha/a224bb3e-fc20-4b0a-8846-ee2b4b603933/call-gtc_reheader/maps
 - GCS output file not found: gs://bioinfo-dev-temp/mocha/a224bb3e-fc20-4b0a-8846-ee2b4b603933/call-idat2gtc/shard-0/gtcs
  • Cromwell 78:
+----------------------------+---------+-----------------+-----------------------+
|            TASK            | ATTEMPT |     ELAPSED     |        STATUS         |
+----------------------------+---------+-----------------+-----------------------+
| batch_id_lines             | 1       | 16.37s          | Done                  |
| batch_sorted_tsv           | 1       | 15.288s         | Done                  |
| call_rate_lines            | 1       | 5m34.525s       | Done                  |
| computed_gender_lines      | 1       | 5m34.523s       | Done                  |
| csv2bam (Scatter)          | -       | 49.958s         | 1/1 Done | 0 Failed   |
| flatten_sample_id_lines    | 1       | 5m29.56s        | Done                  |
| get_max_nrecords (Scatter) | -       | 5m32.076s       | 1/1 Done | 0 Failed   |
| green_idat_lines           | 1       | 16.38s          | Done                  |
| green_idat_tsv             | 1       | 5m33.602s       | Done                  |
| gtc                        | 1       | 10.602s         | Done                  |
| gtc2vcf (Scatter)          | -       | 8m15.392s       | 1/1 Done | 0 Failed   |
| gtc_reheader               | 1       | 4m16.907s       | Done                  |
| gtc_tsv                    | 1       | 5m30.578s       | Done                  |
| idat                       | 1       | 7.606s          | Done                  |
| idat2gtc (Scatter)         | -       | 9m46.928s       | 1/1 Done | 0 Failed   |
| mocha_calls_tsv            | 1       | 5m19.305941005s | Running               |
| mocha_stats_tsv            | 1       | 5m19.304938136s | Running               |
| red_idat_lines             | 1       | 16.386s         | Done                  |
| red_idat_tsv               | 1       | 5m33.603s       | Done                  |
| ref_scatter                | 1       | 17.728s         | Done                  |
| sample_id_lines            | 1       | 16.383s         | Done                  |
| sample_id_split_tsv        | 1       | 5m31.462s       | Done                  |
| sample_sorted_tsv          | 1       | 11.924s         | Done                  |
| sample_tsv                 | 1       | 5m26.14s        | Done                  |
| vcf_concat (Scatter)       | -       | 5m32.467s       | 1/1 Done | 0 Failed   |
| vcf_import (Scatter)       | -       | 8m16.609s       | 1/1 Done | 0 Failed   |
| vcf_merge (Scatter)        | -       | 2h6m53.926s     | 23/23 Done | 0 Failed |
| vcf_mocha (Scatter)        | -       | 8m19.96s        | 1/1 Done | 0 Failed   |
| vcf_phase (Scatter)        | -       | 3h7m39.033s     | 23/23 Done | 0 Failed |
| vcf_qc (Scatter)           | -       | 2h8m6.051s      | 23/23 Done | 0 Failed |
| vcf_scatter (Scatter)      | -       | 5m25.444s       | 1/1 Done | 0 Failed   |
| vcf_split (Scatter)        | -       | 2h7m37.183s     | 23/23 Done | 0 Failed |
| write_tsv                  | 1       | 5m10.124926865s | Running               |
| xcl_vcf_concat             | 1       | 5m28.883s       | Done                  |
+----------------------------+---------+-----------------+-----------------------+

Note: some tasks took only a few seconds because I'm using call caching.

lmtani · May 03 '22 20:05