Cromwell 75 complains about GCS output file not found when delocalizing directories
$ echo 'version development
workflow main {
call main { input: s1 = "x", s2 = "y" }
output { Array[File] f = main.f }
}
task main {
input {
String s1
String s2
}
command <<<
set -euo pipefail
mkdir d
touch "d/~{s1}"
touch "d/~{s2}"
echo -e "d/~{s1}\nd/~{s2}"
>>>
output {
Directory d = "d"
Array[File] f = read_lines(stdout())
}
runtime {
docker: "debian:stable-slim"
}
}' > main.wdl
This workflow when run on Google Cloud using Cromwell 74:
$ java -Dconfig.file=PAPIv2.conf -jar cromwell-74.jar run main.wdl
will succeed.
When run on Google Cloud using Cromwell 75:
$ java -Dconfig.file=PAPIv2.conf -jar cromwell-75.jar run main.wdl
the workflow fails with the message:
GCS output file not found: gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/d
However, the directory is correctly delocalized:
$ gsutil ls -l gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/d
0 2022-02-13T00:00:00Z gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/d/x
0 2022-02-13T00:00:00Z gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/d/y
TOTAL: 2 objects, 0 bytes (0 B)
The delocalization script is aware that d is a directory:
$ gsutil cat gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/gcs_delocalization.sh
source '/cromwell_root/gcs_transfer.sh'
timestamped_message 'Delocalization script execution started...'
# xxx
delocalize_6c578056c74a8d9a80724855ddac131c=(
"mccarroll-mocha" # project
"3" # max attempts
"150M" # parallel composite upload threshold, will not be used for directory types
"file"
"gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/memory_retry_rc"
"/cromwell_root/memory_retry_rc"
"optional"
"text/plain; charset=UTF-8"
"file"
"gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/rc"
"/cromwell_root/rc"
"required"
"text/plain; charset=UTF-8"
"file"
"gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/monitoring.log"
"/cromwell_root/monitoring.log"
"required"
"text/plain; charset=UTF-8"
"file"
"gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/stdout"
"/cromwell_root/stdout"
"required"
"text/plain; charset=UTF-8"
"file"
"gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/stderr"
"/cromwell_root/stderr"
"required"
"text/plain; charset=UTF-8"
"directory"
"gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/d"
"/cromwell_root/d"
"required"
""
)
delocalize "${delocalize_6c578056c74a8d9a80724855ddac131c[@]}"
timestamped_message 'Delocalization script execution complete.'
However, Cromwell 75 appears to include a new check that expects d to be a file, even though it is delocalized as a directory.
This breaks the only workaround available in Cromwell for delocalizing a list of files that is not known before the task starts. Note that glob() is not an acceptable alternative, because glob() does not provide control over the order of the output files.
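To illustrate the ordering concern, here is a minimal local sketch, assuming glob() returns matches in Bash expansion order (as the WDL specification describes it). The file names are hypothetical and nothing here touches Cromwell or GCS; it only contrasts the order a task prints to stdout with the order a Bash glob yields for the same directory.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C   # fix the collation so the glob order is reproducible

work="$(mktemp -d)"
cd "$work"
mkdir d

# Emit the paths in a deliberate, task-defined order (hypothetical names),
# the way the reproducer does with echo and read_lines(stdout()).
for name in sample_10 sample_2 sample_1; do
  touch "d/${name}"
  echo "d/${name}"
done > listed.txt

# Bash pathname expansion sorts the matches; creation order is ignored.
printf '%s\n' d/* > globbed.txt

echo "listed:  $(tr '\n' ' ' < listed.txt)"
echo "globbed: $(tr '\n' ' ' < globbed.txt)"
```

The listed order stays sample_10, sample_2, sample_1, while the glob comes back as sample_1, sample_10, sample_2, which is why the Directory plus read_lines(stdout()) pattern is needed when the output order matters.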
Hello,
I think the problem is solved in release 78 of Cromwell. I had this problem when running the mocha workflow on a Cromwell 74 server. After updating to 78, the workflow completed the problematic tasks.
- Cromwell 74:
+--------------------+---------+------------+---------------------+
| TASK | ATTEMPT | ELAPSED | STATUS |
+--------------------+---------+------------+---------------------+
| batch_id_lines | 1 | 5m34.003s | Done |
| batch_sorted_tsv | 1 | 4m45.648s | Done |
| csv2bam (Scatter) | - | 10m51.838s | 1/1 Done | 0 Failed |
| green_idat_lines | 1 | 5m34.003s | Done |
| gtc | 1 | 5m27.897s | Done |
| gtc_reheader | 1 | 5m26.257s | Failed |
| idat | 1 | 5m27.897s | Done |
| idat2gtc (Scatter) | - | 10m58.206s | 0/1 Done | 1 Failed |
| red_idat_lines | 1 | 5m34.002s | Done |
| ref_scatter | 1 | 4m39.394s | Done |
| sample_id_lines | 1 | 5m34.003s | Done |
| sample_sorted_tsv | 1 | 4m42.453s | Done |
+--------------------+---------+------------+---------------------+
❗You have 1 issue:
- Workflow failed
- GCS output file not found: gs://bioinfo-dev-temp/mocha/a224bb3e-fc20-4b0a-8846-ee2b4b603933/call-gtc_reheader/maps
- GCS output file not found: gs://bioinfo-dev-temp/mocha/a224bb3e-fc20-4b0a-8846-ee2b4b603933/call-idat2gtc/shard-0/gtcs
- Cromwell 78:
+----------------------------+---------+-----------------+-----------------------+
| TASK | ATTEMPT | ELAPSED | STATUS |
+----------------------------+---------+-----------------+-----------------------+
| batch_id_lines | 1 | 16.37s | Done |
| batch_sorted_tsv | 1 | 15.288s | Done |
| call_rate_lines | 1 | 5m34.525s | Done |
| computed_gender_lines | 1 | 5m34.523s | Done |
| csv2bam (Scatter) | - | 49.958s | 1/1 Done | 0 Failed |
| flatten_sample_id_lines | 1 | 5m29.56s | Done |
| get_max_nrecords (Scatter) | - | 5m32.076s | 1/1 Done | 0 Failed |
| green_idat_lines | 1 | 16.38s | Done |
| green_idat_tsv | 1 | 5m33.602s | Done |
| gtc | 1 | 10.602s | Done |
| gtc2vcf (Scatter) | - | 8m15.392s | 1/1 Done | 0 Failed |
| gtc_reheader | 1 | 4m16.907s | Done |
| gtc_tsv | 1 | 5m30.578s | Done |
| idat | 1 | 7.606s | Done |
| idat2gtc (Scatter) | - | 9m46.928s | 1/1 Done | 0 Failed |
| mocha_calls_tsv | 1 | 5m19.305941005s | Running |
| mocha_stats_tsv | 1 | 5m19.304938136s | Running |
| red_idat_lines | 1 | 16.386s | Done |
| red_idat_tsv | 1 | 5m33.603s | Done |
| ref_scatter | 1 | 17.728s | Done |
| sample_id_lines | 1 | 16.383s | Done |
| sample_id_split_tsv | 1 | 5m31.462s | Done |
| sample_sorted_tsv | 1 | 11.924s | Done |
| sample_tsv | 1 | 5m26.14s | Done |
| vcf_concat (Scatter) | - | 5m32.467s | 1/1 Done | 0 Failed |
| vcf_import (Scatter) | - | 8m16.609s | 1/1 Done | 0 Failed |
| vcf_merge (Scatter) | - | 2h6m53.926s | 23/23 Done | 0 Failed |
| vcf_mocha (Scatter) | - | 8m19.96s | 1/1 Done | 0 Failed |
| vcf_phase (Scatter) | - | 3h7m39.033s | 23/23 Done | 0 Failed |
| vcf_qc (Scatter) | - | 2h8m6.051s | 23/23 Done | 0 Failed |
| vcf_scatter (Scatter) | - | 5m25.444s | 1/1 Done | 0 Failed |
| vcf_split (Scatter) | - | 2h7m37.183s | 23/23 Done | 0 Failed |
| write_tsv | 1 | 5m10.124926865s | Running |
| xcl_vcf_concat | 1 | 5m28.883s | Done |
+----------------------------+---------+-----------------+-----------------------+
Note: some tasks have a duration of only a few seconds because I'm using call caching.