cromwell
Outputs with a space in filename break on GCP
If a WDL task generates a file with a space in its name, and that file is an output, Cromwell fumbles the outputs and throws an error (at least on Terra's Cromwell on GCP). Additionally, the failure doesn't seem to be logged clearly.
This workflow takes in a bunch of BioSample accessions, downloads their associated run FASTQs, and processes them. https://dockstore.org/workflows/github.com/aofarrel/myco/myco_sra:4.1.2?tab=files
During one run, I accidentally passed in a file of BioSample accessions which had two spaces before each accession, eg
SAMEA104027315
SAMEA104027345
SAMEA104027406
SAMEA104164787
SAMEA104172469
SAMEA104172474
SAMEA104172508
SAMEA104221066
SAMEA104362398
SAMEA104394395
SAMEA104394505
SAMEA104414628
SAMEA104446901
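As a possible workaround on the input side, the accession file can be cleaned before it is handed to the workflow. This is only a minimal sketch: the file names accessions_raw.txt and accessions_clean.txt are hypothetical, and the two sample lines stand in for the real list.

```shell
# Hypothetical input file reproducing the problem: two spaces before each accession.
workdir=$(mktemp -d)
printf '  SAMEA104027315\n  SAMEA104027345\n' > "$workdir/accessions_raw.txt"

# Strip leading and trailing whitespace from every line, then drop empty lines.
sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' \
    "$workdir/accessions_raw.txt" > "$workdir/accessions_clean.txt"

# Show the cleaned list.
cat "$workdir/accessions_clean.txt"
```

This only avoids the trigger; it does not change how Cromwell treats an output whose name really does contain a space.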
The workflow is scattered per BioSample, so one instance of the scattered task takes in "  SAMEA104027315" (leading spaces included) as the input biosample_accession (type String). The task writes a file like this:
echo "~{biosample_accession}" >> ~{biosample_accession}_pull_results.txt
eg SAMEA104027315_pull_results.txt
The workflow output section contains:
String results = read_string("~{biosample_accession}_pull_results.txt")
e.g. SAMEA104027315_pull_results.txt, the same name as in the command section.
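One detail that may be worth checking is whether the file on disk and the name in the output section actually agree once the leading spaces are involved. The sketch below is an assumption about how the command is rendered (plain text substitution of the input value, then run by a POSIX shell); it is not taken from the Cromwell logs. The variable names rendered, expected, and written are illustrative only.

```shell
# Assumption: the value reaches the task with its two leading spaces intact.
biosample_accession="  SAMEA104027315"

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The command line after literal substitution of the input value.
# The redirect target is unquoted, so the shell treats the leading
# spaces as separators, not as part of the file name.
rendered="echo \"${biosample_accession}\" >> ${biosample_accession}_pull_results.txt"
sh -c "$rendered"

# The output section builds the name inside a quoted string,
# so there the leading spaces are kept.
expected="${biosample_accession}_pull_results.txt"

written=$(ls)
printf 'written:  [%s]\n' "$written"
printf 'expected: [%s]\n' "$expected"
```

Under these assumptions the shell writes SAMEA104027315_pull_results.txt with no leading space, while the quoted name still carries the spaces, so the two do not match. Quoting the redirect target in the command, or trimming the input before use, would make the two names agree.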
In the task level logs, I see
2023/04/18 21:54:34 Starting delocalization.
2023/04/18 21:54:35 Delocalization script execution started...
2023/04/18 21:54:35 Delocalizing output /cromwell_root/memory_retry_rc -> gs://fc-caa84e5a-8ef7-434e-af9c-feaf6366a042/submissions/93bf6971-bfa1-4cb8-bb22-c8a753f58c49/myco/10fa31a8-acbe-4ab7-a96a-6550ec08df12/call-pull/shard-0/memory_retry_rc
2023/04/18 21:54:37 Delocalizing output /cromwell_root/rc -> gs://fc-caa84e5a-8ef7-434e-af9c-feaf6366a042/submissions/93bf6971-bfa1-4cb8-bb22-c8a753f58c49/myco/10fa31a8-acbe-4ab7-a96a-6550ec08df12/call-pull/shard-0/rc
2023/04/18 21:54:39 Delocalizing output /cromwell_root/stdout -> gs://fc-caa84e5a-8ef7-434e-af9c-feaf6366a042/submissions/93bf6971-bfa1-4cb8-bb22-c8a753f58c49/myco/10fa31a8-acbe-4ab7-a96a-6550ec08df12/call-pull/shard-0/stdout
2023/04/18 21:54:40 Delocalizing output /cromwell_root/stderr -> gs://fc-caa84e5a-8ef7-434e-af9c-feaf6366a042/submissions/93bf6971-bfa1-4cb8-bb22-c8a753f58c49/myco/10fa31a8-acbe-4ab7-a96a-6550ec08df12/call-pull/shard-0/stderr
2023/04/18 21:54:42 Delocalizing output /cromwell_root/glob-db248e3bce81b54f5ef521878fe9e9de -> gs://fc-caa84e5a-8ef7-434e-af9c-feaf6366a042/submissions/93bf6971-bfa1-4cb8-bb22-c8a753f58c49/myco/10fa31a8-acbe-4ab7-a96a-6550ec08df12/call-pull/shard-0/glob-db248e3bce81b54f5ef521878fe9e9de/
2023/04/18 21:55:01 Delocalizing output /cromwell_root/glob-db248e3bce81b54f5ef521878fe9e9de.list -> gs://fc-caa84e5a-8ef7-434e-af9c-feaf6366a042/submissions/93bf6971-bfa1-4cb8-bb22-c8a753f58c49/myco/10fa31a8-acbe-4ab7-a96a-6550ec08df12/call-pull/shard-0/glob-db248e3bce81b54f5ef521878fe9e9de.list
2023/04/18 21:55:03 Delocalizing output /cromwell_root/ SAMEA104027315_pull_results.txt -> gs://fc-caa84e5a-8ef7-434e-af9c-feaf6366a042/submissions/93bf6971-bfa1-4cb8-bb22-c8a753f58c49/myco/10fa31a8-acbe-4ab7-a96a-6550ec08df12/call-pull/shard-0/ SAMEA104027315_pull_results.txt
2023/04/18 21:55:04 Delocalizing output /cromwell_root/SAMEA104027315.tar -> gs://fc-caa84e5a-8ef7-434e-af9c-feaf6366a042/submissions/93bf6971-bfa1-4cb8-bb22-c8a753f58c49/myco/10fa31a8-acbe-4ab7-a96a-6550ec08df12/call-pull/shard-0/SAMEA104027315.tar
2023/04/18 21:55:04 Delocalization script execution complete.
2023/04/18 21:55:05 Done delocalization.
In Job Manager, an error with the outputs can be seen.
[Screenshot: job outputs]
Because Job Manager breaks on large scatters, and to save money on compute credits, I stopped the workflow early rather than letting it keep going to find out whether the workflow log would eventually show an error. So far, it seems to have considered everything a success.
2023-04-18 21:59:54,599 INFO - PipelinesApiAsyncBackendJobExecutionActor [UUID(10fa31a8)myco.pull:108:1]: Status change from Running to Success
2023-04-18 22:00:09,060 INFO - PipelinesApiAsyncBackendJobExecutionActor [UUID(10fa31a8)myco.pull:107:1]: Status change from Running to Success
2023-04-18 22:00:18,464 INFO - PipelinesApiAsyncBackendJobExecutionActor [UUID(10fa31a8)myco.pull:106:1]: Status change from Running to Success
2023-04-18 22:01:20,604 INFO - PipelinesApiAsyncBackendJobExecutionActor [UUID(10fa31a8)myco.pull:111:1]: Status change from Running to Success
2023-04-18 22:14:47,728 INFO - WorkflowExecutionActor-10fa31a8-acbe-4ab7-a96a-6550ec08df12 [UUID(10fa31a8)]: Aborting workflow
2023-04-18 22:14:47,729 INFO - PipelinesApiAsyncBackendJobExecutionActor [UUID(10fa31a8)myco.pull:262:1]: PipelinesApiAsyncBackendJobExecutionActor [UUID(10fa31a8):myco.pull:262:1] Aborted StandardAsyncJob(projects/16371921765/locations/us-central1/operations/9178938377659283430)
2023-04-18 22:14:47,729 INFO - PipelinesApiAsyncBackendJobExecutionActor [UUID(10fa31a8)myco.pull:112:1]: PipelinesApiAsyncBackendJobExecutionActor [UUID(10fa31a8):myco.pull:112:1] Aborted StandardAsyncJob(projects/16371921765/locations/us-central1/operations/8559201934542591362)
2023-04-18 22:14:48,295 INFO - PipelinesApiAsyncBackendJobExecutionActor [UUID(10fa31a8)myco.pull:262:1]: Successfully requested cancellation of projects/16371921765/locations/us-central1/operations/9178938377659283430
2023-04-18 22:15:56,564 INFO - PipelinesApiAsyncBackendJobExecutionActor [UUID(10fa31a8)myco.pull:112:1]: Status change from Running to Success
2023-04-18 22:16:44,505 INFO - PipelinesApiAsyncBackendJobExecutionActor [UUID(10fa31a8)myco.pull:262:1]: Status change from Running to Cancelled
2023-04-18 22:16:44,539 INFO - WorkflowExecutionActor-10fa31a8-acbe-4ab7-a96a-6550ec08df12 [UUID(10fa31a8)]: WorkflowExecutionActor [UUID(10fa31a8)] aborted: myco.pull:262:1
2023-04-18 22:16:45,159 INFO - $f [UUID(10fa31a8)]: Copying workflow logs from /cromwell-workflow-logs/workflow.10fa31a8-acbe-4ab7-a96a-6550ec08df12.log to gs://fc-caa84e5a-8ef7-434e-af9c-feaf6366a042/submissions/93bf6971-bfa1-4cb8-bb22-c8a753f58c49/workflow.logs/workflow.10fa31a8-acbe-4ab7-a96a-6550ec08df12.log