Outputs with a space in filename break on GCP

Open aofarrel opened this issue 1 year ago • 0 comments

If a WDL task generates a file with a space in its name, and that file is an output, Cromwell mishandles the outputs and throws an error (at least with Terra's Cromwell on GCP). Additionally, the error doesn't seem to be logged clearly.

This workflow takes in a bunch of BioSample accessions, downloads their associated run FASTQs, and processes them. https://dockstore.org/workflows/github.com/aofarrel/myco/myco_sra:4.1.2?tab=files

During one run, I accidentally passed in a file of BioSample accessions which had two spaces before each accession, eg

  SAMEA104027315
  SAMEA104027345
  SAMEA104027406
  SAMEA104164787
  SAMEA104172469
  SAMEA104172474
  SAMEA104172508
  SAMEA104221066
  SAMEA104362398
  SAMEA104394395
  SAMEA104394505
  SAMEA104414628
  SAMEA104446901
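As a possible workaround on the input side, the accession file can be cleaned before it is passed to the workflow. This is only a sketch: the temporary files stand in for the real accession list, and the two sample lines are copied from the list above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins for the real accession file and its cleaned copy.
raw="$(mktemp)"
clean="$(mktemp)"

# Recreate the problem: two spaces before each accession.
printf '  SAMEA104027315\n  SAMEA104027345\n' > "$raw"

# Strip leading and trailing whitespace from every line and drop empty lines.
sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' "$raw" > "$clean"

cat "$clean"
```

This only avoids the trigger; it does not change how Cromwell treats output paths that contain spaces.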

The workflow is scattered per BioSample, so one instance of the scattered task takes in "  SAMEA104027315" (including the two leading spaces) as the input biosample_accession (type String). The task writes a file like this:

echo "~{biosample_accession}" >> ~{biosample_accession}_pull_results.txt

eg SAMEA104027315_pull_results.txt

The workflow output section contains:

String results = read_string("~{biosample_accession}_pull_results.txt")

eg SAMEA104027315_pull_results.txt, same as what's in the command section.
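One thing that may be worth checking is whether the file on disk actually has the leading spaces in its name. The sketch below assumes the placeholder is substituted as plain text before bash parses the command, so the redirect target is unquoted when the shell sees it. The temporary directory is a stand-in for the task's working directory, and whether this matches what happened inside the real task has not been verified.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for the task's working directory (assumption, not /cromwell_root).
workdir="$(mktemp -d)"
cd "$workdir"

# The command as bash would see it after substituting a value
# that starts with two spaces:
echo "  SAMEA104027315" >>   SAMEA104027315_pull_results.txt

# bash skips the whitespace between >> and the unquoted file name,
# so check which of the two possible names exists.
plain_exists=no
if [ -e "SAMEA104027315_pull_results.txt" ]; then
  plain_exists=yes
fi

spaced_exists=no
if [ -e "  SAMEA104027315_pull_results.txt" ]; then
  spaced_exists=yes
fi

echo "name without leading spaces exists: $plain_exists"
echo "name with leading spaces exists:    $spaced_exists"
```

In bash this reports that only the name without leading spaces exists. If the same thing happens in the task, the path built inside read_string() (which keeps the spaces) would not match the file the command wrote. Quoting the redirect target in the command section would make the two names agree.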

In the task level logs, I see

2023/04/18 21:54:34 Starting delocalization.
2023/04/18 21:54:35 Delocalization script execution started...
2023/04/18 21:54:35 Delocalizing output /cromwell_root/memory_retry_rc -> gs://fc-caa84e5a-8ef7-434e-af9c-feaf6366a042/submissions/93bf6971-bfa1-4cb8-bb22-c8a753f58c49/myco/10fa31a8-acbe-4ab7-a96a-6550ec08df12/call-pull/shard-0/memory_retry_rc
2023/04/18 21:54:37 Delocalizing output /cromwell_root/rc -> gs://fc-caa84e5a-8ef7-434e-af9c-feaf6366a042/submissions/93bf6971-bfa1-4cb8-bb22-c8a753f58c49/myco/10fa31a8-acbe-4ab7-a96a-6550ec08df12/call-pull/shard-0/rc
2023/04/18 21:54:39 Delocalizing output /cromwell_root/stdout -> gs://fc-caa84e5a-8ef7-434e-af9c-feaf6366a042/submissions/93bf6971-bfa1-4cb8-bb22-c8a753f58c49/myco/10fa31a8-acbe-4ab7-a96a-6550ec08df12/call-pull/shard-0/stdout
2023/04/18 21:54:40 Delocalizing output /cromwell_root/stderr -> gs://fc-caa84e5a-8ef7-434e-af9c-feaf6366a042/submissions/93bf6971-bfa1-4cb8-bb22-c8a753f58c49/myco/10fa31a8-acbe-4ab7-a96a-6550ec08df12/call-pull/shard-0/stderr
2023/04/18 21:54:42 Delocalizing output /cromwell_root/glob-db248e3bce81b54f5ef521878fe9e9de -> gs://fc-caa84e5a-8ef7-434e-af9c-feaf6366a042/submissions/93bf6971-bfa1-4cb8-bb22-c8a753f58c49/myco/10fa31a8-acbe-4ab7-a96a-6550ec08df12/call-pull/shard-0/glob-db248e3bce81b54f5ef521878fe9e9de/
2023/04/18 21:55:01 Delocalizing output /cromwell_root/glob-db248e3bce81b54f5ef521878fe9e9de.list -> gs://fc-caa84e5a-8ef7-434e-af9c-feaf6366a042/submissions/93bf6971-bfa1-4cb8-bb22-c8a753f58c49/myco/10fa31a8-acbe-4ab7-a96a-6550ec08df12/call-pull/shard-0/glob-db248e3bce81b54f5ef521878fe9e9de.list
2023/04/18 21:55:03 Delocalizing output /cromwell_root/  SAMEA104027315_pull_results.txt -> gs://fc-caa84e5a-8ef7-434e-af9c-feaf6366a042/submissions/93bf6971-bfa1-4cb8-bb22-c8a753f58c49/myco/10fa31a8-acbe-4ab7-a96a-6550ec08df12/call-pull/shard-0/  SAMEA104027315_pull_results.txt
2023/04/18 21:55:04 Delocalizing output /cromwell_root/SAMEA104027315.tar -> gs://fc-caa84e5a-8ef7-434e-af9c-feaf6366a042/submissions/93bf6971-bfa1-4cb8-bb22-c8a753f58c49/myco/10fa31a8-acbe-4ab7-a96a-6550ec08df12/call-pull/shard-0/SAMEA104027315.tar
2023/04/18 21:55:04 Delocalization script execution complete.
2023/04/18 21:55:05 Done delocalization.

In Job Manager, an error with the outputs can be seen.

(Screenshot: job outputs)

Because Job Manager breaks on large scatters, and to save money on compute credits, I decided to stop the workflow early rather than let it keep going to find out whether the workflow log would eventually show any errors. Up to the point where I aborted, the workflow log reported every completed shard as a success.

2023-04-18 21:59:54,599 INFO  - PipelinesApiAsyncBackendJobExecutionActor [UUID(10fa31a8)myco.pull:108:1]: Status change from Running to Success
2023-04-18 22:00:09,060 INFO  - PipelinesApiAsyncBackendJobExecutionActor [UUID(10fa31a8)myco.pull:107:1]: Status change from Running to Success
2023-04-18 22:00:18,464 INFO  - PipelinesApiAsyncBackendJobExecutionActor [UUID(10fa31a8)myco.pull:106:1]: Status change from Running to Success
2023-04-18 22:01:20,604 INFO  - PipelinesApiAsyncBackendJobExecutionActor [UUID(10fa31a8)myco.pull:111:1]: Status change from Running to Success
2023-04-18 22:14:47,728 INFO  - WorkflowExecutionActor-10fa31a8-acbe-4ab7-a96a-6550ec08df12 [UUID(10fa31a8)]: Aborting workflow
2023-04-18 22:14:47,729 INFO  - PipelinesApiAsyncBackendJobExecutionActor [UUID(10fa31a8)myco.pull:262:1]: PipelinesApiAsyncBackendJobExecutionActor [UUID(10fa31a8):myco.pull:262:1] Aborted StandardAsyncJob(projects/16371921765/locations/us-central1/operations/9178938377659283430)
2023-04-18 22:14:47,729 INFO  - PipelinesApiAsyncBackendJobExecutionActor [UUID(10fa31a8)myco.pull:112:1]: PipelinesApiAsyncBackendJobExecutionActor [UUID(10fa31a8):myco.pull:112:1] Aborted StandardAsyncJob(projects/16371921765/locations/us-central1/operations/8559201934542591362)
2023-04-18 22:14:48,295 INFO  - PipelinesApiAsyncBackendJobExecutionActor [UUID(10fa31a8)myco.pull:262:1]: Successfully requested cancellation of projects/16371921765/locations/us-central1/operations/9178938377659283430
2023-04-18 22:15:56,564 INFO  - PipelinesApiAsyncBackendJobExecutionActor [UUID(10fa31a8)myco.pull:112:1]: Status change from Running to Success
2023-04-18 22:16:44,505 INFO  - PipelinesApiAsyncBackendJobExecutionActor [UUID(10fa31a8)myco.pull:262:1]: Status change from Running to Cancelled
2023-04-18 22:16:44,539 INFO  - WorkflowExecutionActor-10fa31a8-acbe-4ab7-a96a-6550ec08df12 [UUID(10fa31a8)]: WorkflowExecutionActor [UUID(10fa31a8)] aborted: myco.pull:262:1
2023-04-18 22:16:45,159 INFO  - $f [UUID(10fa31a8)]: Copying workflow logs from /cromwell-workflow-logs/workflow.10fa31a8-acbe-4ab7-a96a-6550ec08df12.log to gs://fc-caa84e5a-8ef7-434e-af9c-feaf6366a042/submissions/93bf6971-bfa1-4cb8-bb22-c8a753f58c49/workflow.logs/workflow.10fa31a8-acbe-4ab7-a96a-6550ec08df12.log

aofarrel, Apr 19 '23 01:04