
How to deal with output directories

Open Gibbsdavidl opened this issue 5 years ago • 2 comments

Hello,

When running a job with "gcloud alpha genomics pipelines run", my output ends up in a couple of different directories: /mnt/data/output/A and /mnt/data/output/B.

Is there any way to copy the directories A and B to my GCS bucket without naming every file?

It fails because the pipeline tries: gsutil cp /mnt/data/output/* gs://my_bucket
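By hand, a recursive copy does what I want (the bucket name is just a placeholder), but the pipeline builds the gsutil call itself, so I don't get to add the flags:

    # what I'd run manually: parallel (-m), recursive (-r) copy of the whole tree
    gsutil -m cp -r /mnt/data/output gs://my_bucket/output/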

Similar to the samtools example yaml, I have:

    outputParameters:
    - name: outputPath
      description: Cloud Storage path for where bamtofastq writes
      localCopy:
        path: output/*
        disk: datadisk

And:

    gcloud alpha genomics pipelines run \
      --pipeline-file my.yaml \
      --inputs bamfiles.bam \
      --outputs outputPath=gs://cgc_bam_bucket_007/output/
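I suppose I could declare one output parameter per directory, something like the sketch below (untested, and the parameter names are made up), but that still means spelling out every directory:

    outputParameters:
    - name: outputPathA
      description: files bamtofastq writes under output/A
      localCopy:
        path: output/A/*   # wildcard within a single directory
        disk: datadisk
    - name: outputPathB
      description: files bamtofastq writes under output/B
      localCopy:
        path: output/B/*
        disk: datadisk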

I was thinking that in the docker cmd: > section, the output directories could be tarred up, so the output is just a single tarball. But that's not a great solution.
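Roughly, the tar step inside the container would look like this (paths are just illustrative):

    # bundle both output directories into a single archive so the
    # pipeline only has to delocalize one file
    tar -czf /mnt/data/output.tar.gz -C /mnt/data output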

Please help?

Gibbsdavidl commented Aug 08 '19 00:08

Hi @Gibbsdavidl!

I would recommend that you use dsub. I think it will provide a better experience than the gcloud command line, including support for wildcards and recursive inputs and outputs.

See https://github.com/DataBiosphere/dsub#working-with-input-and-output-files-and-folders.
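For the case above, a recursive output would look roughly like this (the project, bucket, and image are placeholders; see the README linked above for the full flag list):

    # dsub sets OUTPUT_DIR to a local path inside the container and
    # recursively copies everything written there back to GCS afterwards
    dsub \
      --provider google-v2 \
      --project my-project \
      --zones "us-central1-*" \
      --logging gs://my_bucket/logs \
      --image ubuntu \
      --output-recursive OUTPUT_DIR=gs://my_bucket/output \
      --command 'bamtofastq ... "${OUTPUT_DIR}"'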

mbookman commented Aug 08 '19 00:08

Hey there!!

Good call. I'm already having a better time... so much easier for what I want. It's really come a long way (in terms of development)!

-dave

Gibbsdavidl commented Aug 08 '19 18:08