pipelines-api-examples
How to deal with output directories
Hello,
When running a job with "gcloud alpha genomics pipelines run", my output consists of a couple of different directories: /mnt/data/output/A and /mnt/data/output/B.
Is there any way to copy the directories A and B to my GCS bucket without naming every file?
It fails because the pipeline tries "gsutil cp /mnt/data/output/* gs://my_bucket", and without -r the wildcard doesn't pick up directories.
Similar to the samtools example yaml, I have:

```yaml
outputParameters:
- name: outputPath
  description: Cloud Storage path for where bamtofastq writes
  localCopy:
    path: output/*
    disk: datadisk
```
And:
```
gcloud alpha genomics pipelines run \
  --pipeline-file my.yaml \
  --inputs bamfiles.bam \
  --outputs outputPath=gs://cgc_bam_bucket_007/output/
```
I was thinking that in the docker cmd: > block, the output directories could be tarred up, so that the output is just a single tarball. But it's not a great solution.
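For reference, here's roughly what I was picturing (just a sketch; the image name and the bamtofastq invocation are placeholders, with the layout borrowed from the samtools example):

```yaml
# Rough sketch only: imageName and the bamtofastq command line are
# placeholders. datadisk is mounted at /mnt/data, as in the samtools example.
name: bamtofastq
docker:
  imageName: my/bamtofastq-image
  cmd: >
    bamtofastq <args, writing into /mnt/data/output> &&
    tar czf /mnt/data/output.tar.gz -C /mnt/data output
outputParameters:
- name: outputPath
  description: Cloud Storage path for the single output tarball
  localCopy:
    path: output.tar.gz
    disk: datadisk
```

The single tarball copies out cleanly, but then everything has to be untarred on the other side.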
Please help?
Hi @Gibbsdavidl!
I would recommend that you use dsub. I think it will provide a better experience than the gcloud command line, including support for wildcards and recursive inputs and outputs.
See https://github.com/DataBiosphere/dsub#working-with-input-and-output-files-and-folders.
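For example, something along these lines (a sketch only; the project, zones, logging path, image, and bamtofastq arguments are placeholders to adapt to your setup):

```
# Sketch only: project, bucket paths, image, and the bamtofastq
# arguments below are placeholders.
dsub \
  --provider google-v2 \
  --project my-project \
  --zones "us-central1-*" \
  --logging gs://my_bucket/logging/ \
  --image my/bamtofastq-image \
  --input BAM=gs://my_bucket/bamfiles.bam \
  --output-recursive OUTPUT_DIR=gs://cgc_bam_bucket_007/output \
  --command 'bamtofastq filename="${BAM}" outputdir="${OUTPUT_DIR}"'
```

With --output-recursive, dsub exposes OUTPUT_DIR inside the container as a local directory and copies everything under it (subdirectories included) back to the bucket when the command finishes, so A and B come over without naming each file.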
Hey there!!
Good call. I'm already having a better time... it's so much easier for what I want. It's really come a long way (in terms of development)!
-dave