pipelines-api-examples
Continuing our discussion on simplifying pipelines (and their examples)
Hi Matt (@mbookman),
So to continue our discussion from https://github.com/googlegenomics/pipelines-api-examples/pull/10#discussion_r55108121, I understand the REST interface here:
https://www.googleapis.com/discovery/v1/apis/genomics/v1alpha2/rest
But it is too cumbersome for bioinformaticians who just want a turn-key solution to run their tools. The examples are great, but we should add a second, simplified set alongside them, which would widen the audience. That set should support multiple input files; this can be done now, even if the backend does not support it directly. It should also include examples of pipelines connected into workflows and of nested pipelines - and yes, there are several ways to do both :)
So each example should come with a pipeline definition like the one below, kept in a file that the program (Python/R/Java, etc.) picks up and adapts to the REST interface. The user provides only the necessary information, and the parser transforms the generalized names and fills in the remaining required fields on its own:
Pipeline:
  name: 'fastqc'
  CPU: 1
  RAM: 3.75 GB
  disks:
    name: 'datadisk'
    mountPoint: '/mnt/data'
    size: 500 GB
    persistent: true
  docker:
    image: 'gcr.io/PROJECT_ID_ARGUMENT/fastqc'
    cmd: ( 'mkdir /mnt/data/output && '
           'fastqc /mnt/data/input/* --outdir=/mnt/data/output/' )
  inputParameters:
    name: inputFile + [idx : 1...len(INPUT)]
    location:
      path: 'input/'
      disk: 'datadisk'
  outputParameters:
    name: 'outputPath'
    location:
      path: 'output/*'
      disk: 'datadisk'
pipelineArgs:
  RAM: 1 GB
  disks:
    name: 'datadisk'
    size: DISK_SIZE_ARGUMENT
    persistent: true
  inputs:
    inputFile + [idx : 1...len(INPUT)]
  outputs:
    path: OUTPUT_ARGUMENT
  logging:
    path: LOGGING_ARGUMENT
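To make the idea concrete, here is a rough Python sketch of what such a parser could do with the definition above. It is only an illustration under several assumptions: `parse_gb`, `input_names`, `build_run_request` and `SPEC` are hypothetical names, `SPEC` stands in for the definition after it has already been loaded from its file, and the REST field names (`ephemeralPipeline`, `localCopy`, `minimumRamGb`, `sizeGb`, `gcsPath`, `PERSISTENT_HDD`) reflect my reading of the v1alpha2 interface and should be checked against the discovery document. Mapping `persistent: true` to `PERSISTENT_HDD` is likewise an assumption.

```python
# Sketch only: expands the simplified definition into a v1alpha2-style run request.
# All REST field names below are assumptions to verify against the discovery document.

SPEC = {
    "Pipeline": {
        "name": "fastqc",
        "CPU": 1,
        "RAM": "3.75 GB",
        "disks": {"name": "datadisk", "mountPoint": "/mnt/data",
                  "size": "500 GB", "persistent": True},
        "docker": {
            "image": "gcr.io/PROJECT_ID_ARGUMENT/fastqc",
            "cmd": ("mkdir /mnt/data/output && "
                    "fastqc /mnt/data/input/* --outdir=/mnt/data/output/"),
        },
        "inputParameters": {"name": "inputFile",
                            "location": {"path": "input/", "disk": "datadisk"}},
        "outputParameters": {"name": "outputPath",
                             "location": {"path": "output/*", "disk": "datadisk"}},
    },
    "pipelineArgs": {"RAM": "1 GB"},
}


def parse_gb(size):
    """Convert '3.75 GB' (or a bare number) to a float number of gigabytes."""
    if isinstance(size, (int, float)):
        return float(size)
    value, unit = size.split()
    if unit.upper() != "GB":
        raise ValueError("this sketch only handles GB sizes: %r" % size)
    return float(value)


def input_names(prefix, files):
    """One parameter name per file: inputFile1 ... inputFileN (1-based, as in the definition)."""
    return ["%s%d" % (prefix, idx) for idx in range(1, len(files) + 1)]


def build_run_request(spec, project_id, input_files, output_path,
                      logging_path, disk_size_gb):
    """Fill in the generalized names and return a request body for the REST call."""
    p = spec["Pipeline"]
    disk = p["disks"]
    disk_type = "PERSISTENT_HDD" if disk.get("persistent") else "LOCAL_SSD"  # assumed mapping
    names = input_names(p["inputParameters"]["name"], input_files)
    out_name = p["outputParameters"]["name"]

    pipeline = {
        "projectId": project_id,
        "name": p["name"],
        "docker": {
            "imageName": p["docker"]["image"].replace("PROJECT_ID_ARGUMENT", project_id),
            "cmd": p["docker"]["cmd"],
        },
        # Multiple files: one input parameter per file, all copied to the same path.
        "inputParameters": [
            {"name": n, "localCopy": dict(p["inputParameters"]["location"])}
            for n in names
        ],
        "outputParameters": [
            {"name": out_name, "localCopy": dict(p["outputParameters"]["location"])}
        ],
        "resources": {
            "minimumCpuCores": p["CPU"],
            "minimumRamGb": parse_gb(p["RAM"]),
            "disks": [{"name": disk["name"], "mountPoint": disk["mountPoint"],
                       "sizeGb": int(parse_gb(disk["size"])), "type": disk_type}],
        },
    }
    args = {
        "projectId": project_id,
        "inputs": dict(zip(names, input_files)),
        "outputs": {out_name: output_path},
        "logging": {"gcsPath": logging_path},
        "resources": {
            "minimumRamGb": parse_gb(spec["pipelineArgs"]["RAM"]),
            "disks": [{"name": disk["name"], "sizeGb": disk_size_gb, "type": disk_type}],
        },
    }
    return {"ephemeralPipeline": pipeline, "pipelineArgs": args}
```

A call such as `build_run_request(SPEC, "my-project", ["gs://bucket/a.fq", "gs://bucket/b.fq"], "gs://bucket/out/", "gs://bucket/logs/", 200)` (the project and bucket names are made up) would then yield a body with two input parameters, `inputFile1` and `inputFile2`, without the user ever writing them out.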
Let me know what you think.
Thanks, Paul