cwl-ts
Example creating workflow programmatically
Please, could you provide an example of how to create a workflow from scratch?
It would be nice if it contained some of these:
- Add workflow's input and output
- Add a couple of steps with their inputs and outputs
- Edit label and doc
- Serialize
Thanks!
Is the following approach correct?
I have two issues with serialization:
- requirements are not serialized
- all inputs and outputs are single parameters, but they are serialized as arrays. That forces me to use the linkMerge option and MultipleInputFeatureRequirement.
import {
  V1WorkflowModel,
  V1StepModel,
  V1WorkflowInputParameterModel,
  V1WorkflowOutputParameterModel
} from 'cwlts/models/v1.0';
import {
  RequirementBaseModel
} from 'cwlts/models/generic';

export function createWorkflow() {
  const wf = new V1WorkflowModel();
  wf.label = 'My label';
  wf.description = 'My doc'; // It is serialized as 'doc'
  wf.requirements.push(new RequirementBaseModel({class: 'SubworkflowFeatureRequirement'}));

  // Add workflow inputs
  const inputs = new V1WorkflowInputParameterModel({
    id: 'protein',
    label: 'UniProt ID',
    doc: 'Enter UniProt identifier',
    type: 'string?',
    default: 'uniprot:P01038'
  });
  wf.addEntry(inputs, 'inputs');

  // Add two steps
  const step1 = new V1StepModel({
    id: 'sss',
    label: 'NCBI BLAST',
    doc: 'Sequence similarity search',
    in: {
      sequence: 'protein'
    },
    out: ['proteins'],
    run: 'https://raw.githubusercontent.com/psafont/gluetools-cwl/master/ncbiblast/ncbiblast.cwl'
  });
  wf.addEntry(step1, 'steps');

  const step2 = new V1StepModel({
    id: 'filter',
    label: 'Top 20 sequences',
    doc: 'Use DbFetch to get the 20 top most similar sequences',
    in: {
      accessions: 'sss/proteins'
    },
    out: ['sequences'],
    run: 'https://raw.githubusercontent.com/psafont/gluetools-cwl/master/workflows/fetch-proteins.cwl'
  });
  wf.addEntry(step2, 'steps');

  // Add workflow outputs
  const outputs = new V1WorkflowOutputParameterModel({
    id: 'result',
    label: 'Filtered sequences',
    doc: 'Top X sequences',
    type: 'File',
    outputSource: 'filter/sequences'
  });
  wf.addEntry(outputs, 'outputs');

  // wf.serialize()
  return wf;
}
Hi @esanzgar, sorry for the late reply.
I will admit the API behind the workflow model isn't the prettiest or most consistent; I've mostly been developing it to satisfy the needs of the Rabix Composer. The lack of documentation is also unfortunate.
There are specific ways in which the Composer creates a WorkflowModel which aren't the easiest/most convenient to replicate programmatically. The philosophy behind workflow creation in the composer is as follows:
- Workflow creation starts with adding steps
- Steps are added as resolved tools (the whole object, not just the path) with references to their location for later serialization, by calling the method addStepFromProcess
- Step inputs and outputs are generated from the step's run property (this is why the model needs the whole tool/workflow instead of just the path)
- Workflow inputs and outputs are created from ports on the step, by calling the methods createInputFromPort and createOutputFromPort
- Direct manipulation of objects on the model is avoided in favor of helper methods, as they ensure a consistent state of the model's graph, validation tree and validity.
That being said, the example you show could also work. The issue you have with requirements is actually a bug, as we haven't had a need for adding/serializing requirements in the Composer so the functionality was never added.
I'm not sure I understand the issue related to serializing inputs and outputs, though. Workflow.inputs and Workflow.outputs are always serialized as an array out of habit; they could just as easily be a map<id, input>, as the map form is just syntactic sugar. linkMerge and MultipleInputFeatureRequirement are only necessary when you have multiple incoming connections on a single step input, which serialization does not affect.
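To illustrate why the array and map forms of inputs/outputs carry the same information, here is a minimal sketch that converts between the two. The `Param` and `ParamMap` types and both function names are hypothetical and written only for this illustration; they are not part of cwl-ts.

```typescript
// Hypothetical shapes for illustration only; these are not cwl-ts types.
interface Param {
  id: string;
  [key: string]: unknown;
}

type ParamMap = Record<string, Record<string, unknown>>;

// Array form -> map form: the id becomes the key, everything else the value.
function toMapForm(params: Param[]): ParamMap {
  const result: ParamMap = {};
  for (const { id, ...rest } of params) {
    result[id] = rest;
  }
  return result;
}

// Map form -> array form: the key is moved back into an "id" field.
function toArrayForm(map: ParamMap): Param[] {
  return Object.keys(map).map((id) => ({ ...map[id], id }));
}
```

Both directions only move the `id` between a key and a field, so a round trip keeps every other field unchanged.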
Maya,
Thank you for your reply.
Would you mind posting an example of the Composer's workflow creation approach (using addStepFromProcess, createInputFromPort and createOutputFromPort)?
Regarding the potential bug (serialising requirements), would you like me to create an independent issue?
Sorry, my explanation about MultipleInputFeatureRequirement was not accurate.
If I define the input of a step in this way:
  in: {
    accessions: 'sss/proteins'
  },
It is serialized in this way:
  "in": [{
    "id": "accessions",
    "source": ["sss/proteins"]
  }],
However, I was expecting this:
  "in": [{
    "id": "accessions",
    "source": "sss/proteins"
  }],
Because source is serialized as an array, I have to add the requirement MultipleInputFeatureRequirement (linkMerge defaults to "merge_nested") to make it work.
Since requirements are not serialised either, I am in a bit of a predicament.
Thanks
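One possible stopgap for the `source` array issue is to post-process the serialized object and collapse single-element `source` arrays back into plain strings. The sketch below assumes the serialized steps are a plain array whose `in` entries look like the JSON shown above. The interfaces and the `collapseSingleSources` helper are hypothetical and are not part of cwl-ts.

```typescript
// Hypothetical shapes describing only the fields this sketch touches.
interface SerializedStepInput {
  id: string;
  source?: string | string[];
  [key: string]: unknown;
}

interface SerializedStep {
  id: string;
  in: SerializedStepInput[];
  [key: string]: unknown;
}

// Returns a copy of the steps in which every one-element "source" array
// is replaced by its single string. Multi-element arrays are left alone,
// because those do need linkMerge / MultipleInputFeatureRequirement.
function collapseSingleSources(steps: SerializedStep[]): SerializedStep[] {
  return steps.map((step) => ({
    ...step,
    in: step.in.map((input) => {
      const { source } = input;
      if (Array.isArray(source) && source.length === 1) {
        return { ...input, source: source[0] };
      }
      return input;
    }),
  }));
}
```

The helper does not mutate its argument, so it could be applied to the `steps` of whatever `wf.serialize()` returns just before the result is written out, assuming that output has the shape described above.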
Workaround to serialise requirements:
// Standard way of adding requirements doesn't work
// wf.requirements.push(new RequirementBaseModel({class: 'SubworkflowFeatureRequirement'}));
// wf.requirements.push(new RequirementBaseModel({class: 'MultipleInputFeatureRequirement'}));
// Workaround
wf.customProps.requirements = [];
wf.customProps.requirements.push({class: 'SubworkflowFeatureRequirement'});
wf.customProps.requirements.push({class: 'MultipleInputFeatureRequirement'});