cwl-ts

Example creating workflow programmatically

Open esanzgar opened this issue 7 years ago • 6 comments

Could you please provide an example of how to create a workflow from scratch?

It would be nice if it contained some of these:

  • Add workflow's input and output
  • Add a couple of steps with their inputs and outputs
  • Edit label and doc
  • Serialize

Thanks!

esanzgar avatar Oct 16 '17 15:10 esanzgar

Is the following approach correct?

I have two issues with serialization:

  • requirements are not serialized
  • all inputs and outputs are single parameters, but they are serialized as arrays. That forces me to use the linkMerge option and MultipleInputFeatureRequirement.

import {
    V1WorkflowModel,
    V1StepModel,
    V1WorkflowInputParameterModel,
    V1WorkflowOutputParameterModel
} from 'cwlts/models/v1.0';
import {
    RequirementBaseModel
} from 'cwlts/models/generic';

export function createWorkflow(){
        const wf = new V1WorkflowModel();
        wf.label = 'My label';
        wf.description = 'My doc'; // It is serialized as 'doc'
        wf.requirements.push(new RequirementBaseModel({class: 'SubworkflowFeatureRequirement'}));

        // Add workflow inputs
        const inputs = new V1WorkflowInputParameterModel({
            id: 'protein',
            label: 'UniProt ID',
            doc: 'Enter UniProt identifier',
            type: 'string?',
            default: 'uniprot:P01038'
        });
        wf.addEntry(inputs, 'inputs');

        // Add two steps
        const step1 = new V1StepModel({
            id: 'sss',
            label: 'NCBI BLAST',
            doc: 'Sequence similarity search',
            in: {
                sequence: 'protein'
            },
            out: ['proteins'],
            run: 'https://raw.githubusercontent.com/psafont/gluetools-cwl/master/ncbiblast/ncbiblast.cwl'
        });
        wf.addEntry(step1, 'steps');

        const step2 = new V1StepModel({
            id: 'filter',
            label: 'Top 20 sequences',
            doc: 'Use DbFetch to get the 20 top most similar sequences',
            in: {
                accessions: 'sss/proteins'
            },
            out: ['sequences'],
            run: 'https://raw.githubusercontent.com/psafont/gluetools-cwl/master/workflows/fetch-proteins.cwl'
        });
        wf.addEntry(step2, 'steps');

        // Add workflow outputs
        const outputs = new V1WorkflowOutputParameterModel({
            id: 'result',
            label: 'Filtered sequences',
            doc: 'Top X sequences',
            type: 'File',
            outputSource: 'filter/sequences'
        });
        wf.addEntry(outputs, 'outputs');

        // wf.serialize()
        return wf;
}

esanzgar avatar Oct 17 '17 16:10 esanzgar

Hi @esanzgar, sorry for the late reply.

I will admit the API behind the workflow model isn't the prettiest or most consistent; I've mostly been developing it to satisfy the needs of the Rabix Composer. The lack of documentation is also unfortunate.

There are specific ways in which the Composer creates a WorkflowModel which aren't the easiest or most convenient to replicate programmatically. The philosophy behind workflow creation in the Composer is as follows:

  • Workflow creation starts with adding steps
  • Steps are added as resolved tools (the whole object, not just the path) with references to their location for later serialization, calling the method addStepFromProcess
  • Step inputs and outputs are generated from the step's run property (this is why the model needs the whole tool/workflow instead of just the path)
  • Workflow inputs and outputs are created from ports on the step, calling the methods createInputFromPort and createOutputFromPort
  • Direct manipulation of objects on the model is avoided in favor of helper methods, as they ensure a consistent state of the model's graph, validation tree and validity.
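
The flow described in the list above can be sketched roughly as follows. This is only an illustration: the three method names come from the list, but the structural types (Port, Step, ComposerLikeWorkflow), the helper buildFromResolvedTool, and all parameter and return shapes are assumptions, not the library's actual signatures.

```typescript
// Hypothetical structural types: only the method names addStepFromProcess,
// createInputFromPort and createOutputFromPort come from the description
// above; parameter and return shapes are assumptions for illustration.
interface Port { id: string; }
interface Step { id: string; in: Port[]; out: Port[]; }
interface ComposerLikeWorkflow {
    addStepFromProcess(process: object): Step;
    createInputFromPort(port: Port): unknown;
    createOutputFromPort(port: Port): unknown;
    serialize(): unknown;
}

// Builds a one-step workflow: add the resolved tool as a step, then expose
// every port of that step as a workflow-level input or output.
function buildFromResolvedTool(wf: ComposerLikeWorkflow, resolvedTool: object): unknown {
    const step = wf.addStepFromProcess(resolvedTool);
    step.in.forEach(port => wf.createInputFromPort(port));
    step.out.forEach(port => wf.createOutputFromPort(port));
    return wf.serialize();
}
```

The point of going through the helpers rather than pushing objects onto the model directly is the last bullet: the helpers are what keep the graph and validation state consistent.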

That being said, the example you show could also work. The issue you have with requirements is actually a bug, as we haven't had a need for adding/serializing requirements in the Composer so the functionality was never added.

I'm not sure I understand the issue related to serializing inputs and outputs, though. Workflow.inputs and Workflow.outputs are always serialized as an array out of habit; they could just as easily be a map<id, input>, since the map form is only syntactic sugar. linkMerge and MultipleInputFeatureRequirement are only necessary when you have multiple incoming connections on a single step, which serialization does not affect.
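
To illustrate why the array and map forms of inputs are interchangeable, here is a small self-contained sketch that converts one into the other. The InputParam shape and the inputsToMap helper are hypothetical and not part of cwlts; real CWL parameters carry more fields.

```typescript
// Minimal, assumed shape of a serialized workflow input.
interface InputParam { id: string; type: string; label?: string; }

// Converts the array form into the equivalent map form keyed by id.
function inputsToMap(inputs: InputParam[]): Record<string, Omit<InputParam, "id">> {
    const map: Record<string, Omit<InputParam, "id">> = {};
    for (const { id, ...rest } of inputs) {
        map[id] = rest;
    }
    return map;
}

const asArray: InputParam[] = [{ id: "protein", type: "string?", label: "UniProt ID" }];
const asMap = inputsToMap(asArray);
// asMap is { protein: { type: "string?", label: "UniProt ID" } }
```

Both forms carry the same information; only the position of the id changes.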

mayacoda avatar Oct 18 '17 12:10 mayacoda

Maya,

Thank you for your reply.

Would you mind posting an example of a Composer workflow creation approach (with the addStepFromProcess, createInputFromPort and createOutputFromPort)?

esanzgar avatar Oct 23 '17 14:10 esanzgar

Regarding the potential bug (serialising requirements), would you like me to create an independent issue?

esanzgar avatar Oct 23 '17 14:10 esanzgar

Sorry, my explanation about MultipleInputFeatureRequirement was not accurate.

If I define the input of a step in this way:

            in: {
                accessions: 'sss/proteins'
            },

It is serialized in this way:

        "in": [{
            "id": "accessions",
            "source": ["sss/proteins"]
        }],

However, I was expecting this:

        "in": [{
            "id": "accessions",
            "source": "sss/proteins"
        }],

Because source is an array, I have to add the MultipleInputFeatureRequirement requirement (linkMerge defaults to "merge_nested") to make it work.

Because there is a problem serialising requirements I am in a little predicament.
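
One possible stopgap, sketched here under assumptions, is to post-process the serialized object and collapse every single-element source array back into a plain string. The SerializedStep and SerializedStepInput shapes mirror the JSON shown above; the collapseSingleSources helper is hypothetical and not part of cwlts.

```typescript
// Assumed shapes, modelled on the serialized JSON shown above.
interface SerializedStepInput { id: string; source?: string | string[]; }
interface SerializedStep { id: string; in: SerializedStepInput[]; }

// Returns a copy of the steps in which every single-element `source` array
// is replaced by its only element; longer arrays are left untouched.
function collapseSingleSources(steps: SerializedStep[]): SerializedStep[] {
    return steps.map(step => ({
        ...step,
        in: step.in.map(input =>
            Array.isArray(input.source) && input.source.length === 1
                ? { ...input, source: input.source[0] }
                : input
        ),
    }));
}
```

Inputs that genuinely have several sources keep their arrays, so those would still need MultipleInputFeatureRequirement.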

Thanks

esanzgar avatar Oct 23 '17 14:10 esanzgar

Workaround to serialise requirements:

        // Standard way of adding requirements doesn't work
        // wf.requirements.push(new RequirementBaseModel({class: 'SubworkflowFeatureRequirement'}));
        // wf.requirements.push(new RequirementBaseModel({class: 'MultipleInputFeatureRequirement'}));

        // Workaround
        wf.customProps.requirements = [];
        wf.customProps.requirements.push({class: 'SubworkflowFeatureRequirement'});
        wf.customProps.requirements.push({class: 'MultipleInputFeatureRequirement'});

esanzgar avatar Oct 23 '17 16:10 esanzgar