
Directly wrapping Python/Ruby/Perl functions in CWL

Status: open. jpellman opened this issue 5 years ago • 3 comments

Apologies if this point is addressed elsewhere and I've missed a previous issue or listserv post. If it has been discussed before, feel free to point me to that discussion.

In DigDag, it appears that you can use individual Python/Ruby functions directly as nodes in a workflow (see here). This is currently possible in CWL by combining a CommandLineTool with a wrapper script that uses Python's argparse, but that approach is a bit unwieldy and not as seamless as specifying directly in the workflow which Python function to invoke. Are there any plans to create a "Python Tool" or "Ruby Tool" specification for CWL, so that end users of CWL-based tools wouldn't need the extra step of wrapping Python/Ruby functions with argparse?
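For comparison, the argparse route mentioned above might look roughly like this. It is a minimal sketch: `discard_short` stands in for an arbitrary library function, and the flag names are hypothetical. A CommandLineTool would then bind each of its inputs to one of these flags via `inputBinding`.

```python
import argparse


def discard_short(sequences, min_length):
    """Hypothetical library function: keep sequences at least min_length long."""
    return [s for s in sequences if len(s) >= min_length]


def main(argv=None):
    # Each flag mirrors one function parameter; the CWL CommandLineTool would
    # bind its inputs to these flags (e.g. inputBinding: {prefix: --min-length}).
    parser = argparse.ArgumentParser(description="CLI wrapper around discard_short")
    parser.add_argument("--input", required=True, help="text file, one sequence per line")
    parser.add_argument("--min-length", type=int, default=100)
    parser.add_argument("--output", required=True)
    args = parser.parse_args(argv)

    # Translate file-based CLI arguments into ordinary Python values.
    with open(args.input) as fh:
        sequences = [line.strip() for line in fh if line.strip()]
    kept = discard_short(sequences, args.min_length)
    with open(args.output, "w") as fh:
        fh.writelines(seq + "\n" for seq in kept)
    return 0
```

Saved as a script, the file would end with the usual `if __name__ == "__main__":` guard calling `main()`. Every wrapped function needs this boilerplate plus a matching CWL tool description, which is the overhead the question is about.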

jpellman, Mar 25 '19 17:03

There are no current plans for a succinct PythonTool or RubyTool syntax in CWL. However, one could implement it as a vendor-specific extension and provide a "polyfill"-style tool that converts such tools into plain CommandLineTools. If we see adoption of this succinct syntax, it could enter a future CWL standard release.
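A polyfill along those lines could be a simple document-to-document rewrite. The sketch below assumes a made-up `PythonTool` shape (a dict with `inputs`, `outputs`, and a `script` that reads an `inputs` dict and assigns an `outputs` dict; none of this is part of the CWL standard) and emits a plain v1.0 CommandLineTool that stages the input object as `inputs.json` and the script as `tool.py` via InitialWorkDirRequirement.

```python
def python_tool_to_command_line_tool(tool):
    """Hypothetical polyfill: rewrite a 'PythonTool' dict as a plain CommandLineTool."""
    # Wrap the user's script: load the staged input object first, then
    # write whatever the script put in `outputs` to cwl.output.json.
    # Caveat: CWL treats "$(" / "${" inside `entry` as expressions, so a
    # script containing those sequences would need escaping.
    harness = "\n".join([
        "import json",
        "with open('inputs.json') as fh:",
        "    inputs = json.load(fh)",
        tool["script"],
        "with open('cwl.output.json', 'w') as fh:",
        "    json.dump(outputs, fh)",
        "",
    ])
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "baseCommand": ["python3", "tool.py"],
        "requirements": [
            # Needed for the ${...} expression that serializes `inputs`.
            {"class": "InlineJavascriptRequirement"},
            {
                "class": "InitialWorkDirRequirement",
                "listing": [
                    {"entryname": "inputs.json", "entry": "${return JSON.stringify(inputs);}"},
                    {"entryname": "tool.py", "entry": harness},
                ],
            },
        ],
        # Inputs and outputs pass through unchanged; with cwl.output.json
        # present, the outputs need no outputBinding.
        "inputs": tool["inputs"],
        "outputs": tool["outputs"],
    }
```

Because CWL documents are YAML and JSON is, for practical purposes, a subset of YAML, dumping the returned dict with `json.dumps` should give a loadable tool description.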

In the meantime, there is a simpler way that doesn't involve argparse: interpolating input values directly into the Python/Ruby script source:

https://github.com/EBI-Metagenomics/ebi-metagenomics-cwl/blob/cac44f2cf14110fde9951161c663c4525772f616/tools/discard_short_seqs.cwl#L30
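To illustrate what the runner does in that case, here is a rough Python model of the substitution step. `interpolate` is a hypothetical helper that only handles simple `$(inputs.x.y)` references (real runners support much more), and the embedded script is an invented example, not the one in the linked file.

```python
import json
import re

# Matches simple parameter references such as $(inputs.min_length) or
# $(inputs.seqs.path); real CWL runners support a richer syntax.
_REF = re.compile(r"\$\(inputs((?:\.[A-Za-z_][A-Za-z0-9_]*)+)\)")


def interpolate(template, inputs):
    """Hypothetical helper: substitute $(inputs.x.y) references in a script."""
    def resolve(match):
        value = inputs
        for key in match.group(1).lstrip(".").split("."):
            value = value[key]
        # Strings are inserted verbatim; other values are JSON-serialized.
        return value if isinstance(value, str) else json.dumps(value)
    return _REF.sub(resolve, template)


# Invented embedded script, written with CWL-style references.
SCRIPT = """\
min_length = $(inputs.min_length)
with open("$(inputs.seqs.path)") as src, open("long.txt", "w") as dst:
    for line in src:
        if len(line.strip()) >= min_length:
            dst.write(line)
"""
```

For example, `interpolate(SCRIPT, {"min_length": 50, "seqs": {"class": "File", "path": "/data/reads.txt"}})` yields a script starting with `min_length = 50`. One caveat of splicing values in as text: in this sketch a boolean input would arrive as `true`, which is not valid Python, so quoting and type conversion are the script author's responsibility.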

The embedded scripts can even generate non-File outputs (strings, numbers, or complex custom types) directly using the cwl.output.json feature: https://github.com/EBI-Metagenomics/ebi-metagenomics-cwl/blob/cac44f2cf14110fde9951161c663c4525772f616/tools/ipr_stats.cwl#L56
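A sketch of the Python side of that pattern: the embedded script computes plain values and writes them to `cwl.output.json` in its working directory, which the runner then uses as the tool's output object. The helper name and statistic names are hypothetical; the keys would have to match the ids declared under the tool's `outputs` (e.g. `sequence_count: int`).

```python
import json
import os


def write_cwl_outputs(outputs, outdir="."):
    """Hypothetical helper: write non-File outputs where the CWL runner looks for them."""
    path = os.path.join(outdir, "cwl.output.json")
    with open(path, "w") as fh:
        json.dump(outputs, fh)
    return path


# Invented statistics an embedded script might compute.
lengths = [120, 85, 300, 42]
stats = {
    "sequence_count": len(lengths),
    "mean_length": sum(lengths) / len(lengths),
    "longest": max(lengths),
}
```

In the embedded script this would be called as `write_cwl_outputs(stats)`, since a CWL tool runs with its output directory as the working directory.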

Building upon the above examples, a succinct PythonTool or RubyTool (BashTool, etc.) syntax could pre-populate an inputs object matching the CWL inputs, and likewise automatically serialize a returned dictionary as cwl.output.json for the workflow/tool runner to consume.
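A harness for such a PythonTool could be as small as the following. It is a sketch under assumptions: `run_python_tool` is hypothetical, it reads the input object from a job JSON file, and it passes each CWL input as a keyword argument so that the user's function signature mirrors the tool's `inputs`.

```python
import json
import os


def run_python_tool(func, job_path, outdir="."):
    """Hypothetical PythonTool harness: CWL input object in, cwl.output.json out."""
    # The job (input object) maps each CWL input id to its value.
    with open(job_path) as fh:
        inputs = json.load(fh)
    # Call the wrapped function with the inputs as keyword arguments.
    result = func(**inputs)
    if not isinstance(result, dict):
        raise TypeError("a PythonTool function must return a dict of outputs")
    # Serialize the returned dict where the runner expects the output object.
    with open(os.path.join(outdir, "cwl.output.json"), "w") as fh:
        json.dump(result, fh)
    return result


def summarize(numbers, label="stats"):
    """Example user function; its parameters mirror the tool's CWL inputs."""
    return {"label": label, "total": sum(numbers), "count": len(numbers)}
```

Passing keyword arguments rather than a single `inputs` dict is just one design choice; it keeps the wrapped function usable as ordinary Python outside CWL.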

I think this would be very popular!

mr-c, Mar 26 '19 09:03

This sounds like a great Google Summer of Code project, so if @jpellman or anyone else would be willing to mentor it, then please make a PR against https://github.com/OBF/GSoC/blob/gh-pages/00_ideas.md as soon as possible to recruit students.

mr-c, Mar 26 '19 09:03

My summer's unfortunately a little busy so I don't think I'd be a very good mentor, but if I find anyone who would be willing to mentor a PythonTool / RubyTool I'll redirect them to this issue.

jpellman, Mar 28 '19 03:03