common-workflow-language
Directly wrapping Python/Ruby/Perl functions in CWL
Apologies if this point is addressed elsewhere and I've missed a previous issue / listserv post. If there's a past discussion of this somewhere that addresses this, feel free to point me to where it has occurred.
In DigDag, it appears that you can directly use individual Python/Ruby functions as nodes in a workflow (see here). While this is currently possible in CWL by combining a Command Line Tool with a wrapper script using Python's argparse, it is a bit unwieldy / not as seamless as being able to directly specify in the workflow that you want to invoke a specific Python function. Are there any plans in the works to create a "Python Tool" or "Ruby Tool" specification for CWL so that end-users of CWL-based tools wouldn't need to go through the extra step of wrapping Python/Ruby functions with argparse?
There are no current plans for a succinct PythonTool or RubyTool in CWL; however, one could implement that as a vendor-specific extension and provide a "polyfill"-style tool to convert them to plain CommandLineTools. If we see adoption of this succinct syntax, then it could enter a future CWL standard release.
In the meantime, there is a simpler way that doesn't involve using argparse: direct interpolation of input values into the Python/Ruby script source:
https://github.com/EBI-Metagenomics/ebi-metagenomics-cwl/blob/cac44f2cf14110fde9951161c663c4525772f616/tools/discard_short_seqs.cwl#L30
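To make the interpolation idea concrete, here is a minimal sketch that builds a CommandLineTool description as a Python dict and serializes it as JSON (which CWL runners accept, since JSON is a subset of YAML). It is not a copy of the linked tool: the input names `sequences` and `min_length`, the output name `filtered`, and the file name `filtered.txt` are made up for illustration, and the result has not been validated against a runner here. It also assumes the staged file path contains no quote characters.

```python
import json

# Python source that the CWL runner sees as a plain string. The $(inputs.*)
# parameter references are substituted by the runner before Python starts,
# so the interpreter receives literal values instead of parsing arguments.
SCRIPT = """
import sys
with open("$(inputs.sequences.path)") as handle:
    for line in handle:
        if len(line.strip()) >= $(inputs.min_length):
            sys.stdout.write(line)
"""


def build_tool():
    """Return a hypothetical CommandLineTool that embeds SCRIPT via `python3 -c`."""
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "baseCommand": ["python3", "-c"],
        # The script is passed as the single argument following `-c`.
        "arguments": [{"valueFrom": SCRIPT}],
        # No inputBinding: the values reach the script only by interpolation.
        "inputs": {"sequences": "File", "min_length": "int"},
        "stdout": "filtered.txt",
        "outputs": {"filtered": {"type": "stdout"}},
    }


def dump_tool():
    """Serialize the tool description as JSON text."""
    return json.dumps(build_tool(), indent=2)
```

Writing the output of `dump_tool()` to a file such as `filter.cwl` should give a document that can be passed to a CWL runner along with a job file that supplies `sequences` and `min_length`.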
The embedded scripts can even generate non-File outputs (strings, numbers, or complex custom types) directly using the cwl.output.json feature: https://github.com/EBI-Metagenomics/ebi-metagenomics-cwl/blob/cac44f2cf14110fde9951161c663c4525772f616/tools/ipr_stats.cwl#L56
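As a rough illustration of the cwl.output.json mechanism: when a tool leaves a file with that name in its output directory, the runner loads it and uses it as the output object. The helper below is a sketch of what an embedded script might do at the end of its run; the function name `write_cwl_outputs` and the example keys are hypothetical, and the keys would need to match the output names declared in the tool.

```python
import json
import os


def write_cwl_outputs(outputs, outdir="."):
    """Write `outputs` (a JSON-serializable dict) as cwl.output.json in `outdir`.

    The file name is fixed by CWL; the dict keys are expected to match the
    output names declared by the tool description.
    """
    path = os.path.join(outdir, "cwl.output.json")
    with open(path, "w") as handle:
        json.dump(outputs, handle)
    return path
```

An embedded script could finish with something like `write_cwl_outputs({"match_count": 42, "status": "ok"})`, with the tool declaring `match_count: int` and `status: string` as outputs.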
Building upon the above examples, a PythonTool or RubyTool (BashTool, etc.) succinct syntax could pre-populate an inputs object to match the CWL inputs and likewise automatically take a returned dictionary and serialize it as the cwl.output.json for consumption by the workflow/tool runner.
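One way to picture the runtime half of such a syntax is a small shim that loads the inputs, calls the user's function, and writes whatever dictionary it returns. This is only a sketch under assumptions: no PythonTool exists in CWL, the name `run_python_tool` is invented, and it presumes the generated CommandLineTool would stage the inputs as a JSON file for the shim to read.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict


def run_python_tool(
    func: Callable[[Dict[str, Any]], Dict[str, Any]],
    inputs_path: str,
    outdir: str = ".",
) -> Path:
    """Call `func` with the pre-populated inputs and serialize its result."""
    # Pre-populate the `inputs` object from a JSON file of input values.
    inputs = json.loads(Path(inputs_path).read_text())
    result = func(inputs)
    if not isinstance(result, dict):
        raise TypeError("the wrapped function must return a dict of outputs")
    # The runner picks up cwl.output.json from the output directory.
    out_path = Path(outdir) / "cwl.output.json"
    out_path.write_text(json.dumps(result))
    return out_path


def count_long_words(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Example user function: it sees only `inputs` and returns the outputs."""
    words = inputs["text"].split()
    long_words = [w for w in words if len(w) >= inputs["min_length"]]
    return {"long_word_count": len(long_words)}
```

With this shape the user's function never touches argparse or the command line; a polyfill converter would only need to generate a CommandLineTool that stages the inputs file and invokes the shim.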
I think this would be very popular!
This sounds like a great Google Summer of Code project, so if @jpellman or anyone else would be willing to mentor it, then please make a PR against https://github.com/OBF/GSoC/blob/gh-pages/00_ideas.md as soon as possible to recruit students.
My summer's unfortunately a little busy so I don't think I'd be a very good mentor, but if I find anyone who would be willing to mentor a PythonTool / RubyTool, I'll redirect them to this issue.