workflow-execution-service-schemas
workflow-execution-service-schemas copied to clipboard
Support and standard keyword for WES
Currently, WES supports standard keywords for CWL
and WDL
workflow languages. Can we add a new workflow_type
keyword supporting Nextflow-based workflows, perhaps NEXTFLOW
?
Perhaps we are thinking about this incorrectly. Jupyterlab notebooks can be translated from R, ipynb, etc. using the tool jupytext Kernels are specified when running jupyterlab notebooks using rendering tools such as papermill. We should say that a standard workflow specification could be articulated and stored in Common Workflow Language (as does the EU with their Go-FAIR plans and as stated on the Elixir site) but that we have translators that translate the CWL to Nextflow, WDL, Snakemake etc. Separating out concerns solves the problem and evolved best practices would be similar to the literate programming achieved with jupyterlab notebooks rendered in ipynb - allowing for the normal marked down section and then the execution cell.
What i suggest is that we have a language that is specified for storage. Most of these allow for processes to be specified, Nextflow and CWL, with inputs, outputs and the process specified with bash level scripting for the execution portion. That process should be a container where just that process is contained within it. This lends itself to continuous testing, etc.
Execution can then be specified easily and this lends itself to the automation we seek.
Converting CWL to WDL and vice versa is at best imperfect. There's been some recent workwhich is exciting but I believe it only goes in one direction. My understanding is that Nextflow is moving in a direction that makes direct translation harder, not easier - and there's not been much work in that regard in the first place.
If someone wants to contribute a universal translator between these standards, then I agree w/ your viewpoint.
Just for reference:
PRs to include Nextflow and Snakemake in the corresponding TRS enum:
- Nextflow (listed as
NFL
): ga4gh/tool-registry-service-schemas/pull/77 - Snakemake (listed as
SMK
): ga4gh/tool-registry-service-schemas/pull/163 & ga4gh/tool-registry-service-schemas/pull/200
I'm reading through the API Spec for the WES POST /runs
endpoint, specifically the workflow_type
parameter description:
The workflow_type is the type of workflow language and must be "CWL" or "WDL" currently (or another alternative supported by this WES instance).
It seems like this workflow_type
parameter for the POST /runs
endpoint is not an enum. I guess (as indicated in the above comment) this is really a change to the TRS spec to support "NFL" as a descriptor type?
In the WES GET /service-info
endpoint, there is a workflow_type_versions
attribute which returns "Available workflow types supported by a given instance of the service". I would think this provides the acceptable values to use in the workflow_type
field of the POST /runs
endpoint, leaving it up to the implementation to indicate available "workflow_types", rather than the WES spec.
@caylward-ilmn this may be a case of imprecise language in the documentation. Higher up in the reference docs it also says:
(currently CWL or WDL formatted workflows, other types may be supported in the future)
Which seems to imply that CWL or WDL are the only supported languages currently
@patmagee Sure - you're referring to the Executive Summary section (though it DOES say "other types may be supported in the future").
Regardless, that's a non-functional change to fix ancillary docs. Seemed like this issue was suggesting a change to the API specification? That I don't see...
@caylward-ilmn that caveat (or another alternative supported by this WES instance) does seem to appear everywhere in the docs already, so it may just practically be treated as an enum, when in reality CWL
and WDL
are just strings that can be anything.
It seems there already is in the specification the flexibility to define any workflow language as "supported" through the workflow_type
field. Because of that I do not think adding NEXTFLOW
as a documented language changes the specification from the way it currently operates. Issues around input harmonization, and whether adding NEXTFLOW
changes the semantics of the API are clearly already present by allowing any language that a WES instance supports
Worth mentioning nextflow is already implemented by Tower API, and it's expecting NFL
as language type for consistency with TRS
https://tower.nf/openapi/index.html#post-/ga4gh/wes/v1/runs
If NFL
is already used in TRS I do no see any reason to add separate terminology with NEXTFLOW
, and we should probably just use the NFL
name
I agree, wasn't aware of NFL
in TRS till now
If the docs/specs are to be changed to mention supported workflow languages explicitly, then I think it would be nice if it were the same set of languages supported in TRS. From v2.0.1
onward this will be:
- CWL
- WDL
- NFL
- Galaxy
- SMK
See here.
I am not sure whether the WES specs should include such an enumeration at all, though.
@uniqueg i think it definitely makes sense to keep language consistent between apis. NFL
makes a lot of sense for this.
I am not sure what the process of getting additional languages is supported is. My guess is eventually WES will support snake make, and galaxy as well and the two lists would likely be similar.
Actually, I'm not quite sure what exactly it means for WES to support a workflow language - which is why I'm not sure whether languages should even be listed explicitly at all. First of all, I think that WES specs should be abstract enough to enable a wide range of engines to implement them and provide at least a basic feature set. But assuming there are cases where this does not hold, I think it's primarily the engine that would cause compatibility problems, not the language itself. There might be engines where an otherwise "supported language" is not a good match for WES. But I find it quite hard to imagine a case where the language itself would cause problems. As long as a workflow can be addressed by a URI, it should be able to be processed by a suitable engine, or not?
So what are the criteria for a language to be supported by WES? If you mean there should be implementations for a candidate language, then Snakemake should certainly be listed, as there are at least two implementations available that I know of:
- https://github.com/sapporo-wes/sapporo
- https://gitlab.com/one-touch-pipeline/weskit/api
Of course, these implementations cannot strictly comply with the specs if the language isn't listed.... ;-)
My suggestion would be to list the languages (or better yet the engines, or both) that have been demonstrated to play nice with WES in the documentation, not the specs. This would lower the barrier for workflow engine developers to implement the specs. I feel similarly about TRS, by the way. Let the developers themselves interpret the specs for their use case (language, engine) and decide whether the specs are a good match or not...
@uniqueg I think that is a good Idea. I was also wondering if instead of WES explicitly stating the languages that it supports, to instead describe what general features a language should require to be a candidate for WES. Similar to your suggestion above this lowers the bar and allows off spec languages
We could add a section something like the following (this is not an actual suggestion on the wording btw):
WES can be used for any workflow language that adheres to the following general principals
1. A workflow should accept inputs
2. A single workflow should output a set of outputs
3. A workflow should not require an interactive session with the user but can be executed in a completely headless way
I like that! :) Though we will need to check which requirements WES actually really has from the RunRequest
model and maybe other more hidden limitations.
Speaking of the RunRequest
schema: any particular reason why the required
property isn't actually being used to allow automatic validation?