workflow-execution-service-schemas
Add New Endpoint to Describe Workflow Params
The Problem
As the number of WES implementors increases, a common issue I have seen is that a new consumer of a WES API cannot get up and running without inside knowledge of how the inputs need to be structured, or even what those inputs are. The following proposal adds a new endpoint to the WES specification, /describe-inputs, which would allow users to submit a workflow (URL or attachment) and receive a detailed schema of how the inputs are expected to be structured.
In theory, all engine-specific configuration should be factored out to the workflow_engine_parameters section, and all workflows should support the exact same workflow inputs definition. Unfortunately, this theory does not always hold true. A number of WES engines require undefined inputs (inputs not present in the workflow definition) to be included in the workflow_params section. Additionally, there is variance even within a language in how a specific implementation expects inputs to be defined. WDL, for example, strictly requires that all engines implement a JSON interface for submitting inputs; however, the way the keys are defined, or even which keys are allowed, is left to the implementation.
Because of these very real-world issues, it is not enough to rely on TRS for defining a workflow's input structure (not that it currently does this, though some would argue this belongs mostly in TRS). Such a structure would work in the optimistic case, but in practice it falls short of capturing the caveats that each execution engine has.
Key outcomes:
- Promote programmatic usage of WES
- Promote more interoperability between WES engines running the same workflows
- Allow for building more feature-rich UIs with dynamic input forms
- Reduce the burden on the user by giving them a template to use for input param submission
GET /describe-params?workflow_url=foo.wdl
How does the server get the contents of foo.wdl?
@coverbeck the current proposal uses the same semantics as the POST /runs endpoint. If the URL is absolute (i.e. http://../foo.wdl), then the server would attempt to resolve the file, for example using an HTTP resolver. If the workflow_url is relative, however, it is expected that a workflow_attachment with the given relative name has been sent with the current request.
Does that mean it should be POST /describe-params instead of the example GET /describe-params?
@coverbeck you are correct, that is a good catch and I will update the specific example (it should be http://foo.com/bar.wdl). I believe I did provide a GET endpoint which allows you to simply pass in a web-accessible URL, but if you need to actually supply your workflow, then there is a POST method for that.
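To make the resolution rule discussed above concrete, here is a minimal sketch of how a server might decide where the workflow comes from. The function name `resolve_workflow_source` and the `attachments` mapping (relative file name to uploaded bytes) are assumptions made for illustration; they are not part of the WES specification or of this proposal.

```python
from urllib.parse import urlparse


def resolve_workflow_source(workflow_url, attachments=None):
    """Decide how to obtain the workflow, mirroring the POST /runs semantics.

    Returns ("remote", url) for an absolute URL, or ("attachment", content)
    when a relative workflow_url names an uploaded workflow_attachment.
    `attachments` is a hypothetical dict of relative name -> file content.
    """
    attachments = attachments or {}
    parsed = urlparse(workflow_url)

    # An absolute URL has both a scheme and a host, e.g. http://foo.com/bar.wdl
    if parsed.scheme and parsed.netloc:
        return ("remote", workflow_url)

    # A relative URL must match an attachment sent with the same request
    if workflow_url in attachments:
        return ("attachment", attachments[workflow_url])

    raise ValueError(
        f"relative workflow_url {workflow_url!r} has no matching workflow_attachment"
    )
```

Under this reading, a GET request can only ever hit the first branch, since it carries no attachments, which is why supplying the workflow itself calls for POST.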
While I think that this is a very important issue, and I am in full support of any effort to make things easier (and workflow language-agnostic) for users, I would advise caution here, mostly because as it is, WES is pretty much open for all workflow languages, precisely because workflow_params (and workflow_engine_params) aren't strictly typed/defined. Change that and you may make it a lot harder for an engine or a whole language to play nice with WES.
The main question I have is: Will a WES instance even have a chance to be able to provide this information for all workflows (of supported language versions) thrown at it? Where does it get it from? For some languages it may be relatively easy to parse from the workflow itself, but this is certainly not the case for all languages. Or perhaps more generally: Should it be the responsibility of WES developers to provide this information? For workflow_engine_params, I think the answer would certainly be yes. But in the case of workflow_params, I would tend to think it is rather the workflow developers who are responsible for defining those, if they want their workflows to be used in a cloud setting. If you agree, I think the schema (file) describing those params/inputs should rather be optionally provided to WES, along with the workflow file(s) itself, to allow a WES to validate inputs based on the schema, if provided, rather than WES trying to parse/generate it from the workflow file(s), which it may or may not be able to do. So in other words: no need for /describe-inputs, only for /describe-params or /describe-engine-params.
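As a rough illustration of the "validate only if a schema was provided" idea, the sketch below checks workflow_params against an optional, JSON-Schema-like dictionary. It only understands the `required` and `properties`/`type` keywords, and the function name `validate_params` is hypothetical; a real service would more likely delegate to a complete JSON Schema validator.

```python
# Mapping from JSON Schema type names to Python types (subset only).
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def validate_params(workflow_params, schema=None):
    """Return a list of error messages; an empty list means "accepted".

    If the submitter supplied no schema, nothing is checked, so WES stays
    open to languages that cannot describe their inputs.
    """
    if schema is None:
        return []

    errors = []
    for name in schema.get("required", []):
        if name not in workflow_params:
            errors.append(f"missing required input: {name}")

    for name, spec in schema.get("properties", {}).items():
        if name not in workflow_params:
            continue
        type_name = spec.get("type")
        expected = _JSON_TYPES.get(type_name)
        if expected is None:
            continue  # unknown or absent type keyword: skip the check
        value = workflow_params[name]
        # bool is a subclass of int in Python, so reject it explicitly
        is_stray_bool = isinstance(value, bool) and type_name != "boolean"
        if is_stray_bool or not isinstance(value, expected):
            errors.append(f"input '{name}' should be of type {type_name}")
    return errors
```

The point of the design is that the schema travels with the workflow, so the WES never has to parse the workflow language itself.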
As far as I know, workflow languages are coming up with their own solutions to the problem of defining input/param schemas, and it seems that most of them are implementing some form of JSON Schema-based solution (e.g., for Nextflow/nf-core, Snakemake). And CWL itself is already based on JSON schema and, I think, allows that information to be richly defined in the workflow itself.
So, I'm thinking, would it perhaps be better to try to reach out and bring together the different workflow language developer communities to get feedback and a consensus here?
Perhaps we could go ahead with the workflow_engine_params already though, because to me at least that seems less controversial, as a given WES instance is clearly the responsible party to define and describe which engine params it supports. As was mentioned by @vinjana though, this should account for the fact that some WES implementations cover different engines for different (or possibly even the same) languages and language versions.
Hi, I'm also very interested in this PR.
I think of workflow params as inputs + engine specific parameters -- and those can vary quite a bit between different implementations.
I disagree and think that workflow engine parameters and the workflow input parameters from the workflow document should be separated.
So in other words: no need for /describe-inputs, only for /describe-params or /describe-engine-params.
I agree with @uniqueg. As a WES developer, I find workflow engine parameters acceptable (I'd rather limit the accepted engine parameters for security reasons), but parsing the input parameters from a workflow document is very difficult. In my case, I'm developing a WES that supports CWL, WDL, Nextflow, and Snakemake, but I have implemented an inputs parser only for CWL (https://github.com/suecharo/cwl-inputs-parser), because the other languages require a language system. Therefore, templates for the input parameters of workflow documents should be provided by the TRS layer (e.g. manually entered by whoever registers the workflow), or the input parser should be provided by the workflow language developer community.
Of course, for WESs that are somewhat limited in the workflows they can execute, it would be very useful to add an endpoint that lists which workflows can be executed.
In our WES, we implemented it as https://github.com/sapporo-wes/sapporo-service/blob/main/sapporo/executable_workflows.json and provided it in the /service-info endpoint.
For workflow_engine_parameters, I think we should be careful to keep this engine-agnostic so that underlying implementation details don't leak out and confound how these parameters are specified. For example, if I'm talking to WES endpoint A that uses Cromwell and WES endpoint B that uses MiniWDL, how should I tell both to use caching for the workflows I run?
I don't want to do the following:
engine params for A:

    {
      "write_to_cache": true,
      "read_from_cache": true
    }

engine params for B:

    {
      "call_cache": {
        "put": true,
        "get": true
      },
      "no-cache": false
    }

What matters most is that the WES endpoint can run my workflow, not the specific engine it uses to do so. In the above case, I would much rather WES have a standard caching capability concept and a defined way to enable or disable it.
Overall, it would be good for WES to be prescriptive in this regard, i.e. compliant implementations MUST have a specific set of capabilities that meet the majority of workflow execution use cases.
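One way to picture a standard caching capability is a thin translation layer inside each WES implementation. The sketch below is only an illustration: the engine-specific keys are copied from the example for endpoints A and B above and have not been checked against real Cromwell or MiniWDL options, and the `caching` capability name and `to_engine_params` function are hypothetical.

```python
def _params_for_a(capabilities):
    # Keys taken from the "engine params for A" example, illustrative only
    cache = capabilities.get("caching", False)
    return {"write_to_cache": cache, "read_from_cache": cache}


def _params_for_b(capabilities):
    # Keys taken from the "engine params for B" example, illustrative only
    cache = capabilities.get("caching", False)
    return {"call_cache": {"put": cache, "get": cache}, "no-cache": not cache}


# Hypothetical controlled vocabulary and per-endpoint translators
KNOWN_CAPABILITIES = {"caching"}
TRANSLATORS = {"A": _params_for_a, "B": _params_for_b}


def to_engine_params(endpoint, capabilities):
    """Map engine-agnostic capabilities to one endpoint's native params."""
    unknown = set(capabilities) - KNOWN_CAPABILITIES
    if unknown:
        raise ValueError(f"unknown capabilities: {sorted(unknown)}")
    return TRANSLATORS[endpoint](capabilities)
```

A client would then send the same `{"caching": true}` to both endpoints, and each WES would do the mapping internally.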
I agree with @wleepang: a defined vocabulary of common capabilities would be great for interoperability.
However, I find that requiring every workflow engine to implement them is rather restrictive and might lead to considerable discussions for each entry into the set.
But how about we require each WES to broadcast in the service info which capabilities are supported? In this way, if we absolutely need, say, caching, we could make sure beforehand that the WES we intend to use supports that capability.
Even better/more economical would of course be if that information was propagated to the service registry, so we could do the following with a single request: give me a list of WES instances that know how to handle, say, CWL 1.2 workflows and support caching.
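A sketch of what such a lookup could look like on the client side follows. It assumes that each service-info document exposes language versions under `workflow_type_versions` (my reading of the current WES service-info layout) and adds a hypothetical `capabilities` list that does not exist in the spec today. The `registry` dict, mapping instance URL to its service info, stands in for a real service registry query.

```python
def supports(service_info, language, version, capability):
    """Check one service-info document for a language version and capability."""
    versions = (
        service_info.get("workflow_type_versions", {})
        .get(language, {})
        .get("workflow_type_version", [])
    )
    # "capabilities" is a hypothetical field, not part of WES service-info
    capabilities = service_info.get("capabilities", [])
    return version in versions and capability in capabilities


def find_instances(registry, language, version, capability):
    """Return the URLs of all WES instances that satisfy the request."""
    return [
        url
        for url, info in registry.items()
        if supports(info, language, version, capability)
    ]


# Example data, entirely made up for illustration
registry = {
    "https://wes-a.example.org": {
        "workflow_type_versions": {"CWL": {"workflow_type_version": ["v1.2"]}},
        "capabilities": ["caching"],
    },
    "https://wes-b.example.org": {
        "workflow_type_versions": {"CWL": {"workflow_type_version": ["v1.0"]}},
        "capabilities": ["caching"],
    },
}
matches = find_instances(registry, "CWL", "v1.2", "caching")
```

If the registry itself indexed these fields, the filtering would happen server-side and the client would need only the single request described above.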
As this issue keeps on popping up in various discussions, I thought that it might be good to move this forward. So is there, perhaps, a consensus that:
1. workflow_engine_params should accept a controlled (yet-to-be-defined) vocabulary of capabilities that WES implementations then internally map to the corresponding params of a given engine
2. rich metadata describing inputs and outputs, while important, is not a concern of WES and should happen elsewhere (e.g., TRS, new spec)
If so, we can refocus on point 1 here (or perhaps better in a new issue), discuss some design considerations (e.g., requiring every engine to implement each capability vs. broadcasting which of the defined capabilities are implemented by a given WES/engine; whether we still allow non-standardized, WES/engine-specific params and, if so, how to minimize/avoid the risk of namespace conflicts) and then go ahead with specifying actual capabilities.