Clarify evaluation policy of Workflow level requirements
The CWL standard says
Requirements specified in a parent Workflow are inherited by step processes if they are valid for that step
Such sentence is a bit ambiguous, in that it does not specify if the evaluation is eager (at the Workflow level) or lazy (at the inner step level). For ResourceRequirements, cwlltool (and StreamFlow, too) adopts a lazy evaluation policy, so that expressions are evaluated at the CommandLineTool level. This can generate unexpected behaviours with unconnected inputs, as reported in #1330. The sentence should then be clarified, establishing the desired evaluation policy for this kind of requirements.
I think your are correct about cwltool doing the lazy evaluation, skipping the top level workflow - if present - https://github.com/common-workflow-language/cwltool/blob/2be19f3291f07653013b0a91662f518c33cf9fe4/cwltool/process.py#L973-L975
But I am not sure if there won't be cases where an implementation may prefer a lazy evaluation (e.g. maybe a case where an implementation wants to call a service to dynamically compute resources? like a batch scheduler?).
Perhaps the specification should say instead that it is up to the implementers to decide whether to do lazy or eager evaluation of resources, but suggest for implementers to clarify in their documentation what was decided. So in this case we could document in cwltool whether we want lazy or eager, and why.
For cwltool I think the issue linked could be fixed both by doing a eager evaluation, or by trying to evaluate the resource expressions lazily with a fallback to fetching inputs from parent (more complicated solution, and prone to errors/bugs I guess).
I understand there are use cases for both of eager and lazy evaluations in the process requirements.
However, as a long term solution, I against to add "eager" and "lazy" evaluations to the spec because we already have the specification for similar use cases: WorkflowStep#requirements to pass the parent process requirements to a child.
Unfortunately the current spec does not support the use cases because:
- it is not clear whether the inherited process requirements are evaluated in the parent context or the child context, and
- the spec lacks the way to distinguish the parent input object and child input object in
WorkflowStep#requirements.
How about extending the spec of WorkflowStep#requirements to fix this issue?
For example, how about introducing the way to refer the parent input object and child input object as follows?
- use
inputsto refer the parent input object, and - use
selfto refer the child input object inWorkflowStep#requirements.
In this proposal, we only need eager evaluations and therefore it is easier to understand the spec and also easier to implement the platforms.
class: Workflow
inputs:
ncores: int
steps:
step1:
run: tool.cwl
in:
mem:
default: 1024 # 1GB
requirements:
ResourceRequirement:
coresMin: $(inputs.ncores) # refer the parent input object with `inputs`
ramMin: $(self.mem) # refer the child input object with `self`
...
Note:
- We can safely use
selffor this purpose because no existing process requirements in the current spec useselfin their expressions. - There is another proposal to refer the child input object: use
self.inputs. It makes easier to extend the spec to refer the child runtime information withself.runtime.coresbut I am not sure there is a use case for that.
@tetron proposes deferring this to CWL v1.3, and adding an upgrader to copy down any dynamic requirements/hints to each workflow step, for evaluation in that context.
For CWL v1.2.1, he proposes documenting the existing behavior.
We should see if there are existing conformance test that touch on the "old" way