cwl-v1.2 Clarify evaluation policy of Workflow level requirements

The CWL standard says

Requirements specified in a parent Workflow are inherited by step processes if they are valid for that step

Such sentence is a bit ambiguous, in that it does not specify if the evaluation is eager (at the Workflow level) or lazy (at the inner step level). For ResourceRequirements, cwlltool (and StreamFlow, too) adopts a lazy evaluation policy, so that expressions are evaluated at the CommandLineTool level. This can generate unexpected behaviours with unconnected inputs, as reported in #1330. The sentence should then be clarified, establishing the desired evaluation policy for this kind of requirements.

Apr 06 '22 20:04 GlassOfWhiskey

I think your are correct about cwltool doing the lazy evaluation, skipping the top level workflow - if present - https://github.com/common-workflow-language/cwltool/blob/2be19f3291f07653013b0a91662f518c33cf9fe4/cwltool/process.py#L973-L975

But I am not sure if there won't be cases where an implementation may prefer a lazy evaluation (e.g. maybe a case where an implementation wants to call a service to dynamically compute resources? like a batch scheduler?).

Perhaps the specification should say instead that it is up to the implementers to decide whether to do lazy or eager evaluation of resources, but suggest for implementers to clarify in their documentation what was decided. So in this case we could document in cwltool whether we want lazy or eager, and why.

For cwltool I think the issue linked could be fixed both by doing a eager evaluation, or by trying to evaluate the resource expressions lazily with a fallback to fetching inputs from parent (more complicated solution, and prone to errors/bugs I guess).

Jun 02 '22 02:06 kinow

I understand there are use cases for both of eager and lazy evaluations in the process requirements.

However, as a long term solution, I against to add "eager" and "lazy" evaluations to the spec because we already have the specification for similar use cases: WorkflowStep#requirements to pass the parent process requirements to a child.

Unfortunately the current spec does not support the use cases because:

it is not clear whether the inherited process requirements are evaluated in the parent context or the child context, and
the spec lacks the way to distinguish the parent input object and child input object in WorkflowStep#requirements.

How about extending the spec of WorkflowStep#requirements to fix this issue? For example, how about introducing the way to refer the parent input object and child input object as follows?

use inputs to refer the parent input object, and
use self to refer the child input object in WorkflowStep#requirements.

In this proposal, we only need eager evaluations and therefore it is easier to understand the spec and also easier to implement the platforms.

class: Workflow

inputs:
  ncores: int

steps:
  step1:
    run: tool.cwl
    in:
      mem:
        default: 1024 # 1GB
    requirements:
      ResourceRequirement:
        coresMin: $(inputs.ncores) # refer the parent input object with `inputs`
        ramMin: $(self.mem)        # refer the child input object with `self`
   ...

Note:

We can safely use self for this purpose because no existing process requirements in the current spec use self in their expressions.
There is another proposal to refer the child input object: use self.inputs. It makes easier to extend the spec to refer the child runtime information with self.runtime.cores but I am not sure there is a use case for that.

Jun 02 '22 04:06 tom-tan

@tetron proposes deferring this to CWL v1.3, and adding an upgrader to copy down any dynamic requirements/hints to each workflow step, for evaluation in that context.

For CWL v1.2.1, he proposes documenting the existing behavior.

We should see if there are existing conformance test that touch on the "old" way

Apr 17 '23 15:04 mr-c