wdl
wdl copied to clipboard
clarify how inputs with defaults are implicitly optional
- prior discussion: #462
- implementation: https://github.com/chanzuckerberg/miniwdl/pull/507 and Cromwell does this already (? to confirm)
This creates a loophole in the type system where None
values are acceptable for an input with a non-optional declared type, as long as it has a default initializer. It's a bit awkward and creates a rather subtle behavioral distinction between Int x = 42
and Int? x = 42
, which the proposed table tries to explicate. However, this seems to be the lesser of two evils compared to not having the loophole; specifically when caller wants to pass through their own optional input but let the callee's default apply if it's absent.
As a community member, I'm happy with this interpretation, and I think you've done a great job to clarify the subtlety between Optional with default supplied and non-optional with default , especially with the table.
@illusional thanks! I'm happy to get this clarified too. I'd love to hear from somebody who has poked this specific behavior in Cromwell to see if it's captured correctly here, ie no reason to codify something that requires changes in all engines. cc @cjllanwarne @aednichols @pshapiro4broad or anyone really
I'm not wild about this because it creates a special case that isn't covered by the type rules. To be specific, I'm referring to the case where you declare a non-optional input but WDL would allow you pass an optional typed expression to it.
So what are the alternatives? Making the caller copy defaults up is bad. I can see two possibilities
- provide a way for a caller to say "pass this value on if
defined()
; if notdefined()
act as if it wasn't passed". This can be done within the current type rules but might need new syntax. - require tasks that wish to default this way (interpret
None
as use the default) to manually check fordefined()
and provide a default. Tasks would declare their inputs as optional and callers would need to know that passingNone
would be replaced with the default value.
I don't speak for the cromwell team though, I'm a WDL user not a cromwell developer.
@mlin I like the table!
I think it also helps to provide the explicit reminder that Int? x
is equivalent to Int? x = None
, so it doesn't matter whether the input to the call is being passed as a literal or an identifier. I.e. WDL is different than CWL in that None
always means None
, whether it was explicitly assigned by the user in an initializer, specified at runtime, or assigned implicitly as the value of an optional declaration without an initializer.
If I understand correctly, the only ambiguity we are discussing here is the case where we are calling a task with a non-optional input and passing a None
value, i.e. where task input Int x = 1
and call input x = None
in your table.
Personally, I do not think this is ambiguous with respect to the rest of the spec. None
cannot be assigned to a non-optional declaration, so passing x = None
where x
is defined as non-optional is an error. It may be that explicit mention of this in the spec is warranted, especially for users who may be coming to WDL from CWL.
But if we are instead proposing to add a special case here (thus breaking the type system even further than it already is), I am against that. I don't think the convenience afforded by this special case justifies the additional cognitive load for the user, nor the additional burden to the runtime implementer.
There are already mechanisms to call a task with a non-optional input using an optional value, and I think they are sufficient:
- Make
x
optional - Pass a default value if the workflow input is
None
, i.e.workflow foo { input { Int? y } call foo { input: x = select_first([y, 0]) } ...
@jdidion the pickle we're in (see the linked prior discussions) is that this
call foo { input: x = select_first([x, 0]) }
isn't a good general solution because it requires the caller to specify the default value, which shouldn't be its concern. Changing the default value would necessitate changing all the calls.
We get to this compromise where we don't want the workflow to have to know the default value, but we do want to allow pass-through of an overriding workflow-level input:
workflow w {
input {
String? s_override
}
call t { input: s = s_override }
}
task t {
input {
String s = "some default"
}
...
}
If we don't permit this then we need one of the features @pshapiro4broad suggested above. My view is that the loophole proposed here is the least bad, but also that reasonable people can disagree. If a tie-breaker is needed, some weight should be given to evidence that Cromwell permits this and workflows in the wild appear to use it (https://github.com/broadinstitute/gatk-sv/pull/154; but I haven't run that to completely verify expectations)
I do understand the desire for a work-around. I don't think it's very often the case that a default value changes, or that one person is using another person's "library" of WDL and wants to rely on them to define the default, but when it does happen it's annoying.
In my opinion, the better option would be to leverage the special syntax we already created for implicit binding (https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#call-statement) and put the special case there. For example,
workflow foo {
input {
Int? x
}
call bar { input: x }
...
}
task bar {
input {
Int x = 1
}
...
If x
is undefined, then we treat it as missing and don't pass it to the task at all. I know that is not currently how the behavior is described in the spec, but we could add that behavior in development
and allow the "loophole" you propose as deprecated behavior in 1.x. I still don't like it, but would be more willing to consider it.
I'd def rather find a better future approach, but, I personally just haven't seen the others discussed so far as better -- each one is a compromise. The abbreviated passthrough syntax only applies where the workflow input has exactly the same name.
I don't think it's very often the case that a default value changes, or that one person is using another person's "library" of WDL and wants to rely on them to define the default
I have a hard time seeing this POV, may have to agree to disagree on this specifically..
The abbreviated passthrough syntax only applies where the workflow input has exactly the same name.
Right, that should address the primary use case, which is when my task has a million options and I want to write a workflow that calls it and give the user the ability to override some of those options but let the task define the default values.
If there are other compelling use cases we could consider new syntax. Maybe something like call foo { input: x ?= y }
.
The way we define default
values and their use has always felt a little... odd to me. Much of this behaviour is likely inspired by what python does, however this does not naturally track in a typed system.
In python, the default is only applied when the parameter is ommitted. The following example illustrates this.
def foo(bar="biz"):
print(bar)
foo("brat")
=> "brat"
foo()
=> "biz"
foo(bar=None):
=> ""
However in our current system, the default value is applied when the parameter is ommitted OR when it is explicitly set to None
. This feels like an edge case that is waiting to bite someone in the but. To me, this feels analogous to the null
safety issues in java which Stronger typing on kotlin
fixed. These edge cases are complex to explain and will likely create more confusion in the future then not.
I think we also need to ask ourselves what the user is trying to achieve. Why is the default set so low in the task as opposed to higher up in the workflow? Is this an issue of education and poor practices? Ie a more natural way to write this would be something like the following.
workflow foo {
input {
Int x = 1
}
call bar { input: x }
...
}
task bar {
input {
Int x
}
}
So, Where does this leave us? For me, I would expect that the implicit option should lead to a type error. IMO A consistent type system will give back to the users of WDL
far more then edge cases like the above. I would be more then happy to issue a notice in the next minor release stating that this behaviour is poorly defined and will be removed in the future.
In place its place, we encourage default input values in the task
as opposed to the workflow. however, If a default value is set in a task, ANY assignment to that task input will override the default. We also do not allow the implicit "Optional" cast for values with default and any attempt to assign an optional
to a non optional value with a default, should result in a type error. We can provide a new function to act as a supplier if we want that is a little more elegent then select_first
... or just override select first
workflow foo {
input {
Int x?
}
# new engine function
call bar { input: default(x,1) }
# use select first, with two args, where the second is a default
call bar as bar2 { input: select_first(x,1) }
...
}
task bar {
input {
Int x = 1
}
}
@patmagee I'm definitely sympathetic to your POV. But I think what @mlin is getting at is that it's a commonly used pattern to have a common task that is called by other workflows, and the task developer may know better than the workflow developer what the default should be for a given parameter.
Let's consider the task of BWA MEM, and the specific case of the minimum seed length (-k
) parameter. As the developer of the BWA MEM task, I am going to give careful consideration to what the default value of -k
should be, maybe do some benchmarks on a variety of inputs. Now, as a developer of a workflow that uses the BWA MEM task, I may know that in my specific use case I want to set a value for -k
. On the other hand, I may want to trust that the developer of the task has determined the correct default and trust that unless the user decides to override it. Now multiply this by the dozens of parameters that BWA MEM supports.
I think it makes sense to provide some way for a workflow developer to say "either override parameter x with the value specified by the user or use the default specified by the task." Furthermore, I think we'd all prefer to do this in a way that doesn't add yet another special case to the type system.
Since apparently there is ambiguity in the 1.x specification, and different runtimes currently implement this differently (Cromwell allows passing None
to indicate the task's default value should be used, wdlTools throws a type exception, and miniwdl allows either depending on a command line option), we need to decide which way it should be interpreted in 1.x and, independently, how we want to handle it in 2.0 and later. I am fine with taking the more permissive interpretation in 1.x and marking it as deprecated. As for the future, I personally like your first solution (call bar { input: x }
), but I also get Mike's point that this doesn't solve all the cases, such as when the workflow and task inputs are named differently, or when you want to first compute an intermediate value and then pass that to the task, e.g.
workflow foo {
input {
Int? x
}
Int? y = if (defined(x)) select_first([x]) * 2 else None
call bar { input: x = y }
...
}
task bar {
input {
Int x
}
}
I don't think a function call works here because a function is really just a transformation from an input value of one type to an output value of a potentially different type, and there is no valid transformation from None
to a non-optional value. So I think we need syntax here to indicate a deviation from the normal rules of assignment. I proposed adding an ?=
operator (e.g. call bar { input: x ?= y }
), whose semantics would be "assign the right-hand-side to the left-hand-side only if the right-hand-side is not None
". I'm open to other suggestions.
Actually, considering further, if we make the semantics of ?=
be "evaluate the right-hand-side and, if it results in an error or a value of None
, do not assign the result to the left-hand-side and instead leave the left-hand-side with its default value", then the above workflow could be written as:
workflow foo {
input {
Int? x
}
Int? y ?= x * 2
call bar { input: x ?= y }
...
}
task bar {
input {
Int x
}
}
Update: I just spent a little bit of time poking at this with Cromwell 63 and couldn't find a case where it distinguishes T x = default
and T? x = default
; it accepts None
but uses the default in either case, and consequently, there seems to be no way for a caller to explicitly set T? x = default
to undefined/None. (Which admittedly is an obscure thing to do, but, otherwise begs whether putting the ?
quantifier there has any effect at all)
Coincidentally, I just had this issue pop up "in real life," so I'm sensing an increasingly urgent need for clarification in some way :sweat_smile:
Wow - talk about the worst of both worlds - allowing None
for a non-optional value and ignoring setting the value to None
for an optional value.
To resolve this issue for 1.x, I would vote in favor of your proposed changes if they were instead added to the errata for 1.1 (and we can put them in the main text for 1.1.1) and marked as deprecated. For development/2.0 I think we should come up with a better solution, but we can take that up in a separate discussion.
Just dug up these interesting historical artifacts:
https://github.com/biowdl/tasks/commit/d993500b08a8cf26fa5c516392d47b99425107be#diff-bb2a20c006384ad4bdd7f90555382acfb7e2be64a99e1d06b6e625ecb509c899R16 https://github.com/broadinstitute/cromwell/issues/3927 https://github.com/broadinstitute/cromwell/issues/3819
So would value input on this topic from @DavyCats @rhpvorderman @aednichols
Let's consider the task of BWA MEM, and the specific case of the minimum seed length (-k) parameter. As the developer of the BWA MEM task, I am going to give careful consideration to what the default value of -k should be, maybe do some benchmarks on a variety of inputs. Now, as a developer of a workflow that uses the BWA MEM task, I may know that in my specific use case I want to set a value for -k. On the other hand, I may want to trust that the developer of the task has determined the correct default and trust that unless the user decides to override it. Now multiply this by the dozens of parameters that BWA MEM supports.
As a developer of the BWA MEM task. In this case I would have the default value of k
at None
. Because the tool BWA knows best. So not specifying it will simply select the default from the tool. In fact, I see now that we don't specify it at all.
One default that we do override is the compression level. Most bioinformatic tools have this at level 5 (htslib) or 6 (gzip), which is plain crazy. It means the tool is spending more time on compressing than on bioinformatic algorithms. So we set it to 1 everywhere.
As regards to the issue, Ansible provides an omit
keyword which will allow you to select the tool default. I think that is the most elegant solution to this particular problem. But I am on the user side, not the implementation side. I can imagine the implementation will bring a few headaches.
@rhpvorderman Are you suggesting to add some method to resort to a task's/workflow's default? Like a function which will cause the input to be omitted if the value given is None
.
call value \ input | Int x |
Int? x |
Int x = 1 |
Int? = 1 |
---|---|---|---|---|
omitted | error | None |
1 |
1 |
omit(None) |
error | None |
1 |
1 |
None |
error | None |
error | None |
2 |
2 |
2 |
2 |
2 |
omit(2) |
2 |
2 |
2 |
2 |
@rhpvorderman Are you suggesting to add some method to resort to a tasks/workflows default? Like a function which will cause the input to be omitted if the value given is
None
.
No a simple keyword omit
(or Omit
).
**call value \ input ** | Int x |
Int? x |
Int x = 1 |
Int? = 1 |
---|---|---|---|---|
omitted | error | None |
1 |
1 |
Omit |
error | None |
1 |
1 |
None |
error | None | error | None |
2 |
2 |
2 |
2 |
2 |
Making omitting explicit with a keyword so to say. Just as with None
, this is very valuable in conditional statements.
I'm open to considering the Omit
concept, but we'd need to work out some sharp edges. To take this example of the currently proposed loophole,
workflow w {
input {
String? s_override
}
call t { input: s = s_override }
}
task t {
input {
String s = "some default"
}
...
}
Would such a call then be supposed to write s = select_first([s_override,Omit])
? (a bit of a mouthful?)
Hmm. Or should that be implicit? s_override is not defined, so omitted by default.
miniwdl v1.2.0 implements the behavior described in this PR, however, for now miniwdl check
will also flag the ambiguity about how to treat T? x = default
input declarations.
(I'm prepared to defend making inputs with defaults implicitly optional -- as a minor relaxation that does what the user wants most of the time, and IMO is not worse than alternatives so far mentioned -- but applying the default when the declared type is optional, is a bridge too far for me currently :sweat_smile:)
Given how this could potentially be a contentious topic, I suggest we open this for voting even though it is less a spec change and more a clarification. This behaviour will probably align with what 90% of people expect, and completely contradict what the other 10% would want :sweat_smile:.
@mlin are you happy to transition this to voting?
@patmagee thanks for labeling this; coincidentally, I also JUST got burned by this ambiguity again "in the wild", so I'm :+1: of course; but I'll also drop a note in slack to get new eyes on this too, since quite some time has passed.
:+1: From me.
👍
bumping this PR once again. It is now currently open for voting
:+1:
@mlin please change target to wdl-1.2