wdl icon indicating copy to clipboard operation
wdl copied to clipboard

clarify how inputs with defaults are implicitly optional

Open mlin opened this issue 3 years ago • 27 comments

  • prior discussion: #462
  • implementation: https://github.com/chanzuckerberg/miniwdl/pull/507 and Cromwell does this already (? to confirm)

This creates a loophole in the type system where None values are acceptable for an input with a non-optional declared type, as long as it has a default initializer. It's a bit awkward and creates a rather subtle behavioral distinction between Int x = 42 and Int? x = 42, which the proposed table tries to explicate. However, this seems to be the lesser of two evils compared to not having the loophole; specifically when caller wants to pass through their own optional input but let the callee's default apply if it's absent.

mlin avatar May 21 '21 07:05 mlin

As a community member, I'm happy with this interpretation, and I think you've done a great job to clarify the subtlety between Optional with default supplied and non-optional with default , especially with the table.

illusional avatar May 24 '21 00:05 illusional

@illusional thanks! I'm happy to get this clarified too. I'd love to hear from somebody who has poked this specific behavior in Cromwell to see if it's captured correctly here, ie no reason to codify something that requires changes in all engines. cc @cjllanwarne @aednichols @pshapiro4broad or anyone really

image

mlin avatar May 24 '21 07:05 mlin

I'm not wild about this because it creates a special case that isn't covered by the type rules. To be specific, I'm referring to the case where you declare a non-optional input but WDL would allow you pass an optional typed expression to it.

So what are the alternatives? Making the caller copy defaults up is bad. I can see two possibilities

  • provide a way for a caller to say "pass this value on if defined(); if not defined() act as if it wasn't passed". This can be done within the current type rules but might need new syntax.
  • require tasks that wish to default this way (interpret None as use the default) to manually check for defined() and provide a default. Tasks would declare their inputs as optional and callers would need to know that passing None would be replaced with the default value.

I don't speak for the cromwell team though, I'm a WDL user not a cromwell developer.

pshapiro4broad avatar May 24 '21 20:05 pshapiro4broad

@mlin I like the table!

I think it also helps to provide the explicit reminder that Int? x is equivalent to Int? x = None, so it doesn't matter whether the input to the call is being passed as a literal or an identifier. I.e. WDL is different than CWL in that None always means None, whether it was explicitly assigned by the user in an initializer, specified at runtime, or assigned implicitly as the value of an optional declaration without an initializer.

If I understand correctly, the only ambiguity we are discussing here is the case where we are calling a task with a non-optional input and passing a None value, i.e. where task input Int x = 1 and call input x = None in your table.

Personally, I do not think this is ambiguous with respect to the rest of the spec. None cannot be assigned to a non-optional declaration, so passing x = None where x is defined as non-optional is an error. It may be that explicit mention of this in the spec is warranted, especially for users who may be coming to WDL from CWL.

But if we are instead proposing to add a special case here (thus breaking the type system even further than it already is), I am against that. I don't think the convenience afforded by this special case justifies the additional cognitive load for the user, nor the additional burden to the runtime implementer.

There are already mechanisms to call a task with a non-optional input using an optional value, and I think they are sufficient:

  • Make x optional
  • Pass a default value if the workflow input is None, i.e.
    workflow foo {
      input {
        Int? y
      }
      call foo { input: x = select_first([y, 0]) }
      ...
    

jdidion avatar May 26 '21 23:05 jdidion

@jdidion the pickle we're in (see the linked prior discussions) is that this

call foo { input: x = select_first([x, 0]) }

isn't a good general solution because it requires the caller to specify the default value, which shouldn't be its concern. Changing the default value would necessitate changing all the calls.

We get to this compromise where we don't want the workflow to have to know the default value, but we do want to allow pass-through of an overriding workflow-level input:


workflow w {
  input {
    String? s_override
  }
  call t { input: s = s_override }
}

task t {
  input {
    String s = "some default"
  }
  ...
}

If we don't permit this then we need one of the features @pshapiro4broad suggested above. My view is that the loophole proposed here is the least bad, but also that reasonable people can disagree. If a tie-breaker is needed, some weight should be given to evidence that Cromwell permits this and workflows in the wild appear to use it (https://github.com/broadinstitute/gatk-sv/pull/154; but I haven't run that to completely verify expectations)

mlin avatar May 26 '21 23:05 mlin

I do understand the desire for a work-around. I don't think it's very often the case that a default value changes, or that one person is using another person's "library" of WDL and wants to rely on them to define the default, but when it does happen it's annoying.

In my opinion, the better option would be to leverage the special syntax we already created for implicit binding (https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#call-statement) and put the special case there. For example,

workflow foo {
  input {
    Int? x
  }
  call bar { input: x }
  ...
}

task bar {
  input {
    Int x = 1
  }
  ...

If x is undefined, then we treat it as missing and don't pass it to the task at all. I know that is not currently how the behavior is described in the spec, but we could add that behavior in development and allow the "loophole" you propose as deprecated behavior in 1.x. I still don't like it, but would be more willing to consider it.

jdidion avatar May 26 '21 23:05 jdidion

I'd def rather find a better future approach, but, I personally just haven't seen the others discussed so far as better -- each one is a compromise. The abbreviated passthrough syntax only applies where the workflow input has exactly the same name.

I don't think it's very often the case that a default value changes, or that one person is using another person's "library" of WDL and wants to rely on them to define the default

I have a hard time seeing this POV, may have to agree to disagree on this specifically..

mlin avatar May 27 '21 01:05 mlin

The abbreviated passthrough syntax only applies where the workflow input has exactly the same name.

Right, that should address the primary use case, which is when my task has a million options and I want to write a workflow that calls it and give the user the ability to override some of those options but let the task define the default values.

If there are other compelling use cases we could consider new syntax. Maybe something like call foo { input: x ?= y }.

jdidion avatar May 27 '21 01:05 jdidion

The way we define default values and their use has always felt a little... odd to me. Much of this behaviour is likely inspired by what python does, however this does not naturally track in a typed system.

In python, the default is only applied when the parameter is ommitted. The following example illustrates this.

def foo(bar="biz"):
  print(bar)

foo("brat")
=> "brat"
foo()
=> "biz"
foo(bar=None):
=> ""

However in our current system, the default value is applied when the parameter is ommitted OR when it is explicitly set to None. This feels like an edge case that is waiting to bite someone in the but. To me, this feels analogous to the null safety issues in java which Stronger typing on kotlin fixed. These edge cases are complex to explain and will likely create more confusion in the future then not.

I think we also need to ask ourselves what the user is trying to achieve. Why is the default set so low in the task as opposed to higher up in the workflow? Is this an issue of education and poor practices? Ie a more natural way to write this would be something like the following.

workflow foo {
  input {
    Int x = 1
  }
  call bar { input: x }
  ...
}

task bar {
  input {
    Int x
  }
}

So, Where does this leave us? For me, I would expect that the implicit option should lead to a type error. IMO A consistent type system will give back to the users of WDL far more then edge cases like the above. I would be more then happy to issue a notice in the next minor release stating that this behaviour is poorly defined and will be removed in the future.

In place its place, we encourage default input values in the task as opposed to the workflow. however, If a default value is set in a task, ANY assignment to that task input will override the default. We also do not allow the implicit "Optional" cast for values with default and any attempt to assign an optional to a non optional value with a default, should result in a type error. We can provide a new function to act as a supplier if we want that is a little more elegent then select_first... or just override select first

workflow foo {
  input {
    Int x?
  }

  # new engine function
  call bar { input: default(x,1) }

  # use select first, with two args, where the second is a default
  call bar as bar2 { input: select_first(x,1) }
  ...
}

task bar {
  input {
    Int x = 1
  }
}

patmagee avatar May 27 '21 19:05 patmagee

@patmagee I'm definitely sympathetic to your POV. But I think what @mlin is getting at is that it's a commonly used pattern to have a common task that is called by other workflows, and the task developer may know better than the workflow developer what the default should be for a given parameter.

Let's consider the task of BWA MEM, and the specific case of the minimum seed length (-k) parameter. As the developer of the BWA MEM task, I am going to give careful consideration to what the default value of -k should be, maybe do some benchmarks on a variety of inputs. Now, as a developer of a workflow that uses the BWA MEM task, I may know that in my specific use case I want to set a value for -k. On the other hand, I may want to trust that the developer of the task has determined the correct default and trust that unless the user decides to override it. Now multiply this by the dozens of parameters that BWA MEM supports.

I think it makes sense to provide some way for a workflow developer to say "either override parameter x with the value specified by the user or use the default specified by the task." Furthermore, I think we'd all prefer to do this in a way that doesn't add yet another special case to the type system.

Since apparently there is ambiguity in the 1.x specification, and different runtimes currently implement this differently (Cromwell allows passing None to indicate the task's default value should be used, wdlTools throws a type exception, and miniwdl allows either depending on a command line option), we need to decide which way it should be interpreted in 1.x and, independently, how we want to handle it in 2.0 and later. I am fine with taking the more permissive interpretation in 1.x and marking it as deprecated. As for the future, I personally like your first solution (call bar { input: x }), but I also get Mike's point that this doesn't solve all the cases, such as when the workflow and task inputs are named differently, or when you want to first compute an intermediate value and then pass that to the task, e.g.

workflow foo {
  input {
    Int? x
  }
  Int? y = if (defined(x)) select_first([x]) * 2 else None
  call bar { input: x = y }
  ...
}

task bar {
  input {
    Int x
  }
}

I don't think a function call works here because a function is really just a transformation from an input value of one type to an output value of a potentially different type, and there is no valid transformation from None to a non-optional value. So I think we need syntax here to indicate a deviation from the normal rules of assignment. I proposed adding an ?= operator (e.g. call bar { input: x ?= y }), whose semantics would be "assign the right-hand-side to the left-hand-side only if the right-hand-side is not None". I'm open to other suggestions.

jdidion avatar May 27 '21 20:05 jdidion

Actually, considering further, if we make the semantics of ?= be "evaluate the right-hand-side and, if it results in an error or a value of None, do not assign the result to the left-hand-side and instead leave the left-hand-side with its default value", then the above workflow could be written as:

workflow foo {
  input {
    Int? x
  }
  Int? y ?= x * 2
  call bar { input: x ?= y }
  ...
}

task bar {
  input {
    Int x
  }
}

jdidion avatar May 27 '21 20:05 jdidion

Update: I just spent a little bit of time poking at this with Cromwell 63 and couldn't find a case where it distinguishes T x = default and T? x = default; it accepts None but uses the default in either case, and consequently, there seems to be no way for a caller to explicitly set T? x = default to undefined/None. (Which admittedly is an obscure thing to do, but, otherwise begs whether putting the ? quantifier there has any effect at all)

Coincidentally, I just had this issue pop up "in real life," so I'm sensing an increasingly urgent need for clarification in some way :sweat_smile:

mlin avatar Jun 05 '21 22:06 mlin

Wow - talk about the worst of both worlds - allowing None for a non-optional value and ignoring setting the value to None for an optional value.

To resolve this issue for 1.x, I would vote in favor of your proposed changes if they were instead added to the errata for 1.1 (and we can put them in the main text for 1.1.1) and marked as deprecated. For development/2.0 I think we should come up with a better solution, but we can take that up in a separate discussion.

jdidion avatar Jun 07 '21 23:06 jdidion

Just dug up these interesting historical artifacts:

https://github.com/biowdl/tasks/commit/d993500b08a8cf26fa5c516392d47b99425107be#diff-bb2a20c006384ad4bdd7f90555382acfb7e2be64a99e1d06b6e625ecb509c899R16 https://github.com/broadinstitute/cromwell/issues/3927 https://github.com/broadinstitute/cromwell/issues/3819

So would value input on this topic from @DavyCats @rhpvorderman @aednichols

mlin avatar Jun 08 '21 03:06 mlin

Let's consider the task of BWA MEM, and the specific case of the minimum seed length (-k) parameter. As the developer of the BWA MEM task, I am going to give careful consideration to what the default value of -k should be, maybe do some benchmarks on a variety of inputs. Now, as a developer of a workflow that uses the BWA MEM task, I may know that in my specific use case I want to set a value for -k. On the other hand, I may want to trust that the developer of the task has determined the correct default and trust that unless the user decides to override it. Now multiply this by the dozens of parameters that BWA MEM supports.

As a developer of the BWA MEM task. In this case I would have the default value of k at None. Because the tool BWA knows best. So not specifying it will simply select the default from the tool. In fact, I see now that we don't specify it at all.

One default that we do override is the compression level. Most bioinformatic tools have this at level 5 (htslib) or 6 (gzip), which is plain crazy. It means the tool is spending more time on compressing than on bioinformatic algorithms. So we set it to 1 everywhere.

As regards to the issue, Ansible provides an omit keyword which will allow you to select the tool default. I think that is the most elegant solution to this particular problem. But I am on the user side, not the implementation side. I can imagine the implementation will bring a few headaches.

rhpvorderman avatar Jun 08 '21 06:06 rhpvorderman

@rhpvorderman Are you suggesting to add some method to resort to a task's/workflow's default? Like a function which will cause the input to be omitted if the value given is None.

call value \ input Int x Int? x Int x = 1 Int? = 1
omitted error None 1 1
omit(None) error None 1 1
None error None error None
2 2 2 2 2
omit(2) 2 2 2 2

DavyCats avatar Jun 08 '21 09:06 DavyCats

@rhpvorderman Are you suggesting to add some method to resort to a tasks/workflows default? Like a function which will cause the input to be omitted if the value given is None.

No a simple keyword omit (or Omit).

**call value \ input ** Int x Int? x Int x = 1 Int? = 1
omitted error None 1 1
Omit error None 1 1
None error None error None
2 2 2 2 2

Making omitting explicit with a keyword so to say. Just as with None, this is very valuable in conditional statements.

rhpvorderman avatar Jun 08 '21 09:06 rhpvorderman

I'm open to considering the Omit concept, but we'd need to work out some sharp edges. To take this example of the currently proposed loophole,

workflow w {
  input {
    String? s_override
  }
  call t { input: s = s_override }
}

task t {
  input {
    String s = "some default"
  }
  ...
}

Would such a call then be supposed to write s = select_first([s_override,Omit])? (a bit of a mouthful?)

mlin avatar Jun 08 '21 10:06 mlin

Hmm. Or should that be implicit? s_override is not defined, so omitted by default.

rhpvorderman avatar Jun 08 '21 10:06 rhpvorderman

miniwdl v1.2.0 implements the behavior described in this PR, however, for now miniwdl check will also flag the ambiguity about how to treat T? x = default input declarations.

(I'm prepared to defend making inputs with defaults implicitly optional -- as a minor relaxation that does what the user wants most of the time, and IMO is not worse than alternatives so far mentioned -- but applying the default when the declared type is optional, is a bridge too far for me currently :sweat_smile:)

mlin avatar Jun 08 '21 19:06 mlin

Given how this could potentially be a contentious topic, I suggest we open this for voting even though it is less a spec change and more a clarification. This behaviour will probably align with what 90% of people expect, and completely contradict what the other 10% would want :sweat_smile:.

@mlin are you happy to transition this to voting?

patmagee avatar Jul 20 '21 13:07 patmagee

@patmagee thanks for labeling this; coincidentally, I also JUST got burned by this ambiguity again "in the wild", so I'm :+1: of course; but I'll also drop a note in slack to get new eyes on this too, since quite some time has passed.

mlin avatar Mar 08 '22 03:03 mlin

:+1: From me.

rhpvorderman avatar Mar 08 '22 06:03 rhpvorderman

👍

patmagee avatar Mar 10 '22 13:03 patmagee

bumping this PR once again. It is now currently open for voting

patmagee avatar Feb 13 '23 19:02 patmagee

:+1:

DavyCats avatar Feb 14 '23 09:02 DavyCats

@mlin please change target to wdl-1.2

jdidion avatar Dec 17 '23 23:12 jdidion