wdl icon indicating copy to clipboard operation
wdl copied to clipboard

Proposal: add optional `default` parameter to `select_first`

Open jdidion opened this issue 3 years ago • 6 comments

The select_first function would be more useful if it were possible to use it without knowing in advance that there will be at least one non-None value in the list. The best way to do this is by adding an optional default parameter that is returned if there are no non-None values in the list. So the signature of select_first becomes:

X select_first(Array[X?]+, [X])

The title was originally "Proposal: new standard library function X? select_any(Array[X?])" with the following proposal:

Semantics are the same as select_first except that the result type is Optional and if the array is empty or does not contain any non-null elements, the result value is None rather than an error.

I am ambivalent about whether this returns the first non-null element (if any) vs a randomly selected element (as any might imply).

Draft implementation: https://github.com/openwdl/wdl/tree/508-select-first-default

jdidion avatar Jun 17 '22 21:06 jdidion

If I understood correctly this is equiivalent to select_all(X)[0] with a check if the result of select_all(X) is empty.

Given results as an array of optionals this is how to currently do it.

Array[String] defined_results = select_all(results)
String? first_result = if (length(defined_results) == 0) then None else defined_results[0]

And can be shortened too the following with your proposal.

String? first_result = select_any(result)

So this does not enable new functionality, but is a language shorthand to write down something easier. This saves one (long) line where a conditional assignment takes place. Given that this new function has to be implemented and increases the cognitive overhead of WDL (as any new function does) I feel the cost-benefit ratio to this proposal is more weighted towards the cost side. But that's just my 2 cents.

rhpvorderman avatar Jun 20 '22 04:06 rhpvorderman

I wouldn't strictly oppose to having a select_any function. However, I would strongly oppose to it returning a "randomly selected element", since that would be quite bad for reproducibility. If randomization were to be a desired feature, I would suggest adding some optional argument to turn this on, i.e. select_any([None, 1,2,3]) and select_any([None, 1,2,3], false) would return 1, select_any([None, 1,2,3], true) could return 1, 2 or 3.

I feel like introducing randomization would open a whole new can of worms, though, a few additional questions come to mind:

  1. Does the rng have to be the same between different execution engines?
  2. Should seeds be settable in some way (eg. for testing)?
  3. Does the rng need to be cryptographically secure?

DavyCats avatar Jun 20 '22 09:06 DavyCats

I think this would be a good idea, I too though have the same hesitation about selecting a random value vs a deterministic value (Ie the first non optional value it encounters

patmagee avatar Jul 21 '22 14:07 patmagee

Thinking about this some more. Isn't it better to equip select_first with an optional default value parameter? The behavior would be as follows:

  • select_first([None, None, None]) -> crash. Normal behavior of select_first not changed.
  • select_first([None, None, None], None) -> There is no non-None value in the array, so None is returned.
  • select_first([None, None, None], 5) -> Returns 5
  • select_first([None, 5, 7], None) -> Returns 5

I feel that this solves @jdidion's issue without requiring an extra keyword. Also the behaviour fits well with the select_first name.

In case a random selection is needed I think a select_random(Array) function is a better fit. In case a non-optional value needs to be fetched this can be used in conjuction with the already existing select_all function. i.e. select_random(select_all([None, 5, None, 6, 7])) -> will select randomly between 5, 6 or 7. I don't see a use case for random selection however. That should be a separate proposal with clear real-world use cases.

rhpvorderman avatar Jul 22 '22 07:07 rhpvorderman

I just fear the type-system. Pragmatically, select_first is the only way to turn X? -> X, although we shouldn't strictly consider the implementation - breaking this would break a lot of workflows.

I think we should generally tend to introducing new functions, rather than changing the implementation details. For this case specifically, I like the approach in C#, with .First, .Single, FirstOrDefault, SingleOrDefault (where single throws an exception for multiple non-null values).

Edit: I'd also vote for select_random as the function name, taking an extra variable for filtering empty values.

illusional avatar Jul 22 '22 07:07 illusional

@rhpvorderman I like your suggestion of adding a default parameter to select_first much better. It avoids adding a new function, and it avoids the ambiguity of my proposed select_any. I'll update the title and description of this issue.

jdidion avatar Jan 26 '23 18:01 jdidion