wdl
wdl copied to clipboard
Proposal: add optional `default` parameter to `select_first`
The select_first function would be more useful if it were possible to use it without knowing in advance that there will be at least one non-None value in the list. The best way to do this is by adding an optional default parameter that is returned if there are no non-None values in the list. So the signature of select_first becomes:
X select_first(Array[X?]+, [X])
The title was originally "Proposal: new standard library function X? select_any(Array[X?])" with the following proposal:
Semantics are the same as select_first except that the result type is Optional and if the array is empty or does not contain any non-null elements, the result value is None rather than an error.
I am ambivalent about whether this returns the first non-null element (if any) vs a randomly selected element (as any might imply).
Draft implementation: https://github.com/openwdl/wdl/tree/508-select-first-default
If I understood correctly this is equiivalent to select_all(X)[0] with a check if the result of select_all(X) is empty.
Given results as an array of optionals this is how to currently do it.
Array[String] defined_results = select_all(results)
String? first_result = if (length(defined_results) == 0) then None else defined_results[0]
And can be shortened too the following with your proposal.
String? first_result = select_any(result)
So this does not enable new functionality, but is a language shorthand to write down something easier. This saves one (long) line where a conditional assignment takes place. Given that this new function has to be implemented and increases the cognitive overhead of WDL (as any new function does) I feel the cost-benefit ratio to this proposal is more weighted towards the cost side. But that's just my 2 cents.
I wouldn't strictly oppose to having a select_any function. However, I would strongly oppose to it returning a "randomly selected element", since that would be quite bad for reproducibility. If randomization were to be a desired feature, I would suggest adding some optional argument to turn this on, i.e. select_any([None, 1,2,3]) and select_any([None, 1,2,3], false) would return 1, select_any([None, 1,2,3], true) could return 1, 2 or 3.
I feel like introducing randomization would open a whole new can of worms, though, a few additional questions come to mind:
- Does the rng have to be the same between different execution engines?
- Should seeds be settable in some way (eg. for testing)?
- Does the rng need to be cryptographically secure?
I think this would be a good idea, I too though have the same hesitation about selecting a random value vs a deterministic value (Ie the first non optional value it encounters
Thinking about this some more. Isn't it better to equip select_first with an optional default value parameter?
The behavior would be as follows:
select_first([None, None, None])-> crash. Normal behavior ofselect_firstnot changed.select_first([None, None, None], None)-> There is no non-None value in the array, so None is returned.select_first([None, None, None], 5)-> Returns5select_first([None, 5, 7], None)-> Returns 5
I feel that this solves @jdidion's issue without requiring an extra keyword. Also the behaviour fits well with the select_first name.
In case a random selection is needed I think a select_random(Array) function is a better fit. In case a non-optional value needs to be fetched this can be used in conjuction with the already existing select_all function. i.e. select_random(select_all([None, 5, None, 6, 7])) -> will select randomly between 5, 6 or 7. I don't see a use case for random selection however. That should be a separate proposal with clear real-world use cases.
I just fear the type-system. Pragmatically, select_first is the only way to turn X? -> X, although we shouldn't strictly consider the implementation - breaking this would break a lot of workflows.
I think we should generally tend to introducing new functions, rather than changing the implementation details. For this case specifically, I like the approach in C#, with .First, .Single, FirstOrDefault, SingleOrDefault (where single throws an exception for multiple non-null values).
Edit: I'd also vote for select_random as the function name, taking an extra variable for filtering empty values.
@rhpvorderman I like your suggestion of adding a default parameter to select_first much better. It avoids adding a new function, and it avoids the ambiguity of my proposed select_any. I'll update the title and description of this issue.