wdl
wdl copied to clipboard
Support Enumerations
@danbills commented on Thu Jul 06 2017
It would be nice to restrict types to specific values so that users can only specify one of a limited number of a values.
Per this forum post
@cjllanwarne commented on Mon Jul 10 2017
Could be a nice bring-along with https://github.com/broadinstitute/cromwell/issues/2283
@patmagee commented on Tue Aug 29 2017
Just wondering if there has been any discussion regarding this.
@katevoss commented on Tue Aug 29 2017
@patmagee not to my knowledge but I can start the ball rolling. @geoffjentry & @mcovarr, any opinions on supporting enums?
@geoffjentry commented on Tue Aug 29 2017
I'm leery but not necessarily against it. I'm also generally the one with the most conservative opinion in terms of adding WDL syntax, so view that take as a lower bound of acceptance :)
@patmagee what were you thinking in terms of the syntax?
Pinging @vdauwera so she's abreast of this convo.
@vdauwera commented on Wed Aug 30 2017
FWIW Enum support would definitely be very valuable to us in GATK-world, and are likely to be useful in general.
My one caveat would be that they should be easier to work with than they were in Queue (friendly elbow poke at @kshakir).
@patmagee commented on Fri Sep 01 2017
@geoffjentry I share your concern with adding any sort of syntax to the spec, and my first inclination is to fake support enumeration. Im not exactly sure how to do that though, nor am I sure of whether its possible.
The cleanest way I can think of implementing would be maybe something like a Java style enum:
enum MyEnum {
"A","B","C"
}
workflow wf {
MyEnum thisIsMyEnum
}
Another way that we would be able to do it would be define an Enum type in a workflow like so:
workflow wf {
#This would get overridden at run time, but the value would need to be validated
Enum greeting = ["HELLO","GOODBYE"]
or
#Done override anything but validate it
Enum["HELLO","GOODBYE"] greeting
}
@katevoss commented on Thu Sep 28 2017
@geoffjentry sounds like this is more of a WDL feature request, shall I move it to the WDL repo?
@geoffjentry commented on Thu Sep 28 2017
@geoffjentry any thoughts on this?
Bump
I'd like to propose an alternative syntax. There would be a single top-level enum
section where all enums are defined:
enum {
Int Foo = [1, 2, 3]
String Status = ["Error", "Pending", "Success"]
}
An open question is how to handle importing of enums from one file to another. I'm guessing it would work like structs.
@jdidion this is an interesting suggestion. It would certainly cut down on the boilerplate, and limit how many new structures we add to the language.
I am starting to wonder if we are starting to conflate two similar but distinct concepts:
- enumerated types
- input validation.
An enum type should not necessarily have an actual wdl type assigned to its inner values. The values are simple enumerations, and an identifier can be used for them. In this case, the total set of values are clearly written out as easy to use identifier (like in most programming languages) and values like
enum Animal { CAT, DOG, MANITEE }
workflow {
Input {
Animal animal = Animal.MANITEE
}
}
I think by saying that an enums value must be an identifier we greatly simplify the implementation of this and have an actual enum.
In the example you provided, I think you are proposing more so input validation (which could equally provide as much value). In this case, there is no first class Enum type, just a check at runtime that the value passed to a task or workflow is valid according to a simple set of rules (here it must be one of the values supplied). Because of this reassignment of the declaration completely looses its enum context
Workflow foo {
enum {
# here I know the allowable set of values
Int my_val = [1, 2, 3,4]
}
# reassignment and I no longer know that this declaration is restricted.
Int my_val_2 = my_val
}
Since the "enum" quality of this declaration is only enforced within the enum block and not upon reassignment (it becomes just a normal int), this doesn't really feel like an enum to me.
We could alternatively think of adding a syntax for restricting the values or adding valdiation directly to a declaration
workflow foo {
input {
Int val {
min: 100
max: 20000
}
String s {
values: ["foo", "fun"]
}
}
You're right - you should be able to refer to an enum value by its identifier. But typed enums are useful for many purposes. So let me propose the following:
enum Animal[String] {
CAT = "meowser"
DOG = "pupper"
MANATEE = "seacow"
}
task pet {
input {
Animal animal = Animal.CAT
}
command {
echo "I'm petting my ~{animal}"
}
}
input.json
{
"pet.animal": "CAT"
}
@jdidion in the task, would the value meowser
be printed out or would CAT
be printed out? Also what if you want to define an enum WITHOUT an inner type and just have the identifier?
Yes, the former. To do the latter, you would do:
enum Animal[String] {
CAT = "CAT"
DOG = "DOG"
MANATEE = "MANATEE"
}
but I guess we could provide a simplified form that would be equivalent to the above:
enum Animal { CAT, DOG, MANATEE }
i.e. the default type would be String
and the default value would be the enum identifier.
@jdidion I wonder if we should start by introducing enum in the simplified syntax, and then see if there is the desire for the more complex (subtyped) value. It will be easy to make it more complex, but reigning in the complexity is a challenge
Normally I'd agree, but there are already obvious use-cases for at least primitive-typed enums for enumerating/limiting the options available for some command line flags.
fair point. It seems at this point, there is adequate support to move forward with a potential implemetnation? Considering I was one of the original people requesting this, I would be happy to see it progress. If I have time next week I can come up with an RFC / PR @jdidion
Sounds good! Feel free to tag me for review.
This might be where @jdidion was heading all along but I like his proposal as it makes a path towards more general ADTs easier if that ever was a desire
It probably goes without saying (but I'll say it anyway) that enums can't be named the same as any other data type (or any reserved word for that matter).
Can someone give me an example of what issue exists right now that would be solved by the addition of enum's in some form?
@vortexing a common use case is a task parameter that only allows a limited set of values. For example, let's say my tool has a parameter --output-format
that can have values sam
and bam
. Rather than my task having a String output_format
parameter, it could have a FileType output_format
parameter where FileType
is an enum
. This makes it more clear to the user what values are allowed.
OK. So right now, someone could specify String output_format = 'dam'
and the task would fail for some dam reason, rather than having the workflow fail earlier before the task was actually sent/run b/c 'dam' is not one of the FileType
's allowed? That seems like a computational win to me when running someone else's workflow. Just as long as how it gets done is aimed at newbies b/c the intent is to be more clear to users who don't already know the enum values for such a param, I'm game.
That's right - you'd have to explicitly validate the parameter value yourself rather than having the runtime do the validation for you.
I'm also interested in having support for enums in WDL. It would help with consistent error reporting, and generating UIs based on declared WDL inputs.
Agreed that enums would be very useful for input validation. It seems much cleaner and simpler to maintain than writing a validation task that parses the user's inputs.
Just as a suggestion for those who need an enum like construction, you could use a Map instead. The keys would be all the allowed inputs and the values could be the same or some translation (allows for some abstraction). If a user provides an incorrect input value it will cause an error, though maybe not as specific as a dedicated enum syntax would allow for.
eg.
workflow Test {
input {
#...
String strand
}
# with translation:
Map[String, String] htseqStrandOptions = {"FR": "yes", "RF": "reverse", "no": "no"}
# or just to check for allowed values:
Map[String, String] htseqStrandOptions = {"yes": "yes", "reverse": "reverse", "no": "no"}
#...
call htseqCount {
input:
#...
strand = htseqStrandOptions[strand]
}
#...
}
Now if strand
is defined as anything other than "FR", "RF" or "no" in the first case or "yes", "reverse" or "no" in the second case, the workflow will fail.
Just re-reading this thread and @DavyCats suggestion gave me an idea. AFAIK, this is valid WDL:
struct MyEnum {
String name,
String value
}
workflow test {
Object MyEnum = {
FOO: MyEnum {
name: "FOO",
value: "FOO"
},
BAR: MyEnum {
name: "BAR",
value: "BAR"
}
}
input {
MyEnum e
}
if (e == MyEnum.FOO) {
...
}
}
Using MyEnum
as a variable name should be fine since MyEnum
is a user-defined type and thus not a reserved word.
The above is not ideal since it requires the user to pass in a MyEnum
instance rather than just an identifier (since the MyEnum
declaration is private). And there is nothing stopping the user from creating a MyEnum
instance that is not equal to any of the values in the MyEnum
object. But I think it's pretty close to what we (or at least I) want.
So I think enum
then just becomes a shorthand way of defining a struct and declaring a value of the same name in the global scope. We'd want the struct type to be non-instantiable. And we'd want "FOO" to deserialize to MyEnum.Foo
when used in an input JSON (and vice-versa). The import/aliasing semantics would be the same for an enum as for a struct.
file1.wdl:
enum MyEnum {
FOO,
BAR
}
workflow {
input {
MyEnum e = MyEnum.FOO
}
String ename = e.name
if (e == MyEnum.FOO) { ... }
}
file2.wdl:
# MyEnum is copied into file2's global namespace
import "file1.wdl"
# MyEnum is copied into file2's global namespace with name MyEnum2
import "file1.wdl" as file1a alias MyEnum as MyEnum2
The fact that an enum's value is struct-like means that we have the option to later support generic enums, e.g.
enum MyIntEnum[Int] {
FOO: 1,
BAR: 2
}
then the struct definition looks like:
struct MyIntEnum {
String name,
Int value
}