wdl icon indicating copy to clipboard operation
wdl copied to clipboard

Support Enumerations

Open katevoss opened this issue 7 years ago • 21 comments

@danbills commented on Thu Jul 06 2017

It would be nice to restrict types to specific values so that users can only specify one of a limited number of a values.

Per this forum post


@cjllanwarne commented on Mon Jul 10 2017

Could be a nice bring-along with https://github.com/broadinstitute/cromwell/issues/2283


@patmagee commented on Tue Aug 29 2017

Just wondering if there has been any discussion regarding this.


@katevoss commented on Tue Aug 29 2017

@patmagee not to my knowledge but I can start the ball rolling. @geoffjentry & @mcovarr, any opinions on supporting enums?


@geoffjentry commented on Tue Aug 29 2017

I'm leery but not necessarily against it. I'm also generally the one with the most conservative opinion in terms of adding WDL syntax, so view that take as a lower bound of acceptance :)

@patmagee what were you thinking in terms of the syntax?

Pinging @vdauwera so she's abreast of this convo.


@vdauwera commented on Wed Aug 30 2017

FWIW Enum support would definitely be very valuable to us in GATK-world, and are likely to be useful in general.

My one caveat would be that they should be easier to work with than they were in Queue (friendly elbow poke at @kshakir).


@patmagee commented on Fri Sep 01 2017

@geoffjentry I share your concern with adding any sort of syntax to the spec, and my first inclination is to fake support enumeration. Im not exactly sure how to do that though, nor am I sure of whether its possible.

The cleanest way I can think of implementing would be maybe something like a Java style enum:

enum MyEnum {
     "A","B","C"
}

workflow wf {
    MyEnum thisIsMyEnum
}

Another way that we would be able to do it would be define an Enum type in a workflow like so:

workflow wf {
     #This would get overridden at run time, but the value would need to be validated
     Enum greeting = ["HELLO","GOODBYE"]
     or
     #Done override anything but validate it
     Enum["HELLO","GOODBYE"] greeting
     
}

@katevoss commented on Thu Sep 28 2017

@geoffjentry sounds like this is more of a WDL feature request, shall I move it to the WDL repo?


@geoffjentry commented on Thu Sep 28 2017

628b747f8ccdfb757062f36a27eedecfc2295f515c0586e05fbfb0620c0571a2

katevoss avatar Sep 28 '17 16:09 katevoss

@geoffjentry any thoughts on this?

patmagee avatar Sep 28 '17 17:09 patmagee

Bump

SHuang-Broad avatar Sep 23 '19 01:09 SHuang-Broad

I'd like to propose an alternative syntax. There would be a single top-level enum section where all enums are defined:

enum {
  Int Foo = [1, 2, 3]
  String Status = ["Error", "Pending", "Success"]
}

An open question is how to handle importing of enums from one file to another. I'm guessing it would work like structs.

jdidion avatar Aug 10 '21 22:08 jdidion

@jdidion this is an interesting suggestion. It would certainly cut down on the boilerplate, and limit how many new structures we add to the language.

I am starting to wonder if we are starting to conflate two similar but distinct concepts:

  • enumerated types
  • input validation.

An enum type should not necessarily have an actual wdl type assigned to its inner values. The values are simple enumerations, and an identifier can be used for them. In this case, the total set of values are clearly written out as easy to use identifier (like in most programming languages) and values like

enum Animal { CAT, DOG, MANITEE }

workflow {
  Input {
    Animal animal = Animal.MANITEE
  }
}

I think by saying that an enums value must be an identifier we greatly simplify the implementation of this and have an actual enum.

In the example you provided, I think you are proposing more so input validation (which could equally provide as much value). In this case, there is no first class Enum type, just a check at runtime that the value passed to a task or workflow is valid according to a simple set of rules (here it must be one of the values supplied). Because of this reassignment of the declaration completely looses its enum context

Workflow foo {
  enum {
     # here I know the allowable set of values
     Int my_val = [1, 2, 3,4]
  }
  # reassignment and I no longer know that this declaration is restricted. 
  Int my_val_2 = my_val
}

Since the "enum" quality of this declaration is only enforced within the enum block and not upon reassignment (it becomes just a normal int), this doesn't really feel like an enum to me.

We could alternatively think of adding a syntax for restricting the values or adding valdiation directly to a declaration


workflow foo {
   input {
     Int val {
       min: 100
       max: 20000
     }
     String s {
       values: ["foo", "fun"]
     }
}

patmagee avatar Aug 11 '21 03:08 patmagee

You're right - you should be able to refer to an enum value by its identifier. But typed enums are useful for many purposes. So let me propose the following:

enum Animal[String] {
  CAT = "meowser"
  DOG = "pupper"
  MANATEE = "seacow"
}

task pet {
  input {
    Animal animal = Animal.CAT
  }
  command {
    echo "I'm petting my ~{animal}"
  }
}

input.json

{
  "pet.animal": "CAT"
}

jdidion avatar Aug 11 '21 17:08 jdidion

@jdidion in the task, would the value meowser be printed out or would CAT be printed out? Also what if you want to define an enum WITHOUT an inner type and just have the identifier?

patmagee avatar Aug 11 '21 18:08 patmagee

Yes, the former. To do the latter, you would do:

enum Animal[String] {
  CAT = "CAT"
  DOG = "DOG"
  MANATEE = "MANATEE"
}

but I guess we could provide a simplified form that would be equivalent to the above:

enum Animal { CAT, DOG, MANATEE }

i.e. the default type would be String and the default value would be the enum identifier.

jdidion avatar Aug 11 '21 18:08 jdidion

@jdidion I wonder if we should start by introducing enum in the simplified syntax, and then see if there is the desire for the more complex (subtyped) value. It will be easy to make it more complex, but reigning in the complexity is a challenge

patmagee avatar Aug 12 '21 14:08 patmagee

Normally I'd agree, but there are already obvious use-cases for at least primitive-typed enums for enumerating/limiting the options available for some command line flags.

jdidion avatar Aug 12 '21 16:08 jdidion

fair point. It seems at this point, there is adequate support to move forward with a potential implemetnation? Considering I was one of the original people requesting this, I would be happy to see it progress. If I have time next week I can come up with an RFC / PR @jdidion

patmagee avatar Aug 13 '21 14:08 patmagee

Sounds good! Feel free to tag me for review.

jdidion avatar Aug 13 '21 23:08 jdidion

This might be where @jdidion was heading all along but I like his proposal as it makes a path towards more general ADTs easier if that ever was a desire

geoffjentry avatar Aug 13 '21 23:08 geoffjentry

It probably goes without saying (but I'll say it anyway) that enums can't be named the same as any other data type (or any reserved word for that matter).

jdidion avatar Aug 13 '21 23:08 jdidion

Can someone give me an example of what issue exists right now that would be solved by the addition of enum's in some form?

vortexing avatar Jan 03 '22 18:01 vortexing

@vortexing a common use case is a task parameter that only allows a limited set of values. For example, let's say my tool has a parameter --output-format that can have values sam and bam. Rather than my task having a String output_format parameter, it could have a FileType output_format parameter where FileType is an enum. This makes it more clear to the user what values are allowed.

jdidion avatar Jan 03 '22 18:01 jdidion

OK. So right now, someone could specify String output_format = 'dam' and the task would fail for some dam reason, rather than having the workflow fail earlier before the task was actually sent/run b/c 'dam' is not one of the FileType's allowed? That seems like a computational win to me when running someone else's workflow. Just as long as how it gets done is aimed at newbies b/c the intent is to be more clear to users who don't already know the enum values for such a param, I'm game.

vortexing avatar Jan 03 '22 19:01 vortexing

That's right - you'd have to explicitly validate the parameter value yourself rather than having the runtime do the validation for you.

jdidion avatar Jan 03 '22 19:01 jdidion

I'm also interested in having support for enums in WDL. It would help with consistent error reporting, and generating UIs based on declared WDL inputs.

jvlasblom avatar May 04 '22 16:05 jvlasblom

Agreed that enums would be very useful for input validation. It seems much cleaner and simpler to maintain than writing a validation task that parses the user's inputs.

hkeward avatar Dec 12 '22 17:12 hkeward

Just as a suggestion for those who need an enum like construction, you could use a Map instead. The keys would be all the allowed inputs and the values could be the same or some translation (allows for some abstraction). If a user provides an incorrect input value it will cause an error, though maybe not as specific as a dedicated enum syntax would allow for.

eg.

workflow Test {
  input {
    #...
    String strand
  }
  # with translation:
  Map[String, String] htseqStrandOptions = {"FR": "yes", "RF": "reverse", "no": "no"}
  # or just to check for allowed values:
  Map[String, String] htseqStrandOptions = {"yes": "yes", "reverse": "reverse", "no": "no"}
  #...

  call htseqCount {
    input:
      #...
      strand = htseqStrandOptions[strand]
  }
  #...
}

Now if strand is defined as anything other than "FR", "RF" or "no" in the first case or "yes", "reverse" or "no" in the second case, the workflow will fail.

DavyCats avatar Dec 13 '22 11:12 DavyCats

Just re-reading this thread and @DavyCats suggestion gave me an idea. AFAIK, this is valid WDL:

struct MyEnum {
  String name,
  String value
}

workflow test {
  Object MyEnum = {
    FOO: MyEnum {
      name: "FOO",
      value: "FOO"
    },
    BAR: MyEnum {
      name: "BAR",
      value: "BAR"
    }
  }

  input {
    MyEnum e
  }

  if (e == MyEnum.FOO) {
    ...
  }
}

Using MyEnum as a variable name should be fine since MyEnum is a user-defined type and thus not a reserved word.

The above is not ideal since it requires the user to pass in a MyEnum instance rather than just an identifier (since the MyEnum declaration is private). And there is nothing stopping the user from creating a MyEnum instance that is not equal to any of the values in the MyEnum object. But I think it's pretty close to what we (or at least I) want.

So I think enum then just becomes a shorthand way of defining a struct and declaring a value of the same name in the global scope. We'd want the struct type to be non-instantiable. And we'd want "FOO" to deserialize to MyEnum.Foo when used in an input JSON (and vice-versa). The import/aliasing semantics would be the same for an enum as for a struct.

file1.wdl:

enum MyEnum {
  FOO,
  BAR
}

workflow {
  input {
    MyEnum e = MyEnum.FOO
  }

  String ename = e.name

  if (e == MyEnum.FOO) { ... }
}

file2.wdl:

# MyEnum is copied into file2's global namespace
import "file1.wdl"

# MyEnum is copied into file2's global namespace with name MyEnum2
import "file1.wdl" as file1a alias MyEnum as MyEnum2

The fact that an enum's value is struct-like means that we have the option to later support generic enums, e.g.

enum MyIntEnum[Int] {
  FOO: 1,
  BAR: 2
}

then the struct definition looks like:

struct MyIntEnum {
  String name,
  Int value
}

jdidion avatar Apr 05 '23 22:04 jdidion