Proposal: user-defined functions

Open jdidion opened this issue 3 years ago • 2 comments

Cross-posted from #405

Also see discussions here:

  • https://github.com/openwdl/wdl/discussions/413
  • https://github.com/openwdl/wdl/discussions/407

Currently, the WDL specification provides a small library of functions that meets the needs of many use cases, but certainly not all of them. The ability to define new functions has been requested several times in the past. This proposal aims for an idiomatic specification of user-defined functions (UDFs).

Example:

task call_funcs {
  input {
    File infile
  }

  command <<<
  process_yaml < ~{read_yaml(infile)} > output.txt
  >>>

  output {
    File outfile = "output.txt"
  }
}

func read_yaml {
  input {
    File infile
    String? encoding
  }

  command <<<
  import yaml
  with open("~{infile}", "r", encoding="~{encoding}") as inp:
    y = yaml.safe_load(inp)  # parse the YAML document
  print(yaml.safe_dump(y))   # pretty-print it to stdout
  >>>

  output {
    String outfile = read_string(stdout())
  }

  runtime {
    container: "python_with_yaml"
    interpreter: "python"
  }
}

The signature of the above function is: String read_yaml(File, String?)

User-defined functions are similar to tasks, with the following differences:

  • User-defined functions begin with the func keyword.
  • The order of the input parameters matters: the function signature (read left to right) lists the input parameters in the order they are declared (top to bottom).
  • There may be at most one optional input parameter (which may or may not have a default value), and it must be the last parameter in the signature.
  • Only a single output parameter is allowed.
  • Runtime attributes and Hints: TBD - if functions are executed in the same process as the calling task, runtime/hints must somehow be merged with the task's runtime/hints.
  • There is one function-specific runtime attribute:
    • sections: the section(s) in which the function may be used; defaults to "*", and may be a String or Array[String] naming one or more of the four task sections that allow expressions (input, output, command, runtime)
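
The ordering and optionality rules above can be checked mechanically. The following is a minimal Python sketch, not part of the proposal: Param, Func, and signature are hypothetical names, types are modelled as plain strings, and only a trailing "?" marks an input as optional (inputs made optional by a default value are not modelled). It also assumes that a func with no output is rejected.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Param:
    """One declaration from a func's input or output section (hypothetical model)."""
    wdl_type: str  # e.g. "File" or "String?"
    name: str

    @property
    def optional(self) -> bool:
        return self.wdl_type.endswith("?")


@dataclass
class Func:
    name: str
    inputs: List[Param]   # in top-to-bottom declaration order
    outputs: List[Param]


def signature(func: Func) -> str:
    """Validate the proposed rules and return the left-to-right signature."""
    if len(func.outputs) != 1:
        raise ValueError(f"{func.name}: a func declares a single output parameter")
    optional_positions = [i for i, p in enumerate(func.inputs) if p.optional]
    if len(optional_positions) > 1:
        raise ValueError(f"{func.name}: at most one optional input is allowed")
    if optional_positions and optional_positions[0] != len(func.inputs) - 1:
        raise ValueError(f"{func.name}: the optional input must be last")
    args = ", ".join(p.wdl_type for p in func.inputs)
    return f"{func.outputs[0].wdl_type} {func.name}({args})"


read_yaml = Func(
    name="read_yaml",
    inputs=[Param("File", "infile"), Param("String?", "encoding")],
    outputs=[Param("String", "outfile")],
)
print(signature(read_yaml))  # String read_yaml(File, String?)
```

Swapping the two inputs of read_yaml, so that the optional String? comes first, makes signature raise a ValueError.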

Similar to structs, funcs exist in a common namespace, regardless of which WDL file defines them. Unlike structs, however, funcs cannot be aliased, so there must not be any name collisions between funcs defined in different WDL files in the import tree.
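
As an illustration of the no-collision requirement, an implementation could gather the func names defined in each file of the import tree and report any name that appears more than once. This is only a sketch under assumptions: find_func_collisions is a hypothetical helper, and the file names below are made up.

```python
from collections import defaultdict
from typing import Dict, List


def find_func_collisions(funcs_by_file: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each func name defined more than once to the files that define it."""
    definers = defaultdict(list)
    for path, names in funcs_by_file.items():
        for name in names:
            definers[name].append(path)
    return {name: sorted(paths) for name, paths in definers.items() if len(paths) > 1}


# Hypothetical import tree: both files define read_yaml.
import_tree = {
    "main.wdl": ["read_yaml"],
    "lib/io.wdl": ["read_yaml", "read_toml"],
}
print(find_func_collisions(import_tree))
# {'read_yaml': ['lib/io.wdl', 'main.wdl']}
```

Because funcs cannot be aliased, any non-empty result here would have to be a validation error rather than something the importing file can resolve.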

Once defined, a func may be used by its (unqualified) name in any command block.

In conjunction with the proposed addition of the interpreter runtime attribute, users will be able to write functions in a variety of programming languages. This raises the question of how to support functions written in different languages, or a function written in a different language than the command block. There are a few possible solutions:

  • Require that the task container (or host) environment provides all of the interpreters required by all of the functions used in the command block.
  • Use a solution such as docker compose or docker run --link to enable the commands to access executables across containers. This means that each function would need to specify its container, and the runtime would be required to dynamically compose the container of the task and all functions used by that task.
  • Execute functions in their own environments, e.g. subprocesses or separate workers. This makes executing a task similar to executing a workflow.
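
To make the last option more concrete, here is a rough Python sketch of running a func body in its own subprocess and capturing its stdout as the return value. It is an assumption-laden illustration, not a proposed engine design: run_func is a hypothetical name, the interpreter is assumed to accept a script via a -c flag (true for python, not for every interpreter), and containers are ignored entirely.

```python
import subprocess
import sys


def run_func(interpreter: str, script: str) -> str:
    """Run a func's command body in its own process and return its stdout."""
    result = subprocess.run(
        [interpreter, "-c", script],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    # Strip trailing newlines, similar to read_string(stdout()) in the example func.
    return result.stdout.rstrip("\n")


# sys.executable stands in for the func's declared interpreter.
value = run_func(sys.executable, "print('a: 1')")
print(value)  # a: 1
```

The calling task would then substitute the returned string into its own command, which is why this option makes executing a task resemble executing a small workflow.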

jdidion (Oct 15 '20 20:10)

@jdidion what do you think of using the Discussions forum for this & your other proposal? (I just added mine there for good measure :)

mlin (Oct 19 '20 11:10)

Sure - added both proposals there. We'll see where they get more traction.

jdidion (Oct 19 '20 12:10)