Proposal: user-defined functions
Cross-posted from #405
Also see discussions here:
- https://github.com/openwdl/wdl/discussions/413
- https://github.com/openwdl/wdl/discussions/407
Currently, the WDL specification provides a small library of functions that meet the needs of many use-cases, but certainly not all of them. The ability to define new functions has been requested several times in the past. This proposal aims for an idiomatic specification of UDFs.
Example:
task call_funcs {
  input {
    File infile
  }
  command <<<
    process_yaml < ~{read_yaml(infile)} > output.txt
  >>>
  output {
    File outfile = "output.txt"
  }
}
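func read_yaml {
  input {
    File infile
    String? encoding
  }
  command <<<
    import yaml
    # Fall back to UTF-8 when no encoding is supplied.
    with open("~{infile}", "r", encoding="~{select_first([encoding, 'utf-8'])}") as inp:
      y = yaml.safe_load(inp)
    # Pretty-print the parsed document to stdout.
    print(yaml.safe_dump(y, default_flow_style=False))
  >>>
  output {
    String content = read_string(stdout())
  }
  runtime {
    container: "python_with_yaml"
    interpreter: "python"
  }
}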
The signature of the above function is: String read_yaml(File, String?)
User-defined functions are similar to tasks, with the following differences:
- User-defined functions begin with the func keyword.
- The order of the input parameters matters: the (left-to-right) function signature is the set of input parameters ordered from top to bottom.
- There may be at most one optional input parameter (which may or may not have a default value), and it must be the last parameter in the signature.
- Only a single output parameter is allowed.
- Runtime attributes and hints: TBD. If functions are executed in the same process as the calling task, runtime/hints must somehow be merged with the task's runtime/hints.
- There is one function-specific runtime attribute:
  - sections: the section(s) in which the function may be used; defaults to "*", may be a String or Array[String] with one of the four task sections that allow expressions (input, output, command, runtime)
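The signature rules above can be sketched as a small validator. This is only an illustration in Python, not part of the proposal: the `Param` class and the `signature` function are hypothetical names, and treating "optional" as "the declared type ends in `?`" is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Param:
    """One declaration from a func's input or output section (hypothetical model)."""
    wdl_type: str  # e.g. "File" or "String?"
    name: str

    @property
    def optional(self) -> bool:
        # Assumption: a parameter is optional when its type ends in "?".
        return self.wdl_type.endswith("?")


def signature(func_name: str, inputs: List[Param], outputs: List[Param]) -> str:
    """Check the proposed rules and return the left-to-right signature."""
    if len(outputs) != 1:
        raise ValueError("a func must declare exactly one output")
    optional_positions = [i for i, p in enumerate(inputs) if p.optional]
    if len(optional_positions) > 1:
        raise ValueError("at most one optional input is allowed")
    if optional_positions and optional_positions[0] != len(inputs) - 1:
        raise ValueError("the optional input must be the last parameter")
    # Declaration order (top to bottom) becomes argument order (left to right).
    args = ", ".join(p.wdl_type for p in inputs)
    return f"{outputs[0].wdl_type} {func_name}({args})"
```

Applied to the read_yaml example, with inputs `File infile` and `String? encoding` and a single `String` output, this yields the signature quoted earlier: `String read_yaml(File, String?)`.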
Similar to structs, funcs exist in a common namespace (regardless of in which WDL file they are defined); however, funcs cannot be aliased, so there must not be any name collisions between funcs defined in different WDL files in the import tree.
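A rough sketch of how an implementation might detect such collisions, written in Python for illustration. The function name `find_func_collisions` and the input shape (a mapping from WDL file path to the func names it defines) are assumptions, not part of the proposal.

```python
from collections import defaultdict
from typing import Dict, List


def find_func_collisions(funcs_by_file: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each func name defined more than once to the files that define it."""
    defined_in = defaultdict(list)
    for path, names in funcs_by_file.items():
        for name in names:
            defined_in[name].append(path)
    # Funcs cannot be aliased, so any name with more than one definition is an error.
    return {
        name: sorted(paths)
        for name, paths in defined_in.items()
        if len(paths) > 1
    }
```

An empty result means every func name in the import tree is unique; a non-empty result lists the conflicting files so that an engine could report them in a single error.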
Once defined, a func may be used by its (unqualified) name in any command block.
In conjunction with the proposed addition of the interpreter runtime attribute, users will be able to write functions in a variety of programming languages. This raises the question of how to support functions written in different languages, or a function written in a different language than the command block. There are a few possible solutions:
- Require that the task container (or host) environment provides all of the interpreters required by all of the functions used in the command block.
- Use a solution such as docker compose or docker run --link to enable the commands to access executables across containers. This means that each function would need to specify its container, and the runtime would be required to dynamically compose the container of the task and all functions used by that task.
- Execute functions in their own environments, e.g. subprocesses or separate workers. This makes executing a task similar to executing a workflow.
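For the first option, an engine could check up front that the task environment provides every interpreter its functions need. The sketch below is a Python illustration under assumptions: `missing_interpreters` is a hypothetical helper, and the check is a plain PATH lookup through the standard library's `shutil.which`, which the proposal does not prescribe.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_interpreters(
    required: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the interpreters that cannot be found in the task environment.

    `required` holds the interpreter values of every func used by the task.
    `which` is injectable so the lookup can be replaced, for example by a
    check that runs inside the task container.
    """
    return sorted({name for name in required if which(name) is None})
```

A task whose functions declare the interpreters python and Rscript would call `missing_interpreters(["python", "Rscript"])` and fail early if the returned list is not empty.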
@jdidion what do you think of the Discussions forum for this & your other? (I just added mine there for good measure :)
Sure - added both proposals there. We'll see where they get more traction.