Undefined behavior when turning coerced optional `File?` to null + Clarification about where String to File coercion takes place
I originally encountered these issues at https://github.com/chanzuckerberg/miniwdl/issues/696.
One thing the WDL spec is vague about is how a task should coerce a String to a File. The spec says that all non-output declarations must be evaluated prior to the command section. My implicit understanding is that the output declarations are evaluated relative to a different directory than the rest of the task: the non-output declarations appear to be resolved against the current directory on the host machine, while the output section is resolved against the current directory inside the container.
For example:
task test {
  input {
    File f_input = "test.txt"
  }
  command <<<printf "hello" > test.txt>>>
  File f_body = "test.txt"
  output {
    File f_output = "test.txt"
  }
}
Assuming all files exist, the implicit assumption is that `f_input` and `f_body` will point to some file on the host machine, but `f_output` will point to the file inside the container. Maybe this should be clarified in the spec, as it is not immediately obvious.
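To make that reading concrete, here is a rough Python sketch of the resolution rule I have in mind. It is only a toy model of my interpretation, not how miniwdl or any other engine is implemented; the two base directories and the `resolve_file` helper are hypothetical names made up for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical locations, chosen only for illustration.
HOST_CWD = PurePosixPath("/home/user/project")    # directory the engine was launched from
CONTAINER_CWD = PurePosixPath("/mnt/task/work")   # task working directory inside the container


def resolve_file(path: str, section: str) -> PurePosixPath:
    """Resolve a String-to-File coercion under the reading described above.

    Declarations in the input section and task body resolve against the host
    directory; declarations in the output section resolve against the
    container working directory.
    """
    if section not in ("input", "body", "output"):
        raise ValueError(f"unknown section: {section}")
    base = CONTAINER_CWD if section == "output" else HOST_CWD
    # Joining with an absolute path returns that absolute path unchanged.
    return base / path


f_input = resolve_file("test.txt", "input")
f_body = resolve_file("test.txt", "body")
f_output = resolve_file("test.txt", "output")
print(f_input, f_body, f_output)
```

Under this model the same string `"test.txt"` names two different files depending on the section it is declared in, which is the ambiguity I would like the spec to address.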
Another issue that arose while testing with miniwdl is that coerced optional files can behave inconsistently.
Given the WDL workflow:
version 1.1
workflow testWorkflow {
  input {
  }
  call testTask
  output {
    Array[File?] array_in_output = testTask.array_in_output
    Int len_in_output = testTask.len_in_output
    Array[File?] array_in_body_out = testTask.array_in_body_out
    Int len_in_body_out = testTask.len_in_body_out
    Array[File?] array_in_input_out = testTask.array_in_input_out
    Int len_in_input_out = testTask.len_in_input_out
  }
}
task testTask {
  input {
    Array[File?] array_in_input = ["example1.txt", "example2.txt"]
    Int len_in_input = length(select_all(array_in_input))
  }
  command <<<>>>
  Array[File?] array_in_body = ["example1.txt", "example2.txt"]
  Int len_in_body = length(select_all(array_in_body))
  output {
    Array[File?] array_in_output = ["example1.txt", "example2.txt"]
    Int len_in_output = length(select_all(array_in_output))
    Array[File?] array_in_body_out = array_in_body
    Int len_in_body_out = len_in_body
    Array[File?] array_in_input_out = array_in_input
    Int len_in_input_out = len_in_input
  }
}
The spec says that an optional `File?` task output whose file does not exist will be coerced to null.
For one, is there a reason why this is limited to task outputs and does not extend to workflow outputs?
Additionally, because the spec says this null coercion is applied at the output step, and given that the files `example1.txt` and `example2.txt` don't exist, the assumed correct output for the WDL workflow above is:
{
  "dir": "/home/heaucques/Documents/wdl-conformance-tests/20240626_184902_testWorkflow",
  "outputs": {
    "testWorkflow.array_in_body_out": [
      null,
      null
    ],
    "testWorkflow.array_in_input_out": [
      null,
      null
    ],
    "testWorkflow.array_in_output": [
      null,
      null
    ],
    "testWorkflow.len_in_body_out": 2,
    "testWorkflow.len_in_input_out": 2,
    "testWorkflow.len_in_output": 0
  }
}
Because the null coercion happens at the task output, the `select_all` calls return different values depending on which section of the task they are called in. In the input section and the task body, `select_all` returns `["example1.txt", "example2.txt"]`, giving a length of 2. In the task output section, however, the array has already been coerced to `[null, null]`, so `select_all` returns an empty array, giving a length of 0. This can be counterintuitive, since one may expect a nonexistent file never to be counted in a `select_all` call. Is this the expected behavior, and if not, what should the expected behavior be?
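For reference, the evaluation order I am assuming can be modeled with a small Python sketch. This is a toy model of the behavior described above, not miniwdl's actual code: `coerce_in_body`, `coerce_in_output`, and the `existing_files` set are hypothetical stand-ins, and `select_all` here only mimics the WDL function by dropping nulls.

```python
from typing import List, Optional, Set


def coerce_in_body(paths: List[str]) -> List[Optional[str]]:
    """Input/body declarations: keep every path, with no existence check (assumed)."""
    return list(paths)


def coerce_in_output(paths: List[Optional[str]], existing: Set[str]) -> List[Optional[str]]:
    """Output declarations: a path that does not exist becomes None (null)."""
    return [p if p is not None and p in existing else None for p in paths]


def select_all(values: List[Optional[str]]) -> List[str]:
    """Mimic WDL's select_all by dropping None entries."""
    return [v for v in values if v is not None]


paths = ["example1.txt", "example2.txt"]
existing_files: Set[str] = set()  # neither file exists, as in the workflow above

# Task body: select_all runs before any null coercion has happened.
array_in_body = coerce_in_body(paths)
len_in_body = len(select_all(array_in_body))

# Task output: the array is coerced first, then select_all sees only nulls.
array_in_output = coerce_in_output(paths, existing_files)
len_in_output = len(select_all(array_in_output))

# Passing the body array through the output section nulls its elements,
# but the length computed earlier in the body is not recomputed.
array_in_body_out = coerce_in_output(array_in_body, existing_files)
len_in_body_out = len_in_body

print(array_in_body_out, len_in_body_out, array_in_output, len_in_output)
```

In this model `array_in_body_out` ends up as two nulls while `len_in_body_out` stays at 2, which is the mismatch between the reported arrays and lengths in the JSON above.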