
Cromwell tries to define undefined variables causing bizarre errors

Open aofarrel opened this issue 1 year ago • 0 comments

The outputs of an optional scattered task that does not run should be undefined. Instead, Cromwell seems to treat them as defined, and they have a length. This can cause all sorts of issues, such as breaking downstream tasks that are only supposed to run if the optional upstream task has run, and it produces some very odd error messages.

Simple example:

# task_a and task_b are mutually exclusive scattered tasks
Array[File?] vcfs = select_first([task_a.vcf_out, task_b.vcf_out])

Due to this bug, vcfs will be an empty array if task_a did not run, even though task_b did run. This gets quite messy if you need to process the output of mutually exclusive tasks later.
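A possible workaround sketch for the simple example, assuming both outputs really are Array[File?] outside the scatter (not confirmed against the real workflow): drop the undefined elements with select_all() and merge what is left with flatten(). The make_vcf task, the use_a flag, and the workflow name are hypothetical stand-ins, not the real tasks.

```wdl
version 1.0

# Hypothetical stand-in for the real tasks; name and output are assumptions.
task make_vcf {
  input {
    String sample
  }
  command <<<
    echo "~{sample}" > "~{sample}.vcf"
  >>>
  output {
    File vcf_out = "~{sample}.vcf"
  }
}

workflow collect_exclusive {
  input {
    Array[String] samples
    Boolean use_a = false
  }

  scatter (sample in samples) {
    # task_a and task_b are mutually exclusive within each shard
    if (use_a) {
      call make_vcf as task_a { input: sample = sample }
    }
    if (!use_a) {
      call make_vcf as task_b { input: sample = sample }
    }
  }

  # Outside the scatter, both vcf_out values are Array[File?].
  # select_all() drops undefined elements; flatten() merges the two lists.
  Array[File] vcfs = flatten([select_all(task_a.vcf_out), select_all(task_b.vcf_out)])

  output {
    Array[File] collected = vcfs
  }
}
```

This sidesteps select_first() on whole arrays, so the result no longer depends on whether the array from the task that did not run counts as defined.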

More involved example:

# variant_call_after_earlyQC_filtering is an optional task, so variant_call_after_earlyQC_filtering.errorcode is an optional type
if(defined(variant_call_after_earlyQC_filtering.errorcode)) {

  # variant_call_after_earlyQC_filtering is a scattered task, so variant_call_after_earlyQC_filtering.errorcode is an array
  # this length check should be redundant with the defined check earlier, but neither of them seem to work properly
  if(length(variant_call_after_earlyQC_filtering.errorcode) > 0) {

    # get the first (0th) value and coerce it into type String
    String coerced_vc_filtered_errorcode = select_first([variant_call_after_earlyQC_filtering.errorcode[0], "FALLBACK"])
    call echo as echo_a {input: integer=length(variant_call_after_earlyQC_filtering.errorcode), string=variant_call_after_earlyQC_filtering.errorcode[0]}
    call echo as echo_b {input: string=coerced_vc_filtered_errorcode}
    call echo_array as echo_c {input: strings=variant_call_after_earlyQC_filtering.errorcode}
  }
}

Output:

  • echo_a will echo "1" for input integer and an empty string for input string
  • echo_b will echo "FALLBACK" for input string
  • echo_c will cause an error
    • "message":"Cannot interpolate Array[String?] into a command string with attribute set [PlaceholderAttributeSet(None,None,None,Some( ))]"
    • This error occurs even if echo_array takes in non-optional Array[String?] or Array[String?]?
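The interpolation error names the type Array[String?], so one thing that may be worth trying is to strip the undefined elements with select_all() before the length check and before the call. A minimal sketch under that assumption; maybe_fail, run_optional, and the workflow name are hypothetical, and the echo_array here is a guess at what the real task looks like:

```wdl
version 1.0

# Hypothetical stand-in for the optional scattered task.
task maybe_fail {
  input {
    String sample
  }
  command <<<
    echo "PASS" > errorcode.txt
  >>>
  output {
    String errorcode = read_string("errorcode.txt")
  }
}

# Assumed shape of echo_array: takes a non-optional Array[String].
task echo_array {
  input {
    Array[String] strings
  }
  command <<<
    echo ~{sep=" " strings}
  >>>
  output {
    String out = read_string(stdout())
  }
}

workflow guard_on_defined_elements {
  input {
    Array[String] samples
    Boolean run_optional = false
  }

  scatter (sample in samples) {
    if (run_optional) {
      call maybe_fail { input: sample = sample }
    }
  }

  # Outside the scatter, maybe_fail.errorcode is Array[String?]:
  # one element per shard, undefined where the conditional was false.
  Array[String] errorcodes = select_all(maybe_fail.errorcode)

  # Checks for at least one defined element, not just a non-empty array.
  if (length(errorcodes) > 0) {
    call echo_array { input: strings = errorcodes }
  }
}
```

With this guard, length() counts only defined elements, and echo_array receives an Array[String] that can be interpolated with sep.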

An example WDL, which passes womtool and miniwdl check, is available here. It actually shows the issue twice -- once in the section starting with if(defined(variant_call_after_earlyQC_filtering.errorcode)) { and once in the section starting with if(defined(profile_bam.strain)) {

Interestingly, the result of echo_b implies that select_first() is a more accurate way of checking whether a variable is defined than the built-in defined().

aofarrel, Aug 09 '23 17:08