Cannot coerce expression of type 'String' to 'Directory'

Open gudeqing opened this issue 2 years ago • 4 comments

Hi, can we declare a Directory in the workflow input section? I get the error "Cannot coerce expression of type 'String' to 'Directory'". Here is my WDL file (run with Cromwell 59):

version development

workflow pipeline {
    input {
        Directory index_dir = "/home/danny.gu/PycharmProjects/nestcmd/tests/testdata/index/"
    }

    call getFastqInfo{}

    scatter (each in keys(getFastqInfo.fastq_info)) { 
        String sample = each
        File read1 = getFastqInfo.fastq_info[each][0][0]
        File read2 = getFastqInfo.fastq_info[each][1][0]

        call fastp {
            input: 
            read1 = read1,
            read2 = read2
        }

        call salmon {
            input: 
            indexDir = index_dir,
            read1 = fastp.out1,
            read2 = fastp.out2
        }

    }

    call MergeTranscriptTPM {
        input: 
        quants = salmon.outDir
    }

    call MergeTranscriptCount {
        input: 
        quants = salmon.outDir
    }

    meta {
        name: "PipelineExample"
        desc: "This is a simple pipeline for fast gene/transcript quantification. workflow = [fastq -> Fastp -> Salmon]"
        author: "unknown"
        source: "source URL for the tool"
        version: "unknown"
    }

    output{
        Array[File] fastp_out1 = fastp.out1
        Array[File] fastp_out2 = fastp.out2
        Array[File] salmon_transcript = salmon.transcript
        Array[Directory] salmon_outDir = salmon.outDir
        File MergeTranscriptTPM_result = MergeTranscriptTPM.result
        File MergeTranscriptCount_result = MergeTranscriptCount.result
    }

}



task getFastqInfo{
    input {
        Array[Directory]? fastq_dirs
        Array[File]? fastq_files
        String r1_name = '(.*).read1.fastq.gz'
        String r2_name = '(.*).read2.fastq.gz'
        String docker = 'gudeqing/getfastqinfo:1.0'
    }

    command <<<
        set -e
        python /get_fastq_info.py \
            ~{if defined(fastq_dirs) then "-fastq_dirs " else ""}~{sep=" " fastq_dirs} \
            ~{if defined(fastq_files) then "-fastq_files " else ""}~{sep=" " fastq_files} \
            -r1_name '~{r1_name}' \
            -r2_name '~{r2_name}' \
            -out fastq.info.json
    >>>

    output {
        Map[String, Array[Array[File]]] fastq_info = read_json("fastq.info.json")
        File fastq_info_json = "fastq.info.json"
    }

    runtime {
        docker: docker
    }

   
}
    
task fastp{
    input {
        File read1
        File read2
        String out1
        String out2
        # for runtime
        String memory = "1000"
        Int cpu = 2
        String max_memory = "0"
        String max_cpu = "0"
        String docker = "gudeqing/fastp:0.21.0"
    }

    command <<<
        set -e 
        fastp \
        ~{"-i " + read1} \
        ~{"-I " + read2} \
        ~{"-o " + out1} \
        ~{"-O " + out2} 
    >>>

    output {
        File out1 = "~{out1}"
        File out2 = "~{out2}"
    }

    runtime {
        memory: memory
        cpu: cpu
        max_memory: max_memory
        max_cpu: max_cpu
        docker: docker
    }

    meta {
        name: "fastp"
        desc: "This is description of the tool/workflow."
        author: "unknown"
        source: "source URL for the tool"
        version: "unknown"
    }


}

task salmon{
    input {
        String libType = "A"
        Directory indexDir
        File read1
        File read2
        String outDir = "quant"
        Boolean gcBias = true
        # for runtime
        String memory = "2147483648"
        Int cpu = 2
        String max_memory = "0"
        String max_cpu = "0"
        String docker = "combinelab/salmon:latest"
    }

    command <<<
        set -e 
        salmon quant \
        ~{"--libType " + libType} \
        ~{"-i " + indexDir} \
        ~{"-1 " + read1} \
        ~{"-2 " + read2} \
        ~{"-o " + outDir} \
        ~{if gcBias then "--gcBias  " else ""} 
    >>>

    output {
        File transcript = "~{outDir}/quant.sf"
        Directory outDir = "~{outDir}"
    }

    runtime {
        memory: memory
        cpu: cpu
        max_memory: max_memory
        max_cpu: max_cpu
        docker: docker
    }

    meta {
        name: "salmon"
        desc: "transcript expression quantification"
        author: "unknown"
        source: "source URL for the tool"
        version: "unknown"
    }


}

task MergeTranscriptTPM{
    input {
        Array[Directory] quants
        Array[String]? names
        String out = "merged.TPM.txt"
        # for runtime
        String memory = "1000"
        Int cpu = 2
        String max_memory = "0"
        String max_cpu = "0"
        String docker = "combinelab/salmon:latest"
    }

    command <<<
        set -e 
        salmon quantmerge \
        ~{if defined(quants) then "--quants  " else ""}~{sep=" " quants} \
        ~{if defined(names) then "--names  " else ""}~{sep=" " names} \
        --column TPM \
        ~{"--output " + out} 
    >>>

    output {
        File result = "~{out}"
    }

    runtime {
        memory: memory
        cpu: cpu
        max_memory: max_memory
        max_cpu: max_cpu
        docker: docker
    }

    meta {
        name: "MergeTranscriptTPM"
        desc: "Merge multiple quantification results into a single file"
        author: "unknown"
        source: "source URL for the tool"
        version: "unknown"
    }

}

task MergeTranscriptCount{
    input {
        Array[Directory] quants
        Array[String]? names
        String out = "merged.NumReads.txt"
        # for runtime
        String memory = "1000"
        Int cpu = 2
        String max_memory = "0"
        String max_cpu = "0"
        String docker = "combinelab/salmon:latest"
    }

    command <<<
        set -e 
        salmon quantmerge \
        ~{if defined(quants) then "--quants  " else ""}~{sep=" " quants} \
        ~{if defined(names) then "--names  " else ""}~{sep=" " names} \
        --column NumReads \
        ~{"--output " + out} 
    >>>

    output {
        File result = "~{out}"
    }

    runtime {
        memory: memory
        cpu: cpu
        max_memory: max_memory
        max_cpu: max_cpu
        docker: docker
    }

    meta {
        name: "MergeTranscriptCount"
        desc: "Merge multiple quantification results into a single file"
        author: "unknown"
        source: "source URL for the tool"
        version: "unknown"
    }



}

gudeqing, Sep 22 '21 14:09

Indeed, the following workflow:

$ echo 'version development

workflow main {
  input {
    Directory d = "/etc"
  }
}' > main.wdl

fails to validate with womtool:

$ java -jar womtool-67.jar validate main.wdl
Failed to process workflow definition 'main' (reason 1 of 1): Failed to process input declaration 'Directory d = "/etc"' (reason 1 of 1): Cannot coerce expression of type 'String' to 'Directory'

This happens even though the WDL specification allows coercion from String to Directory, and this exact case is among its examples (see here and here).

Surprisingly, a String is coerced into a Directory without complaint if it comes from an inputs JSON file:

$ echo 'version development

workflow main {
  input {
    Directory d
  }
}' > main.wdl

$ echo '{
  "main.d": "/etc"
}' > main.json

And then:

$ java -jar womtool-67.jar validate main.wdl -i main.json
Success!

Also puzzling is the following:

$ echo 'version development

workflow main {
  input {
    Directory d
  }
  String s = sub(d, "x", "y")
}' > main.wdl

And then:

$ java -jar womtool-67.jar validate main.wdl
Failed to process workflow definition 'main' (reason 1 of 1): Failed to process declaration 'String s = sub(d, "x", "y")' (reason 1 of 1): Failed to process expression 'sub(d, "x", "y")' (reason 1 of 1): Invalid parameter 'IdentifierLookup(d)'. Expected 'File' but got 'Directory'

First of all, it is unclear why womtool claims that sub expects a File: its signature is String sub(String, String, String), so File should not be expected at all. Second, it should be possible to coerce a Directory to a String in the same way a File can be coerced to a String:

$ echo 'version development

workflow main {
  input {
    File f
  }
  String s = sub(f, "x", "y")
}' > main.wdl

And then:

$ java -jar womtool-67.jar validate main.wdl
Success!
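Until Directory-to-String coercion is accepted, one possible workaround (not proposed in this thread and untested against womtool) is to turn the Directory into a String explicitly through string interpolation and pass that String to sub. The sketch below assumes Cromwell accepts `~{d}` interpolation of a Directory, which is not confirmed here; the intermediate name `d_str` is hypothetical.

```shell
#!/bin/sh
set -e

# Hypothetical workaround: interpolate the Directory into a String first,
# then call sub() on that String instead of on the Directory itself.
# Whether womtool accepts "~{d}" for a Directory is an assumption.
cat > main.wdl <<'EOF'
version development

workflow main {
  input {
    Directory d
  }
  String d_str = "~{d}"
  String s = sub(d_str, "x", "y")
}
EOF

# Validate only if the womtool jar used in the commands above is present locally.
if [ -f womtool-67.jar ]; then
  java -jar womtool-67.jar validate main.wdl
fi
```

If womtool also rejects the interpolation, that would suggest the problem lies in how Directory values are typed in general rather than in the signature of sub specifically.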

freeseek, Sep 22 '21 15:09

Thanks, @freeseek, I see. Anyway, my workaround is to give the Directory input no default value. From my own experience, you should never expect WDL to be perfect, haha.
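A minimal sketch of that workaround, using a stripped-down version of the original pipeline that keeps only the input block: `index_dir` has no default, and the former default path is passed as a String in an inputs JSON, which is the route womtool accepts in the example from @freeseek. The file names `pipeline.wdl` and `pipeline.inputs.json` are hypothetical.

```shell
#!/bin/sh
set -e

# Stripped-down workflow: only the input block of the original is kept.
# Directory index_dir has no default, so it becomes a required input.
cat > pipeline.wdl <<'EOF'
version development

workflow pipeline {
  input {
    Directory index_dir
  }
}
EOF

# The path that used to be the default is supplied at run time instead;
# the key follows the <workflow name>.<input name> convention.
cat > pipeline.inputs.json <<'EOF'
{
  "pipeline.index_dir": "/home/danny.gu/PycharmProjects/nestcmd/tests/testdata/index/"
}
EOF

# Validate only if the womtool jar used earlier in this thread is present locally.
if [ -f womtool-67.jar ]; then
  java -jar womtool-67.jar validate pipeline.wdl -i pipeline.inputs.json
fi
```

For the full pipeline, the same inputs file would be passed to Cromwell's run mode through its `--inputs` option; the rest of the workflow does not need to change.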

gudeqing, Sep 22 '21 16:09

Hi, what's the status here? We've been bitten by this, and we really want to use the Directory type because it saves a lot of trouble.

I haven't checked carefully, but is it only the WOM type check that fails, while the String to Directory conversion actually works in Cromwell? If so, could this code be relevant? Adding WomUnlistedDirectoryType to the WomStringType coercion targets might partially fix the problem (though probably not the sub case).

yunhailuo, Sep 20 '23 16:09

Sorry for the random @, but could one of the contributors take a look? I'd just like to know whether a simple fix to coercionMap would be meaningful and helpful. I'm happy to contribute a PR. Maybe @aednichols?

yunhailuo, Sep 27 '23 16:09