nextflow icon indicating copy to clipboard operation
nextflow copied to clipboard

Task output files mixed up between different subworkflow instances

Open sralchemab opened this issue 6 months ago • 0 comments

Bug report

Expected behavior and actual behavior

I have a subworkflow that I'm importing two times and invoking like this:

include { ANNOTATE as ANNOTATE_BCR } from '../subworkflows/local/annotate'
include { ANNOTATE as ANNOTATE_TCR } from '../subworkflows/local/annotate'

workflow TRUST4 {

    take:
    ch_samplesheet // channel: samplesheet read in from --input

    main:

    [...]

    ANNOTATE_BCR(
        ch_bcr_sequences,
        DATABASES.out.reference_igblast.collect(),
        DATABASES.out.reference_fasta.collect(),
        'ig'
    )

    ANNOTATE_TCR(
        ch_tcr_sequences,
        DATABASES.out.reference_igblast.collect(),
        DATABASES.out.reference_fasta.collect(),
        'tr'
    )

    [...]
}

The subworkflow looks something like this:

workflow ANNOTATE {

    take:
    ch_sequences        // channel: [ val(meta), path(sequences_file)  ]
    ch_igblastdb        // channel: /path/to/igblastdb')
    ch_reference_fasta  // channel: /path/to/imgtdb')
    loci                // val    : ig or tr

    main:

    CHANGEO_ASSIGNGENES_FMT7(ch_sequences, ch_igblastdb, loci)

    CHANGEO_MAKEDB(ch_sequences, CHANGEO_ASSIGNGENES_FMT7.out.blast, ch_reference_fasta)

    [...]

    emit:
    repertoire = ch_repertoire
}

I have multiple samples that will go simultaneously through both of these subworkflows.

Now, the issue I'm having is happening randomly and it's costing me a lot of money. For some reason, the task CHANGEO_MAKEDB is sometimes receiving the output from a different instance of the CHANGEO_ASSIGNGENES_FMT7 task.

Here's an example of what I mean:

# running subworkflow for multiple samples
    ANNOTATE_BCR(0000398770)
    ANNOTATE_BCR(0002294183)

# within each subworkflow
    CHANGEO_ASSIGNGENES_FMT7(0000398770)
    # CHANGEO_ASSIGNGENES_FMT7.out.blast returns 0000398770_ig_igblast.fmt7
    CHANGEO_MAKEDB(0000398770, CHANGEO_ASSIGNGENES_FMT7.out.blast)

    CHANGEO_ASSIGNGENES_FMT7(0002294183)
    # CHANGEO_ASSIGNGENES_FMT7.out.blast returns 0002294183_ig_igblast.fmt7
    CHANGEO_MAKEDB(0002294183, CHANGEO_ASSIGNGENES_FMT7.out.blast)

# however, CHANGEO_MAKEDB is receiving as input the output of the wrong instance of CHANGEO_ASSIGNGENES_FMT7, e.g.

    CHANGEO_MAKEDB(0000398770, 0002294183_ig_igblast.fmt7)

    CHANGEO_MAKEDB(0002294183, 0000398770_ig_igblast.fmt7)

Here are some extra details:

# modules.conf
process {
    withName: '.*:ANNOTATE_BCR:.*|.*:ANNOTATE_TCR:.*' {
        publishDir = [ enabled: false ]
        ext.prefix = { "${meta.id}_${meta.loci}" }
    }
}
# module: changeo_makedb.nf
process CHANGEO_MAKEDB {
    tag "$meta.id"
    label 'process_low'

    conda "bioconda::changeo=1.3.0 bioconda::igblast=1.22.0 conda-forge::wget=1.20.1"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/mulled-v2-7d8e418eb73acc6a80daea8e111c94cf19a4ecfd:a9ee25632c9b10bbb012da76e6eb539acca8f9cd-1' :
        'biocontainers/mulled-v2-7d8e418eb73acc6a80daea8e111c94cf19a4ecfd:a9ee25632c9b10bbb012da76e6eb539acca8f9cd-1' }"

    input:
    tuple val(meta), path(reads) // reads in fasta format
    path  (igblast)
    path  (reference_fasta)

    output:
    tuple val(meta), path("*db-pass.tsv"), emit: tab        // sequence table in AIRR format
    path  ("*_command_log.txt")          , emit: logs       // process logs
    path  "versions.yml"                 , emit: versions

    script:
    def args = task.ext.args ?: ''
    """
    MakeDb.py igblast \\
        -i $igblast \\
        -s $reads \\
        -r ${reference_fasta}/${params.species.toLowerCase()}/vdj/ \\
        $args \\
        --outname ${task.ext.prefix} \\
        > ${task.ext.prefix}_makedb_command_log.txt

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        igblastn: \$( igblastn -version | grep -o "igblast[0-9\\. ]\\+" | grep -o "[0-9\\. ]\\+" )
        changeo: \$( MakeDb.py --version | awk -F' '  '{print \$2}' )
    END_VERSIONS
    """
}

Here's what the .command.sh file looks like:

MakeDb.py igblast \
    -i 0002294183_ig_igblast.fmt7 \
    -s 0000398770.ig.fasta \
    -r imgtdb_base/human/vdj/ \
    --regions default --format airr --extended \
    --outname 0000398770_ig \
    > 0000398770_ig_makedb_command_log.txt

cat <<-END_VERSIONS > versions.yml
"ALCHEMAB_RECONSTRIMM:RECONSTRIMM:ANNOTATE_BCR:CHANGEO_MAKEDB":
    igblastn: $( igblastn -version | grep -o "igblast[0-9\. ]\+" | grep -o "[0-9\. ]\+" )
    changeo: $( MakeDb.py --version | awk -F' '  '{print $2}' )
END_VERSIONS

As you can see, both $reads and ${task.ext.prefix} are using the same meta.id, while $igblast is different.

I'm not sure what's going on here nor how I could properly troubleshoot it. So any help would be appreciated.

Steps to reproduce the problem

I can't reproduce it on my own. I just re-run the same jobs multiple times, and sometimes they fail and sometimes they succeed. I noticed that when I submit up to 9-10 jobs, they work fine. If I submit more than that, they would start failing. I will try to write a new and smaller pipeline to see if I can share it here for testing purposes.

Program output

I downloaded the output log from the Batch job, but it's not the same as the .nextflow.log and I don't know how to get this file. So I don't think that the log I'm providing would be of much help. I'm also attaching the command.run and command.sh files to show how the files are being mixed up: you can check out the nxf_stage function on the former, and the script parameters on the later.

files.tar.gz Files are named log-events-viewer-result.csv, command.run, and command.sh.

Environment

I'm running the jobs using AWS Batch. I set up the infrastructure using Tower but I'm submitting the jobs manually using the AWS Cli. I'm using Fargate for the main job, Wave, Fusion, Scratch = False, and I enabled cloudcache as well.

Nextflow version 24.04.3

sralchemab avatar Aug 20 '24 14:08 sralchemab