oncoanalyser

Eliminate bottlenecking of markdups

Open · SPPearce opened this issue 1 year ago · 1 comment

Description of feature

The pipeline currently seems to have a bottleneck at the alignment -> markdups step, where all alignment has to complete before any markdups processes begin. The pipeline already uses groupKey to declare how many files to expect from the splitting process, but this happens after the bwamem2 mapping step.
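
For illustration, a minimal sketch of the pattern being described (channel, process, and meta field names here are hypothetical, not oncoanalyser's actual identifiers): attaching groupKey before the expensive mapping step lets groupTuple emit each sample's tuple as soon as that sample's chunks arrive, rather than waiting for the whole channel to close.

```nextflow
// Hypothetical sketch, not oncoanalyser's actual code.
// groupKey(key, n) records at split time how many items to expect,
// so the downstream groupTuple can emit a sample as soon as its
// n chunks are aligned, instead of blocking on all alignments.
ch_fastq_split
    .map { meta, fastqs ->
        [groupKey(meta.sample_id, meta.n_chunks), meta, fastqs]
    }
    .set { ch_to_align }

BWAMEM2_ALIGN(ch_to_align)

BWAMEM2_ALIGN.out.bam
    .groupTuple()            // emits per sample once n_chunks BAMs arrive
    .set { ch_markdups_in }

MARKDUPS(ch_markdups_in)
```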

SPPearce avatar Aug 02 '24 09:08 SPPearce

I haven't been able to replicate the bottleneck as I understand it from your description.

For some additional context, each MarkDups task must receive all BAMs for a given sample before starting to process and merge into a single output BAM. So blocking in that sense on a per-sample basis is intended and required. However, there should not be blocking/bottlenecking where all alignments must complete before any MarkDups process begins.
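
To make the intended semantics concrete, a minimal sketch (hypothetical channel names, not oncoanalyser's actual code):

```nextflow
// Hypothetical sketch: per-sample gathering before MarkDups.
// Each emitted tuple contains every BAM for one sample, so MarkDups
// blocks per sample (required for the merge), not globally.
BWAMEM2_ALIGN.out.bam          // [sample_id, bam] per lane/chunk
    .groupTuple()              // gather all BAMs for one sample
    .set { ch_markdups_in }    // [sample_id, [bam1, bam2, ...]]

MARKDUPS(ch_markdups_in)       // starts as soon as its sample is complete
```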

I've run oncoanalyser in stub mode and added an artificial 60 second delay to one sample in the bwa-mem2 process to evaluate flow through the NF channels. As expected, each MarkDups task runs as soon as its set of sample BAMs becomes available (see attached timeline and below expandable to replicate).

If you're seeing different behaviour, could you please provide some additional details of your observations and how you're running oncoanalyser?


Attachment: execution_timeline_2024-08-05_12-36-17.html.gz

oncoanalyser bwa-mem2/MarkDups data flow check

Get and patch oncoanalyser with an artificial 60 second delay in bwa-mem2 for the 'sa.tumor' sample

git clone https://github.com/nf-core/oncoanalyser
(cd oncoanalyser/ && git checkout 41010dd)

cat <<EOF > alignment-delay.patch
--- a/oncoanalyser/modules/local/bwa-mem2/mem/main.nf
+++ b/oncoanalyser/modules/local/bwa-mem2/mem/main.nf
@@ -64,6 +64,10 @@ process BWAMEM2_ALIGN {

     """
+    if [[ \${meta.sample_id} == 'sa.tumor' ]]; then
+      sleep 60;
+    fi
+
     touch \${output_fn}
     touch \${output_fn}.bai

EOF

patch -lp1 < alignment-delay.patch

Create samplesheet

cat <<EOF > samplesheet.csv
group_id,subject_id,sample_id,sample_type,sequence_type,filetype,info,filepath
sa_debug,sa,sa.normal,normal,dna,fastq,library_id:sa.normal.lb;lane:1,$(pwd)/temp/sa.normal.R1.fastq.gz;$(pwd)/temp/sa.normal.R2.fastq.gz
sa_debug,sa,sa.tumor,tumor,dna,fastq,library_id:sa.tumor.lb;lane:1,$(pwd)/temp/sa.tumor.R1.fastq.gz;$(pwd)/temp/sa.tumor.R2.fastq.gz

sb_debug,sb,sb.normal,normal,dna,fastq,library_id:sb.normal.lb;lane:1,$(pwd)/temp/sb.normal.R1.fastq.gz;$(pwd)/temp/sb.normal.R2.fastq.gz
sb_debug,sb,sb.tumor,tumor,dna,fastq,library_id:sb.tumor.lb;lane:1,$(pwd)/temp/sb.tumor.R1.fastq.gz;$(pwd)/temp/sb.tumor.R2.fastq.gz
EOF

Create local configuration

cat <<EOF > stub.config
params {
    genomes {
        'GRCh38_hmf' {
            fasta         = "$(pwd)/temp/GRCh38.fasta"
            fai           = "$(pwd)/temp/GRCh38.fai"
            dict          = "$(pwd)/temp/GRCh38.dict"
            bwamem2_index = "$(pwd)/temp/GRCh38_bwa-mem2_index/"
            gridss_index  = "$(pwd)/temp/GRCh38_gridss_index/"
            star_index    = "$(pwd)/temp/GRCh38_star_index/"
        }
    }
    ref_data_virusbreakenddb_path = '$(pwd)/temp/virusbreakenddb_20210401/'
    ref_data_hmf_data_path        = '$(pwd)/temp/hmf_bundle_38/'
    ref_data_panel_data_path      = '$(pwd)/temp/panel_bundle/tso500_38/'
}
EOF

Run oncoanalyser

nextflow run -config stub.config oncoanalyser/main.nf \
  \
  -stub \
  --create_stub_placeholders \
  \
  --max_cpus 1 \
  --max_memory 1.GB \
  \
  --mode wgts \
  --genome GRCh38_hmf \
  --input samplesheet.csv \
  --outdir output_stub/

scwatts avatar Aug 05 '24 02:08 scwatts

Closing the issue but please re-open if you'd like to discuss further!

scwatts avatar Sep 11 '24 23:09 scwatts

Ah, completely forgot about this one, I've been busy with other bits ATM.

SPPearce avatar Sep 12 '24 06:09 SPPearce