
Execution time does not change even if some QC tools are skipped

[Open] mimifp opened this issue 3 years ago • 2 comments

Description of the bug

I have been testing the pipeline on an HPC cluster. Even when running with 32 threads, a single paired-end FASTQ sample took approximately 3h 15min. To reduce this time (we have to process 350 samples in total), I skipped some quality control tools (preseq, biotype_qc, dupradar, stringtie and deseq2_qc) and the pseudo-alignment with Salmon. Even so, the execution time hardly changed. This raises several questions:

  • Any idea what this might be due to?
  • Is it not possible to go below the 3h execution time for this pipeline?
  • What are the most demanding processes/tools?

Thanks in advance!

Command used and terminal output

#### COMMAND USED ####

nextflow run nf-core/rnaseq -r 3.8.1  --input ${SAMPLESHEET} -profile singularity --max_cpus 32 --genome GRCh38 --fasta ${REFERENCE_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna --gtf ${REFERENCE_DIR}/gencode.v41.annotation.gtf.gz --gencode --star_index ${STAR_INDEX} --aligner star_salmon --skip_preseq --skip_biotype_qc --skip_dupradar --skip_stringtie --skip_deseq2_qc --outdir $OUTDIR 

#### OUTPUT ####
nf-core/rnaseq v3.8.1
------------------------------------------------------
Core Nextflow options
  revision       : 3.8.1
  runName        : pedantic_brown
  containerEngine: singularity
  launchDir      : /mnt/netapp2/translational_oncology/1_tools/3_nf_core_rnaseq/1_src
  workDir        : /mnt/netapp2/translational_oncology/1_tools/3_nf_core_rnaseq/1_src/work
  projectDir     : /home/usc/mg/mfp/.nextflow/assets/nf-core/rnaseq
  userName       : uscmgmfp
  profile        : singularity
  configFiles    : /home/usc/mg/mfp/.nextflow/assets/nf-core/rnaseq/nextflow.config

Input/output options
  input          : /mnt/netapp2/translational_oncology/1_tools/3_nf_core_rnaseq/1_src/samplesheet_rna.csv
  outdir         : /mnt/netapp2/translational_oncology/1_tools/3_nf_core_rnaseq/3_results/1_URONCOLOGY-RNA-PE
  email          : [email protected]

Reference genome options
  genome         : GRCh38
  fasta          : /mnt/netapp2/translational_oncology/0_reference/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
  gtf            : /mnt/netapp2/translational_oncology/0_reference/gencode.v41.annotation.gtf.gz
  gene_bed       : s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.bed
  star_index     : /mnt/netapp2/translational_oncology/1_tools/3_nf_core_rnaseq/2_data/1_star_reference
  gencode        : true

Process skipping options
  skip_stringtie : true
  skip_preseq    : true
  skip_dupradar  : true
  skip_biotype_qc: true
  skip_deseq2_qc : true

Max job request options
  max_cpus       : 32
------------------------------------------------------
Completed at: 22-Sep-2022 11:59:44
Duration    : 3h 26m 39s
CPU hours   : 34.4
Succeeded   : 41

Relevant files

No response

System information

  • Nextflow Version 22.04.5
  • Executed in Finisterrae III HPC with slurm and Linux
  • Singularity container
  • Version of nf-core/rnaseq 3.8.1

mimifp, Sep 22 '22 13:09

Hi @mimifp ! Are you able to send the .nextflow.log files for the runs before and after you used all of the skip steps?

There should also be files in results/pipeline_info/execution_* that show how long each process took, along with an overall summary.

I'd be interested to see how these compare before and after.
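For the question of which processes dominate the run time, the execution trace in results/pipeline_info/ can be summarised per process. Below is a minimal Python sketch, assuming the trace is tab-separated with a header row containing `name` and `realtime` columns holding human-readable durations such as `1h 2m 3s` or `500ms` (Nextflow's default trace format; adjust the column names if your file differs). `to_seconds` and `rank_processes` are hypothetical helpers written for this example, not part of Nextflow or nf-core.

```python
import csv
import io
import re
import sys
from collections import defaultdict

# Duration tokens as written in Nextflow's human-readable trace fields,
# e.g. "1h 2m 3s", "45.5s", "500ms". "ms" is listed before "m" and "s"
# so that "500ms" is not read as minutes.
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|d|h|m|s)")
_SECONDS = {"d": 86400.0, "h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def to_seconds(value: str) -> float:
    """Convert a trace duration such as '1h 2m 3s' to seconds ('-' gives 0)."""
    return sum(float(num) * _SECONDS[unit] for num, unit in _TOKEN.findall(value))


def rank_processes(trace_text: str, field: str = "realtime"):
    """Sum a duration column per process; return (process, seconds), slowest first.

    Assumes a tab-separated trace with a 'name' column and the given `field`.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(trace_text), delimiter="\t"):
        # 'NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (sample1)' -> 'STAR_ALIGN'
        process = row["name"].split(" (")[0].rsplit(":", 1)[-1]
        totals[process] += to_seconds(row[field])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python this_script.py <path to an execution_trace file>
    with open(sys.argv[1]) as fh:
        for process, secs in rank_processes(fh.read())[:10]:
            print(f"{secs / 3600:6.2f} h  {process}")
```

Summing `realtime` measures how long tasks actually executed; passing `field="duration"` instead also counts the time from submission to start, which on a Slurm cluster includes time spent waiting in the queue. Comparing the two rankings may help separate slow tools from scheduling overhead.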

drpatelh, Sep 30 '22 10:09

Hi @drpatelh, first of all, thanks for your answer and sorry for the delay! I ran the pipeline again with all tools and with some tools skipped, and the times varied only slightly:

Run                 Completed at          Duration    CPU hours  Succeeded
All tools           03-Oct-2022 12:19:04  3h 9m 7s    35.3       47
Some tools skipped  30-Sep-2022 17:41:20  2h 53m 21s  29.8       36
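A quick back-of-the-envelope comparison of the two run summaries helps quantify "a little but not much". The Python sketch below uses only the durations and CPU hours reported above; it assumes the CPU-hours figure can be divided by wall-clock hours to give an average number of busy CPUs, which is only a rough indicator since the thread does not say how Nextflow accounts CPU hours (for example, requested versus measured CPUs).

```python
def hms_to_seconds(h: int, m: int, s: int) -> int:
    """Convert an 'Xh Ym Zs' duration to seconds."""
    return h * 3600 + m * 60 + s


def pct_reduction(before: float, after: float) -> float:
    """Percentage drop going from `before` to `after`."""
    return 100.0 * (before - after) / before


# Durations and CPU hours copied from the two run summaries in this thread.
runs = {
    "all tools": (hms_to_seconds(3, 9, 7), 35.3),
    "some tools skipped": (hms_to_seconds(2, 53, 21), 29.8),
}

(full_s, full_cpu), (slim_s, slim_cpu) = runs["all tools"], runs["some tools skipped"]
wall_drop = pct_reduction(full_s, slim_s)     # about 8.3%
cpu_drop = pct_reduction(full_cpu, slim_cpu)  # about 15.6%

for label, (secs, cpu_hours) in runs.items():
    # CPU hours divided by wall-clock hours = average number of CPUs busy
    print(f"{label}: ~{cpu_hours / (secs / 3600):.1f} CPUs busy on average")
print(f"wall-clock drop: {wall_drop:.1f}%, CPU-hour drop: {cpu_drop:.1f}%")
```

With these numbers, skipping tools removed roughly 15.6% of the CPU hours but only about 8.3% of the wall-clock time, and both runs kept on average only about 10 to 11 CPUs busy. One possible reading (an assumption, not confirmed in the thread) is that the skipped steps mostly ran in parallel with the longer alignment and quantification chain, so the total duration is set by that chain rather than by the amount of QC work.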

I've attached the execution info files for both runs.

execution_info.zip

Thanks!

mimifp, Oct 03 '22 13:10

When I run -profile test and skip most of the tools, the run time drops by roughly 33% on AWS Batch. I'm going to close this issue, but feel free to reopen it if you have any more questions.

We could add a 'streamlined' profile that runs only the fastest tools, for minimal turnaround time.

adamrtalbot, May 31 '23 08:05